As artificial intelligence (AI) stands poised to enter our courtrooms, a critical question emerges: If trial courts start using generative AI in their decision-making process, do courts of appeal need to adopt a new standard of review? This article explores why we may need to reconsider our current appellate review standards, with a particular focus on the potential merits of expanding de novo review to all aspects of AI-assisted judicial decisions, including areas traditionally given more deference.
At the heart of this issue lies the infamous “black box” problem of AI. Unlike human judges who can articulate their reasoning, AI systems often operate in ways that are opaque even to their creators. This lack of transparency presents a fundamental challenge to our legal system. When human judges make decisions, they can explain their thought process, cite laws and relevant cases, and articulate how they weighed different factors. An appellate court can review this reasoning, assess its logic, and determine whether it aligns with established legal principles. But when an AI system influences or generates a decision, we lose this crucial transparency.
Compounding the black box problem is the issue of training data. AI systems learn from vast datasets, which can inadvertently incorporate historical biases, and once a system is deployed, that training data is typically unavailable for scrutiny. This poses a significant problem for appellate review: how can an appellate court assess whether a decision was influenced by biased training data if it cannot examine that data? The potential for hidden biases to influence judicial decisions, with no means of detection through traditional appellate review, is deeply concerning.
While decisions on the law are already subject to de novo review, factual determinations and rulings on objections and evidentiary issues are typically reviewed under more deferential standards like manifest error and abuse of discretion. However, when AI is involved in these determinations, a strong argument emerges for extending de novo review to these areas as well. The traditional rationales for deference—such as the trial court's ability to assess witness credibility or manage the flow of a trial—may no longer fully apply when AI systems are involved in these processes. Moreover, we must ask ourselves: why would we ever give deference to a bot? The very notion of deferring to an AI system's judgment on factual or evidentiary matters seems to run counter to the human-centric nature of our justice system.
This question becomes particularly pertinent when considering the various roles AI can play. For instance, if AI is used to analyze evidence or suggest a case outcome, should appellate courts defer to these AI-influenced decisions? If an AI system plays a role in calculating or recommending damages, should the traditional deference given to trial courts' damages determinations still apply?
We can’t peer into the AI’s “mind” to understand how it weighed various factors or why it recommended certain decisions or awards. In light of this, de novo review of these aspects would ensure that both the factual basis and the evidentiary framework of a case are thoroughly reexamined by human judges on appeal. This approach could help mitigate any hidden biases or errors in the AI's processing, ensuring that the unique aspects of each case are fully considered and that the outcome reflects nuanced human judgment rather than purely algorithmic patterns.
However, implementing such a broad application of de novo review is not without challenges. It would likely increase the workload for appellate courts and could potentially slow down the judicial process. These practical concerns must be carefully weighed against the importance of ensuring fair, unbiased, and transparent judicial decisions in an era of AI-assisted adjudication.
As AI systems evolve and potentially become more explainable, our approach to reviewing their decisions may need to adapt. Any new standard of review should be flexible enough to accommodate technological advancements while maintaining the core principle of thorough, independent review.
Ultimately, whatever approach we adopt must ensure that the benefits of AI in the courtroom are realized without compromising the integrity, fairness, and transparency of our legal system. As we move forward, we must remain committed to a justice system that is not only efficient and accurate but also one whose workings can be clearly understood and scrutinized by all. The question of how to review AI-assisted judicial decisions is not just a legal or technological issue—it’s a fundamental question about how we ensure justice in the digital age.
For more posts like this one, visit JudgeSchlegel.com.
Thank you for your thought-provoking post. But I would say not necessarily. It is true that we “can’t peer into an AI’s mind,” but we also can’t peer into a judge’s mind. The only way we know what a judge based a ruling or decision on is what that judge says in open court or in a written decision. With the appropriate prompting, an LLM will give its reasons just as a judge does, and an appellate court could evaluate them the same way it evaluates a ruling by a human judge.

It is also true that we don’t know what biases an LLM may have picked up from its training; but we don’t know what biases a human judge may have picked up over a lifetime, either. An appellate court must take a human judge’s possible biases into account, yet it has only the ruling in front of it, not the entire history of the judge’s life. Similarly, if an LLM has written a ruling and explained its reasoning and the factors it found influential or persuasive, the appellate court has the same sort of record it would have with a human judge (depending on the prompts given the “trial LLM,” the record might even be fuller than some human judges’ decisions). In fact, the LLM could be asked whether it took any biases into account when rendering its decision — which I presume most human judges would consider an improper question.

“AI judges” are not for everyone, and the use of LLMs in jurisprudence could develop in different ways. One of those ways, I believe, would be to treat the LLM somewhat like a federal magistrate judge, who with the parties’ consent can enter a final order. The parties would trade the human judge for a quicker process; perhaps they’d be required to waive the right to appellate review.
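To make the prompting point concrete: here is a minimal sketch, in Python, of how a “trial LLM” might be asked to produce a reviewable record alongside its ruling. Everything in it is illustrative and assumed, not drawn from any real judicial system; in particular, `call_llm`, `RECORD_PROMPT`, and `draft_reviewable_ruling` are hypothetical names, and `call_llm` is a stand-in for whatever model interface a court might actually use.

```python
# Illustrative sketch only: prompting an LLM to produce the kind of
# articulated record an appellate court could review. `call_llm` is a
# hypothetical stand-in for a real model interface (API or local model).

RECORD_PROMPT = """You are assisting a trial court. For the matter described
below, produce a written ruling that an appellate court could review:

1. State the ruling.
2. Explain your reasoning step by step.
3. Cite the statutes, rules, and cases you relied on.
4. Identify the factors you found influential or persuasive, and why.
5. State whether any assumptions or potential biases informed the analysis.

Matter:
{matter}
"""


def call_llm(prompt: str) -> str:
    """Hypothetical placeholder; wire this to the model interface in use."""
    raise NotImplementedError("No model configured in this sketch.")


def draft_reviewable_ruling(matter: str) -> str:
    """Return a ruling plus its articulated reasoning, analogous to the
    written decision a human judge would leave for appellate review."""
    return call_llm(RECORD_PROMPT.format(matter=matter))
```

The design point is the prompt structure, not the model call: by demanding the ruling, the reasoning, the authorities, and any acknowledged biases in one document, the output becomes a record an appellate court can evaluate much as it would a human judge’s written decision.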