As far back as the 1970s and 1980s, so-called Expert Systems were among the first forms of artificial intelligence (AI) software to hold the promise of today’s AI technology: encapsulating human expert knowledge and making it available to the rest of us.

Today, we have advanced to Large Language Models (LLMs) such as ChatGPT, Claude, and Gemini, which promise to give us the answers to all our questions. There is agentic AI that can—with limited human input—reason, complete tasks, and solve logistical and other problems. But what does the AI know? And most importantly, where does this knowledge come from? Is the knowledge that the AI absorbs trustworthy and legitimate? AI doesn’t just point us to where we can obtain information, like a search engine. AI creates content from the knowledge it accrues, and that knowledge, given the human origins of its training data, can be biased. And when that knowledge is transformed into pleadings, briefs, motions, arguments, and testimony, who sets the guardrails?

In a recent NYLJ article, “AI in the Courtroom: Judicial Scrutiny and Evidentiary Tripwires,” we noted that judges have criticized, cautioned against, and in some cases sanctioned the use of AI—especially when it “translates into hallucinated cases or citations that can mislead, misinform, and/or ultimately waste judicial time and resources.” But there is no doubt that utilization of AI in the courtroom by jurists and litigants alike appears to be on the upswing—not only in novel applications but in greater acceptance by the judiciary of AI-generated text and argument.

An AI Voice From the Grave

Christopher Pelkey died from a gunshot wound during a road-rage incident in 2021. But that did not stop him—or at least his likeness—from making an impact statement that played during the sentencing of his killer in an Arizona courtroom in 2025. Pelkey’s “appearance” (or perhaps apparition) is believed to be the first time AI has been used in such a manner.

As reported by Phoenix’s ABC 15, Pelkey’s sister used AI to create a video replica of her late brother from a photo taken while he was alive. She also wrote a script and fed images and audio files into AI tools to create the clone.

After viewing the video, the judge presiding over the case sentenced the defendant to 10-and-a-half years for manslaughter, one year more than prosecutors had asked for. Did the AI-generated image of the deceased sway the sentencing judge more than a petition from Mr. Pelkey’s relatives on his behalf would have?

Interestingly, a June 2024 study found that personalized AI debaters were more persuasive than human debaters in 64.4 percent of debates. The study measured persuasiveness by how much an opponent’s opinion or viewpoint shifted by the end of the debate. As a general proposition regarding AI use, the study noted that humans consider machines a more objective and confident source of information. According to the study, humans assume that humans have an agenda, but that a machine is simply presenting “the facts” in an unbiased, neutral, and dispassionate manner. Further, if the AI’s text output sounds assertive, humans are more likely to be persuaded. Therefore, if we believe AI is unbiased and confident, we are more likely to accept its opinion as fact—even if it is a black box masquerading as a human, as in Pelkey’s case.

AI Use by Judges: “Black Box” Legal Reasoning

AI tools, particularly large language models (LLMs), are often described as a black box: the internal decision-making process is not transparent, and it can be difficult to understand how the AI tool arrived at its output. As judges turn to AI to assist them in making their decisions, this black box problem is a key concern. When judges issue legal decisions, they do not just announce an outcome; rather, they write an often lengthy decision, filled with legal precedent and walking through each step of their reasoning and decision-making process. How can appellate courts review decisions that were influenced by the output of an AI tool that cannot explain its reasoning? Should judges be able to rely on AI tools when a slight change in the wording of a query can be outcome determinative?

It has not escaped notice that federal appellate judges have acknowledged and openly discussed their use of AI to shape their legal reasoning and opinions. In a concurring opinion in Snell v. United Specialty Insurance Co., Judge Kevin Newsom of the U.S. Court of Appeals for the Eleventh Circuit discussed how AI played a role in helping him decide whether an in-ground trampoline was “landscaping” under an insurance policy. While Judge Newsom addressed the valid concerns surrounding the use of AI, he concluded that “LLMs have promise. At the very least, it no longer strikes me as ridiculous to think that an LLM like ChatGPT might have something useful to say about the common, everyday meaning of the words and phrases used in legal texts.”

In Ross v. United States, Judge John P. Howard III of the District of Columbia Court of Appeals discussed the court’s use of AI to determine whether leaving a dog in a hot car is animal cruelty. Judge Howard wrote that “AI tools are proliferating and we ignore them at our own peril. Not only for the concerning capabilities they now give parties with ill intent, but for the great utility such tools could potentially provide in easing the strain on our increasingly overburdened courts.” However, Judge Howard warned that courts need to approach AI “cautiously,” and “a judicial officer or staff member should understand, among many other things, what data the AI tool collects and what the tool does with their data.”

Judge Howard was not wrong on the need for caution. In May, a county court judge in Florida reportedly found herself on the receiving end of ethics charges after allegedly providing to a newspaper editorial board a recording fabricated “likely using computer manipulation or generative AI”—as support for the assertion, made during the judge’s 2024 campaign, that there existed an image crisis in the state judiciary.

While the judge did not create the recording, she allegedly failed to exercise due diligence in forwarding it, given that it originated with a former, terminated court employee who had self-published an e-book denigrating the Ninth Judicial Circuit and the judges in it. The recording reportedly came from a website linked to the former employee and contained the fabricated voices of judges, including the chief judge of Florida’s Ninth Circuit and justices of the Florida Supreme Court, engaged in a derogatory conversation.

The county judge first allegedly cited the e-book to the editorial board—“despite not having read any of it”—and followed up by providing a link to the fabricated recording.

“[Y]our actions eroded public confidence by perpetuating a false perception of illegal, unethical, or immoral conduct by Justices of the Florida Supreme Court, a Chief Judge, and others working within the judicial branch,” the Notice of Formal Charges reads.

AI Use by Expert Witnesses: Whether Experts Meet Reliability Standards

In Ferlito v. Harbor Freight Tools USA, Inc., decided in April, the U.S. District Court for the Eastern District of New York was faced with the question of whether to exclude an expert witness, in part because he used ChatGPT in forming his expert opinion. The plaintiff had brought suit alleging that the head of his maul (an axe for splitting wood) detached due to a design defect and sought to offer expert testimony from a consultant—who had experience working in engineering departments and with power tools, but who held no engineering degrees.

The consultant opined that the maul was defectively designed because the head was not securely affixed to the handle. Defendant moved to preclude the consultant’s testimony, arguing he was unqualified as an expert in part because, after completing his report, he asked ChatGPT about the best way to secure a hammer head to a handle. His question generated a response consistent with his report.

The Ferlito court looked to Daubert v. Merrell Dow Pharms., Inc., where the Supreme Court provided a list of factors trial judges should use to evaluate the reliability of expert testimony: (1) whether a theory or technique can be (and has been) tested; (2) whether the theory or technique has been subjected to peer review and publication; (3) a technique’s “known or potential rate of error” and “the existence and maintenance of standards controlling the technique’s operation”; and (4) whether a particular technique or theory has gained “general acceptance” in the relevant scientific community.

The Ferlito court found the consultant’s use of ChatGPT likely did not diminish his judgment regarding proper methods for securing the head and handle. The record indicated he used ChatGPT only to confirm his findings after he had authored his report. The court noted there was no indication he used AI to generate a report with false authorities or that his use of AI rendered his testimony less reliable. As a result, defendant’s motion to exclude the consultant as an expert witness was denied.

Incorrect Citations by AI in Anthropic

In the recent case of Concord Music Group, Inc. v. Anthropic PBC, an LLM hallucinated a source—and attorneys did not catch it.

In 2023, Concord Music Group, Inc., along with several other publishers, filed a lawsuit against AI developer Anthropic PBC in the U.S. District Court for the Middle District of Tennessee. The publishers asserted copyright claims against Anthropic relating to its Claude LLM chatbot. The case was later transferred to the Northern District of California.

During the course of the litigation, an Anthropic data scientist was accused of citing a made-up academic report to support her argument after hallucinated sources appeared in her April 30, 2025, declaration. But the data scientist was not responsible: Anthropic’s attorneys were. An associate had asked the chatbot Claude to provide a properly formatted legal citation for the article, which resulted in the hallucinated citation. The citation included the correct link, volume, page numbers, and publication year, but Claude provided a false author and title.

U.S. Magistrate Judge Susan van Keulen said it was “a very serious and grave issue” and that there was “a world of a difference between a missed citation and hallucination generated by AI,” Reuters reported.

AI: The Persuasive Litigator

As noted above, the June 2024 study found that AI-enhanced debaters can mount persuasive and winning arguments. Indeed, AI has been shown to be more persuasive than human debaters in 64.4 percent of debates, with persuasiveness measured by how much an opponent’s opinion or viewpoint shifted by the end of the debate.

Moreover, when GPT-4 had access to personal information about the debate participants, its odds of persuading opponents increased by 81.2 percent compared to human debaters. Human debaters are generally more rooted in their viewpoints, but personalized LLM debaters were able to penetrate typical resistance to persuasion.

The use of AI to increase the odds of a favorable outcome raises significant, if not prohibitive, ethical concerns. Nonetheless, AI does have other competitive advantages over human attorneys in courtroom advocacy. For example, agentic AI can search the entire web in real time, and read and analyze significant amounts of case law, statutes, and evidence in a short time. While a human attorney can come well-prepared with thorough legal research, AI can quickly find and present new precedents, statutes, or evidence in support of motions and arguments throughout trial proceedings, while dynamically adapting to the judge and opposing counsel. Further, if the AI’s output sounds authoritative and well-reasoned, juries, at least, are more likely to be persuaded by it.

Quo Vadis?

On May 6, 2025, a U.K. law firm became the first AI-driven law firm to receive approval from the Solicitors Regulation Authority—a development that has been seen as both a “big deal” and “not a big deal,” owing to the human oversight involved. How successful the initiative will be remains to be seen. Yet ultimately, all AI systems take in patterns (largely gleaned from the Internet), generate mathematical representations of those patterns, and provide a way to retrieve, transform, and act upon them.

Different types of AI systems differ in how they take in examples and symbolic knowledge, represent the patterns internally, retrieve and manipulate them, and present and justify outputs to the user. Hallucinations can be seen as a side effect of the lack of symbolic knowledge in most current generative AI systems. In the texts that the AI models were trained on, strings that look like citations are correlated with language that seeks to prove a point. Without symbolic knowledge about what citations mean in a legal context (or, for that matter, what proving a point means), it is natural for a system to conclude that when asked to “prove a point,” it needs to include things that look like citations, without necessarily understanding that each citation needs to match up with an appropriate real case.

Of course, one solution is to give the AI system a symbolic understanding of what it means to prove a point, what citations entail, and the mechanisms by which it can go out and find those citations. This symbolic knowledge can also be referenced by the AI system to provide true justifications or chains of thought, and to allow the system to recognize when it does not have the knowledge to answer the query.
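By way of illustration only, the short Python sketch below shows one crude form of such a symbolic check: scanning a generated draft for citation-like strings and flagging any that cannot be matched to a database of real cases. The pattern, the sample database, and the draft text are hypothetical simplifications, not a description of how any particular legal AI product actually works.

# Illustrative sketch only: a crude "symbolic" check that flags citation-like
# strings in generated text that cannot be matched to a known-case database.
# The pattern, the database, and the draft text are all hypothetical.
import re

# Hypothetical database of real cases, keyed by reporter citation.
KNOWN_CASES = {
    "509 U.S. 579": "Daubert v. Merrell Dow Pharmaceuticals, Inc.",
}

# Very rough pattern for a reporter citation such as "509 U.S. 579".
CITATION_PATTERN = re.compile(r"\b\d{1,4}\s+[A-Z][A-Za-z0-9.\s]*?\s+\d{1,4}\b")

def verify_citations(generated_text: str) -> list[str]:
    """Return citation-like strings that do not match any known case."""
    return [
        match.strip()
        for match in CITATION_PATTERN.findall(generated_text)
        if match.strip() not in KNOWN_CASES
    ]

draft = ("See Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, "
         "and Smith v. Imaginary Corp., 123 F.4th 456.")
for citation in verify_citations(draft):
    print("Unverified citation, review before filing:", citation)

In this toy example, the real Daubert citation passes, while the invented one is flagged for human review; a production system would consult an authoritative case database rather than a hard-coded dictionary.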

AI systems that make use of, and reason about, symbolic knowledge are not new; indeed, they were studied extensively as far back as the 1950s. We are starting to see hybrid and agentic systems being developed that combine LLMs with symbolic knowledge. While promising, this poses two challenges for the end user. The first is telling the difference between systems with such capabilities and those without, particularly in the current climate of AI marketing. The second is understanding the scope of those capabilities. Systems that are transparent about what they know and how they know it are just as essential as expert witnesses who are clear about the extent of their knowledge and the content upon which that knowledge is based.

Like any tool, LLM-based systems have their role, and responsible use requires an understanding of what that role is. Trained on appropriate and ethically sourced data, LLMs provide access to an incredible corpus of knowledge. They are a fantastic way of exploring an idea, concept, or domain and developing new combinations of existing information. They work well when it would be difficult or time-consuming for a human to produce the text, so long as a human thoroughly fact-checks the output. Lawyers and courts certainly acknowledge this, as evidenced by the increasing use and acceptance of AI in law firms, corporate legal departments, and the courtroom. The rub is to know what AI knows and what it doesn’t!

* * * *

Frances M. Green is Counsel at Epstein Becker Green and a working group member of the AI Safety Institute Consortium of the National Institute of Standards and Technology (NIST). Dr. Raymond Sheh is an Associate Research Scientist at Johns Hopkins University and a Guest Researcher at NIST. Katherine Heaney is an associate at Epstein Becker Green. Ann W. Parks, an attorney with the firm, contributed to the preparation of this article.

Opinions are the authors’ own and not necessarily those of their employers. 

Reprinted with permission from the June 8, 2025, edition of the New York Law Journal © 2025 ALM Global Properties, LLC. All rights reserved. Further duplication without permission is prohibited, contact 877-256-2472 or asset-and-logo-licensing@alm.com.
