In October 2024, the judge of the Surrogate’s Court in Saratoga County, New York, was so concerned about an expert witness’s haphazard use of AI that he felt compelled to address it, even though the expert was deemed unreliable on other grounds.
In Matter of Weber, the expert relied on Microsoft Copilot, a large language model generative AI chatbot, in a matter concerning a trustee’s accounting, yet he could not later say what input or prompt he used; could not state what sources the chatbot relied on; could not explain how Copilot worked; and gave no additional testimony regarding specific calculations.
This prompted Judge Jonathan G. Schopf, or his staff, to grill the chatbot: “Are you accurate?” “Are you reliable?” “Are your calculations reliable enough for use in court?” Ultimately, Copilot responded: “When it comes to legal matters, any calculations or data need to meet strict standards. I can provide accurate info, but it should always be verified by experts and accompanied by professional evaluations before being used in court…”
Obviously, many, including developers and investors, are excited by the potential of artificial intelligence (AI) in the legal world. A March 2025 study from the University of Minnesota and University of Michigan law schools found that AI tools significantly enhanced the quality and productivity of legal work.
Yet hallucinations remain possible, and the study came on the heels of news that the University of Minnesota had expelled a health economics Ph.D. student for alleged AI use on an exam. Others are deeply conflicted about whether an AI tool will determine eligibility for jobs, health care, and insurance.
What about judges? While not averse to the use of AI as a research tool, judges have been stern when AI use translates into hallucinated cases or citations that can mislead, misinform, and/or ultimately waste judicial time and resources.
The Illinois Supreme Court, for one, has released a policy on AI in the courts, following the approval of a report submitted by a Judicial Conference Task Force that underscores diligence when utilizing AI.
Noting that judges—as well as attorneys and pro se litigants—are ultimately responsible for their work product, the policy specifically cautions that “unsubstantiated or deliberately misleading AI-generated content that perpetuates bias, prejudices litigants, or obscures truth-telling and decision-making will not be tolerated.”
Notably, Illinois’s chief justice said upon release of the policy that current rules on attorney professional and judicial conduct were sufficient to govern AI, and that the court would “regularly assess those rules.” Yet fitting AI use into existing (i.e., last-century and/or outdated) litigation frameworks may present challenges.
When a doctor’s notes, taken on a pad of paper with a pen, were traditionally introduced into evidence, it was generally safe to assume they were authentic. Now, if a doctor uses an AI-empowered “ambient listening” tool, how does that change the game?
When a retouched photo was traditionally introduced into evidence, it was not difficult to detect alterations; yet an image created with natural language prompting changes the playing field. How can veracity be assessed?
Where once a video could not have been doctored without a film crew, “deepfake” videos—completely false or altered through the use of AI—can now be created by an unsophisticated actor within minutes (or less!).
Digitally enhanced evidence, in which audio, video, or images are amped up through the use of AI, has caught the attention of at least one court, which refused to admit it after a defense expert witness in a criminal trial used AI to enhance videos, removing some information and adding other information in the process.
Clearly, new AI frameworks are needed for the admission of evidence, rules of professional conduct, and more.
Hallucinated AI Cases
As reported in February, an attorney cited to cases that U.S. District Judge Mark J. Dinsmore of the Southern District of Indiana—hearing Midcentral Operating Engineers Health and Welfare Fund v. HoosierVac LLC—was unable to locate.
This resulted in an order for the attorney in question to appear and show cause why he should not be sanctioned for violating Federal Rule of Civil Procedure 11(c)(3). Rule 11 provides, among other things, that by presenting to the court a pleading, written motion, or other paper, an attorney certifies that the claims, defenses, and other legal contentions are warranted by existing law; and that the factual contentions have or will likely have evidentiary support.
It’s not the first time, nor will it be the last. In the 2023 case of Mata v. Avianca, Inc., plaintiff’s counsel claimed he was unable to access federal court cases using the legal research program utilized by his firm, so he turned to ChatGPT.
He admitted he was unable to locate the cases himself but stated that he thought they were unpublished or resided in databases to which he did not have access. After the defendant pointed out that the cited cases could not be located, plaintiff’s lawyers made a second filing attaching the purported cases, but these, too, were obtained from ChatGPT.
Under Rule 11(c)(3), a court may order an attorney, law firm, or party to show cause why conduct specifically described in the order has not violated Rule 11(b). Attorneys in both cases found themselves on the receiving end of such an order after the relevant citations could not be found.
“Filing a brief with a non-existent citation falls far short of an attorney’s duty to the court, his client, and opposing counsel,” Judge Dinsmore wrote.
Too-Generative AI?
The attorney in Midcentral Operating Engineers Health and Welfare Fund admitted to relying on programs utilizing generative AI to draft the briefs—and “did not know that AI was capable of generating fictitious cases and citations.”
And as the hallucinated citations contained text excerpts that appeared credible, the attorney did not verify them. He conceded that he had not fully complied with Rule 11 but maintained that he had not acted in bad faith or with malice.
“Courts have consistently held that failing to check the treatment and soundness—let alone the existence—of a case warrants sanctions,” Judge Dinsmore said.
Following this statement, the judge cited four cases ranging from 1984 to 1998—when the key offense involved, for example, relying on another attorney’s memorandum without Shepardizing the cases. But the judge, in his order and recommendation, tried to bring things up to date:
It is one thing to use AI to assist with initial research, and even non-legal AI programs may provide a helpful 30,000-foot view. It is an entirely different thing, however, to rely on the output of a generative AI program without verifying the current treatment or validity—or indeed, the very existence—of the case presented.
Confirming a case is good law is a basic, routine matter and something to be expected from a practicing attorney. As noted in the case of an expert witness, an individual’s citation to fake, AI-generated sources…shatters his credibility.
Not-so-Expert Testimony
Judge Dinsmore might have been referring to an unrelated AI case from six weeks earlier, Kohls et al. v. Ellison et al., in the U.S. District Court for the District of Minnesota. In Kohls, U.S. District Judge Laura M. Provinzino granted plaintiffs’ motion to exclude one expert declaration offered by the attorney general of Minnesota and denied the latter’s motion for leave to file an amended declaration.
Under a fact pattern you couldn’t make up, the Minnesota AG offered a declaration from a professor of communication regarding artificial intelligence, deepfakes, and the dangers of deepfakes to free speech and democracy.
The AG’s office subsequently discovered that the declaration contained citations to two nonexistent academic articles and incorrectly cited the authors of a third article, owing to the professor’s use of a generative AI tool. Judge Provinzino did not hold back:
[E]ven if the errors were an innocent mistake, and even if the propositions are substantively accurate, the fact remains that [the professor] submitted a declaration made under penalty of perjury with fake citations.
It is particularly troubling to the court that [he] typically validates citations with a reference software when he writes academic articles but did not do so when submitting the [declaration] as part of Minnesota’s legal filing.
One would expect that greater attention would be paid to a document submitted under penalty of perjury than academic articles. Indeed, the court would expect greater diligence from attorneys, let alone an expert in AI information at one of the country’s most renowned academic institutions.
Deeming the expert’s credibility “shattered” with the court, the court granted the plaintiffs’ motion to exclude the professor’s testimony and denied the AG’s motion to file an amended expert declaration.
Judge Schopf, the judge in Weber who ended up subjecting the chatbot to the Socratic method when the expert’s testimony failed him, concluded that even Copilot—or its developers—“recognize the need for its supervision by a trained human operator” to verify accuracy as well as output:
In what may be an issue of first impression…this court holds that due to the nature of the rapid evolution of [AI] and its inherent reliability issues that prior to evidence being introduced which has been generated by an artificial intelligence product or system, counsel has an affirmative duty to disclose the use of [AI] and the evidence sought to be admitted should properly be subject to a Frye hearing prior to its admission, the scope of which should be determined by the court, either in a pre-trial hearing or at the time the evidence is offered.
Other states quickly took note of Weber, with attorney Pamela Langham telling the Maryland State Bar that the case has “significant implications”: “In a nutshell, an expert utilizing AI technology to render an expert opinion, should be prepared to satisfy the Frye and/or Daubert standards, depending on the jurisdiction.”
Federal Rules of Evidence Proposals
The problem has the potential to become dire. In 2019 and again in 2024, Herbert B. Dixon Jr., senior judge on the District of Columbia Superior Court, explored for the American Bar Association the problem of “deepfake” videos in the courtroom.
The problem became especially alarming after a 2024 incident in which a school principal was framed with a racially inflammatory deepfake, allegedly orchestrated by a disgruntled staff member.
The state’s attorney in Baltimore County, Maryland, told reporters that the case appeared to be the first of its kind in the country—suggesting that the state legislature may need to update criminal laws, as the offense of “disrupting school activities” hardly fit the bill.
FRE 901. With regard to evidentiary rules, Dixon pointed to three possibilities. The first is a proposal by John P. LaMonaga, in a 2020 law review article, for a new Federal Rule of Evidence (FRE) 901(b)(1). LaMonaga notes that deepfakes “[render] outdated many of the assumptions that the Federal Rules of Evidence are built upon.”
He specifically examines FRE 901, “Authenticating or Identifying Evidence,” which provides that a “proponent must produce evidence sufficient to support a finding that the item is what the proponent claims it is.” FRE 901(b)(1) merely states that the requirement is satisfied with “testimony that an item is what it is claimed to be,” which is problematic in the digital age.
LaMonaga suggests a rule providing that before a court admits photographic evidence, a party has the right to request a hearing “requiring the proponent to corroborate the source of information by additional sources.”
The second proposed change is to have the judge, rather than the jury, decide authenticity. Professor Rebecca Delfino has called for an additional FRE 901(c) under which a court must decide any questions about whether the evidence is admissible, following a hearing on the matter.
Another version of an additional proposed FRE 901(c) was presented by Judge Paul W. Grimm and Dr. Maura Grossman to the Advisory Committee on Evidence Rules in April 2024. Grimm and Grossman suggest an additional FRE 901(c), “Potentially Fabricated or Altered Electronic Evidence,” to address “deepfakes”—fabricated or altered photographs, audio recordings, and audio-visual recordings.
Their suggestion provides that if a party challenging authenticity demonstrates that computer-generated or electronic evidence is more likely than not fabricated or altered, the evidence is admissible only if the proponent demonstrates that its probative value outweighs its prejudicial effect on the party challenging the evidence.
Separately, the Grimm-Grossman proposal would also update the list of examples of evidence in Rule 901(b), revising Rule 901(b)(9), “Evidence About a Process or System,” to replace “an accurate result” with “a valid and reliable result.”
Further, if the proponent concedes that the item was generated by AI, there must be not only 1) evidence describing the process or system, “showing that it produces a valid and reliable result,” but also 2) additional evidence that “(i) describes the software or program that was used; and (ii) shows that it produced valid and reliable results in this instance.”
FRE 702. The Advisory Committee on Evidence Rules also examined a proposed amendment to Rule 702 to address machine-learning evidence: “The hearsay rule does not work well for machine-based outputs, because machines cannot be cross-examined.”
Among several relevant proposals was a suggestion by Professor Andrea Roth to amend Rule 702, “Testimony by Expert Witnesses,” so that if the evidence is the output of a process or system traditionally testified to by a human witness, the proponent must demonstrate to the court that it is more likely than not that:
- The output would help the trier of fact to understand the evidence or to determine a fact in issue;
- The output is based on sufficient and pertinent inputs and data, and the opponent has reasonable access to those inputs and data;
- The output is the product of reliable principles and methods; and
- The output reflects a reliable application of the principles and methods to the facts of the case, based on the process or system’s demonstrated reliability under circumstances or conditions substantially similar to those in the case.
Conclusion
Judge Provinzino noted in Kohls that the court did not have a problem with an expert using AI for research purposes: “[B]ut when attorneys and experts abdicate their independent judgment and critical thinking skills in favor of ready-made, AI-generated answers, the quality of our legal profession and the court’s decisional process suffer.”
The court thus adds its voice to a growing chorus of courts around the country declaring the same message: verify AI-generated content in legal submissions!
Judge Grimm and Maura Grossman, proponents of the FRE 901 rule change, have said the same thing: “trust but verify.” While enthusiastic about the possibilities, the pair have emphasized that the positive use of AI in the courtroom depends on adequate safeguards.
But while many judges are up to speed on the positive and negative uses of AI in the courtroom (Judge Grimm, for one, has long been a trailblazer on electronic discovery, electronically stored information, and digital media, including AI), others will need to work hard to keep up.
* * * *
Staff Attorney Ann W. Parks contributed to the preparation of this article.
Reprinted with permission from the April 10, 2025, edition of the “New York Law Journal” © 2025 ALM Global Properties, LLC. All rights reserved. Further duplication without permission is prohibited; contact 877-256-2472 or asset-and-logo-licensing@alm.com.