Precision Under the Microscope – AI vs Human Accuracy in Contract Analysis
Accuracy in legal contract review isn’t optional; it’s foundational. One misinterpreted clause or missed obligation can trigger non-compliance, fines, or strained supplier relationships. So, how do AI models stack up against human lawyers when it comes to precision?
According to a landmark benchmarking study by Martin, Whitehouse, Yiu, and colleagues, Legal Process Outsourcers (LPOs) achieved an F-score of 0.77 when identifying legal issues in contracts. The leading LLM at the time, GPT-4-32k, was close behind at 0.74, a negligible difference in practical terms.
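If the metric is unfamiliar: the F-score (here, F1) is the harmonic mean of precision (how many flagged issues were real) and recall (how many real issues were caught). A minimal sketch in Python, using made-up counts purely for illustration rather than figures from the study:

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """F1: the harmonic mean of precision and recall."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: a reviewer flags 100 issues, 77 of which are real
# (precision 0.77), and misses 23 of the 100 true issues (recall 0.77).
print(round(f1_score(true_positives=77, false_positives=23, false_negatives=23), 2))  # 0.77
```

An F-score of 0.77 versus 0.74 therefore reflects a very similar balance of caught versus missed issues.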
Ask Brooklyn, powered by Claude 3.7 Sonnet, has since raised the bar. In vendor contract benchmarks, it achieved an 8% improvement in identifying legal concepts over previous Anthropic models, with a cumulative 87.5% increase in accuracy since March 2024.
Beyond raw accuracy, AI models excel at consistency. Unlike humans, they don’t suffer from fatigue or cognitive bias — two common pitfalls in high-volume legal reviews. As noted by Stanford Law School’s CodeX report, AI can maintain precision across thousands of documents, regardless of length or complexity.
This reliability, however, comes with an important caveat: AI is most effective when guided by humans. A recent report by PwC underscores that the best outcomes in legal workflows come from hybrid approaches, where AI tools surface data and legal professionals apply judgment.
Ask Brooklyn is built around that philosophy: it is trained to extract, summarise, and explain clauses, but always in a way that empowers human users to make the final decision.
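As a rough illustration of what such an LLM-assisted review step can look like, here is a generic sketch using Anthropic's public Python SDK. This is not Brooklyn's actual implementation; the prompt wording and model choice are assumptions for the example:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

contract_text = "..."  # the vendor contract under review

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # example model ID; swap in your own
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            "Extract the termination and liability clauses from the contract "
            "below, summarise each in plain English, and flag anything a "
            "human reviewer should double-check. Do not make a final "
            "recommendation.\n\n" + contract_text
        ),
    }],
)

# The output is a draft for a legal professional to verify, not a final decision.
print(response.content[0].text)
```

Note that the prompt explicitly stops short of a recommendation, mirroring the hybrid approach described above: the model surfaces and summarises, and the lawyer decides.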
Want to be the new champ in the ring?
Curious how Ask Brooklyn handles clause interpretation and accuracy compared to human experts? Download the full whitepaper to see the complete benchmark data.