A newer benchmark, published by the authors, demonstrates further improvements in response accuracy when using knowledge graphs with LLMs.
Large Language Models (LLMs) present enterprises with exciting new opportunities for leveraging their data, from improving processes to creating entirely new products and services. But alongside excitement about the transformative power of LLMs, there are looming concerns. Chief among them is the accuracy of LLMs in production. Initial evaluations have found that LLMs will present false information, backed by fabricated citations, as fact – a phenomenon known as “hallucination.” This led McKinsey to cite “inaccuracy” as the top risk associated with generative AI.
GenAI Benchmark: Question Answering Accuracy on Enterprise SQL Databases
Download this first-of-its-kind LLM benchmark report to discover how a Knowledge Graph can improve LLM accuracy 3x and give your enterprise a competitive edge.
Some context around hallucinations: LLMs function as statistical pattern-matching systems. They analyze vast quantities of data to generate responses based on statistical likelihood – not fact. Therefore, the smaller the dataset – say, your organization’s internal data rather than the open internet – the less likely it is that the responses are accurate.
However, research is underway to address this challenge. A growing number of experts across academia, database companies, and industry analyst firms such as Gartner point to Knowledge Graphs as a means of improving LLM response accuracy.
To evaluate this claim, a new benchmark from Juan Sequeda Ph.D., Dean Allemang Ph.D., and Bryon Jacob, CTO of data.world, examines the effect a Knowledge Graph can have on LLM response accuracy in the enterprise. They compared answers the LLM generated directly against data stored in a SQL database with answers generated against a Knowledge Graph representation of that same data. The benchmark found a significant improvement in response accuracy when answers were backed by a Knowledge Graph, in every category tested.
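To make the comparison concrete, here is a minimal sketch of how such a side-by-side evaluation could be wired up. The prompt wording, function names, and the `ask_llm` callable are assumptions for illustration; the authors' actual harness and prompts may differ.

```python
from typing import Callable

# ask_llm is any function that sends a prompt to an LLM and returns its text
# completion (e.g., a thin wrapper around your model provider's API).
AskLLM = Callable[[str], str]

def sql_prompt(question: str, ddl: str) -> str:
    """Zero-shot prompt: answer the question by writing SQL over the raw schema."""
    return (
        "Given the following SQL schema:\n"
        f"{ddl}\n\n"
        f"Write a SQL query that answers: {question}\n"
        "Return only the SQL."
    )

def kg_prompt(question: str, ontology: str) -> str:
    """Zero-shot prompt: answer the question by writing SPARQL over the
    Knowledge Graph (ontology) view of the same underlying data."""
    return (
        "Given the following ontology describing the business domain:\n"
        f"{ontology}\n\n"
        f"Write a SPARQL query that answers: {question}\n"
        "Return only the SPARQL."
    )

def compare_on_question(question: str, ddl: str, ontology: str, ask_llm: AskLLM) -> dict:
    """Generate both candidate queries for one business question so each can be
    executed against the database / graph and scored against the gold answer."""
    return {
        "question": question,
        "sql_attempt": ask_llm(sql_prompt(question, ddl)),
        "sparql_attempt": ask_llm(kg_prompt(question, ontology)),
    }
```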
Specifically, top-line findings include:
A Knowledge Graph improved LLM response accuracy by 3x across 43 business questions.
LLMs – without the support of a Knowledge Graph – failed to accurately answer “schema-intensive” questions (typically questions focused on metrics, KPIs, and strategic planning), returning accurate responses 0% of the time.
A Knowledge Graph significantly improves the accuracy of LLM responses – even schema-intensive questions.
A comparison: Answering complex business questions
The benchmark uses the enterprise SQL schema from the OMG Property and Casualty Data Model in the insurance domain. The OMG specification addresses the data management needs of the Property and Casualty (P&C) insurance community. Researchers measured accuracy using the Execution Accuracy (EA) metric from the Yale Spider benchmark.
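Execution Accuracy counts a generated query as correct when executing it returns the same result as executing the hand-written gold query. Below is a minimal sketch of that check using SQLite and an order-insensitive comparison of result rows; the official Spider evaluation tooling applies more detailed matching rules.

```python
import sqlite3
from collections import Counter

def execute(conn: sqlite3.Connection, query: str):
    """Run a query and return its rows as a multiset (order-insensitive)."""
    try:
        rows = conn.execute(query).fetchall()
        return Counter(rows)
    except sqlite3.Error:
        return None  # a query that fails to execute cannot be correct

def execution_accuracy(conn: sqlite3.Connection, pairs) -> float:
    """pairs: list of (gold_query, predicted_query) strings.
    Returns the fraction of predictions whose execution result matches the gold result."""
    correct = 0
    for gold, predicted in pairs:
        gold_rows = execute(conn, gold)
        pred_rows = execute(conn, predicted)
        if gold_rows is not None and gold_rows == pred_rows:
            correct += 1
    return correct / len(pairs) if pairs else 0.0
```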
Against this metric, the benchmark compared the accuracy of responses to 43 questions of varying complexity, ranging from simple operational reporting to key performance indicators (KPIs).
The benchmark applies two complexity vectors: question complexity and schema complexity.
Question complexity: Refers to the number of aggregations, mathematical functions, and table joins required to produce a response.
Schema complexity: Refers to the number of different data tables that must be queried in order to produce a response.
These two vectors combine into four categories of question complexity that reflect typical analysis in an enterprise environment (illustrated in the sketch below):
Low question / low schema complexity
High question / low schema complexity
Low question / high schema complexity
High question / high schema complexity
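As an illustration of how the two vectors interact, the snippet below pairs a low-complexity question with a high-complexity one. The table and column names are hypothetical stand-ins loosely modeled on an insurance schema like the OMG P&C model; they are not the actual benchmark queries.

```python
# Hypothetical examples only: table and column names are illustrative stand-ins,
# not the actual OMG P&C schema or the benchmark's gold queries.

# Low question / low schema complexity: one table, no joins, no aggregation.
low_complexity = """
SELECT policy_number, effective_date
FROM policy
WHERE policy_status = 'Active';
"""

# High question / high schema complexity: several joined tables plus
# aggregation and a mathematical expression (a loss-ratio style KPI).
high_complexity = """
SELECT a.agent_name,
       SUM(c.claim_amount) / SUM(p.premium_amount) AS loss_ratio
FROM agent a
JOIN policy p ON p.agent_id = a.agent_id
JOIN claim  c ON c.policy_id = p.policy_id
GROUP BY a.agent_name;
"""
```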