
Enhancing LLM Responses with Graph-Based Retrieval and Advanced Chunking Techniques

LLMs and LLM-based applications have become a cornerstone of modern AI.

With the launch of ChatGPT, we got the first platform that enabled a computer to produce human-like responses. GPT not only passed but smashed the Turing test, long held to be the ultimate standard for artificial intelligence.

Although its general knowledge base is vast, ChatGPT fails to provide reliable answers in certain use cases, for example, a query that requires knowledge of company-specific data.

The technique that emerged to address this issue is called retrieval-augmented generation, or RAG. It relies on retrieving relevant information and passing it to the LLM via the prompt so that the answer takes that information into account.

The implementation of RAG has resulted in much more relevant responses to queries within specific contexts. However, it also raises questions, such as how relevant pieces of information are identified and what constitutes an ascertainable “unit” of information in the first place.

Retrieving Relevance with Knowledge Graphs

A number of startups and platforms have emerged to tackle these questions.

We here at cognee believe that enhancing AI apps and agents with a semantic layer can enable the LLM to understand which pieces of data truly matter for a specific query.

When the available information is organized into a graph structure and that structure is superimposed on traditional embedding representations, the relevant facts can be identified both by how similar their embeddings are and by how the elements are connected within the graph.

The general approach to using graph-based retrieval for RAG, as well as Microsoft’s eponymous framework leveraging this method, is called GraphRAG. While the launch of cognee preceded that of GraphRAG, this field expanded rapidly after Microsoft released their platform in June of 2024.

Identifying Information Units with Chunking

While the question of what makes up a unit of information generally receives less attention, it is just as worth addressing as the question of how relevant information is retrieved.

The technical term for the operation that identifies units of information is “chunking”: it consists of splitting textual input into discrete, retrievable pieces of data.

A very simple example of chunking would be splitting a text into paragraphs on a delimiter such as two consecutive newline characters (“\n\n”). Each resulting paragraph would then be considered one “chunk”, and these chunks would be retrievable units that can be passed on to an LLM.
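In Python, this naive approach is essentially a one-liner (a minimal sketch; note that splitting on “\n\n” also drops the delimiters, which matters for the invertibility discussion later):

```python
text = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."

# Naive paragraph chunking: split on blank lines.
chunks = text.split("\n\n")
print(chunks)  # ['First paragraph.', 'Second paragraph.', 'Third paragraph.']
```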

Production-Level Bites: Can Chunkers Chew Them?

Chunking can work well for documents containing plain text with relatively short and evenly spaced paragraphs and no unconventional formatting.

When developing production-level RAG systems, however, developers face a vast range of ingestible data files: Word documents, PowerPoint presentations, emails, meeting notes, Confluence pages, code repositories, and various forms of technical documentation.

The range in any system processing large volumes of data in a dynamic environment will likely be even wider. To get an idea of the complexity of such a task, we will look at two libraries that supply text chunkers that are commonly used in GenAI: LangChain and LlamaIndex.

Both of these libraries focus on plain text, HTML, and code inputs and provide specialized chunkers for some of these formats; other input types would require conversion before chunking.

In the following sections, we will provide a rundown on how to perform the chunking operation and compare several performance indicators of LangChain and LlamaIndex chunkers with those of the chunking function implemented within cognee.

Initializing the Chunkers

One frequently used chunker by LangChain is the RecursiveCharacterTextSplitter. It is initialized like this:

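A minimal initialization sketch based on LangChain’s text-splitters package; the parameter values here are assumptions chosen to roughly match cognee’s defaults, not necessarily those used in the original benchmark:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# chunk_size of 1024 characters mirrors cognee's default paragraph_length;
# chunk_overlap=0 keeps the outputs comparable (both values are assumptions).
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1024,
    chunk_overlap=0,
)
```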

And called like this:

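For example (a sketch; "input.txt" is a placeholder for whatever document is being chunked):

```python
with open("input.txt", encoding="utf-8") as f:  # hypothetical input file
    text = f.read()

chunks = splitter.split_text(text)  # returns a list of string chunks
```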

The parameters chunk_size and chunk_overlap are chosen to make the outputs as comparable as possible to the default cognee chunker.

A common LlamaIndex chunker is the SentenceSplitter, which is initialized like this:

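A comparable sketch for LlamaIndex (import path as in recent llama-index releases; note that LlamaIndex counts chunk_size in tokens rather than characters, so the values below are again assumptions made for rough comparability):

```python
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(
    chunk_size=1024,   # measured in tokens, not characters
    chunk_overlap=0,
)
```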

And called like this:

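The call mirrors the LangChain example (again a sketch):

```python
with open("input.txt", encoding="utf-8") as f:  # same hypothetical input file
    text = f.read()

chunks = splitter.split_text(text)  # returns a list of string chunks
```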

Again, chunk_size and chunk_overlap are set to produce output similar to that of the cognee chunker.

The cognee default chunker with default parameters (paragraph_length=1024 and batch_paragraphs=True) is not a class but a function, and can be called like this:

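A sketch of the call; the import path below is an assumption based on cognee’s repository layout, so check the current cognee documentation for the exact location:

```python
from cognee.tasks.chunks import chunk_by_paragraph  # import path is an assumption

with open("input.txt", encoding="utf-8") as f:  # hypothetical input file
    text = f.read()

# chunk_by_paragraph yields chunks one at a time instead of returning a list.
for chunk in chunk_by_paragraph(text, paragraph_length=1024, batch_paragraphs=True):
    ...  # process each chunk, then let it be garbage-collected
```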

Unlike the LangChain and LlamaIndex chunkers and their split_text methods, chunk_by_paragraph is a generator function: instead of returning a complete list of chunks, it yields them one at a time.

Iterating over it allows each chunk to be processed separately and then “forgotten” before the next one is generated from the text. This results in a significantly lower memory footprint during chunking and makes it possible to process arbitrarily large files.

Benchmarking Results

Ideally, processing should be fast, memory-efficient, and scalable. We will assess each of the three platforms’ performance metrics using text from The Complete Works of William Shakespeare as base input data.

A sample of the data looks like this:

“Those hours that with gentle work did frame

The lovely gaze where every eye doth dwell

Will play the tyrants to the very same…”
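Before looking at the numbers, here is a minimal sketch of how speed and peak memory can be measured for any of the chunkers above; this is illustrative only and not necessarily the harness used to produce the figures below:

```python
import time
import tracemalloc

def benchmark(chunk_fn, text):
    """Time a chunking call and record its peak memory usage in bytes."""
    tracemalloc.start()
    start = time.perf_counter()
    chunks = list(chunk_fn(text))  # materializes generators so all work is done
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak, len(chunks)

# Example usage with any splitter initialized as above:
# elapsed, peak_bytes, n_chunks = benchmark(splitter.split_text, text)
```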

Speed

Across speed metrics, we see that the LangChain RecursiveCharacterTextSplitter is by far the fastest, roughly 30x faster than the LlamaIndex SentenceSplitter, which is, in turn, 2-3x faster than cognee chunk_by_paragraph.

[Figure: chunking speed comparison for the three chunkers]

Memory Footprint

When the generated chunks are collected into memory, the footprints of the LangChain RecursiveCharacterTextSplitter and the cognee chunk_by_paragraph are very similar, at roughly 4x the input data size, while the LlamaIndex SentenceSplitter’s footprint is roughly 10x the input data size.

Importantly, if we do not collect the results generated by chunk_by_paragraph, cognee’s memory footprint is minimal (around 0.1 MB) and constant, independent of the size of the input.
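A sketch of the difference between collecting and streaming (the import path is again an assumption):

```python
from cognee.tasks.chunks import chunk_by_paragraph  # import path is an assumption

with open("input.txt", encoding="utf-8") as f:  # hypothetical input file
    text = f.read()

# Collecting every chunk keeps them all in memory at once
# (roughly 4x the input size in the measurements above):
all_chunks = list(chunk_by_paragraph(text, paragraph_length=1024))

# Streaming keeps only one chunk alive at a time,
# so the footprint stays small and roughly constant:
for chunk in chunk_by_paragraph(text, paragraph_length=1024):
    ...  # process the chunk, then let it be garbage-collected
```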

[Figure: memory footprint comparison for the three chunkers]

Robustness to Unconventional Input

Now, let’s evaluate chunker performance when the input text is not natural English and is formatted unconventionally. To this end, we have generated a file consisting of random characters interspersed with spaces and newline characters. Here’s a snippet:

“ibTR2a7g203LtSYZReI h XyFJjFdj 3JGfYldpD9VYAWB svBtSEteW 1ov0  gIzI z4ukn9x7NxhdKLO2EdppKeLc  BsuzrC1ueIDPlw38iiYro53gUqPk  12ZKbAsfbmzuR odRNgc7c

tkUT5kZ K3K…”
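Input like this can be produced with a few lines of Python; this is an illustrative sketch, not necessarily how the benchmark file was generated:

```python
import random
import string

def random_text(n_chars: int) -> str:
    """Generate random alphanumeric text interspersed with spaces and newlines."""
    alphabet = string.ascii_letters + string.digits + "  \n"
    return "".join(random.choice(alphabet) for _ in range(n_chars))

sample = random_text(10_000_000)  # roughly 10 MB of unconventional input
```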

The processing performance of the LlamaIndex SentenceSplitter degrades significantly, with higher-than-linear scaling of compute time with input size. This suggests that the LlamaIndex SentenceSplitter is less robust to unconventional inputs and has a less predictable performance profile.

The memory metrics, on the other hand, did not change significantly, so that graph is omitted here.

[Figure: processing time scaling with input size on unconventional input]

The Need for Invertibility in Chunking for GraphRAG

In addition to the usual embedding representation, cognee also represents the chunks extracted from the input files as nodes in a graph database. Graph retrieval methods allow collating the text values of adjoining nodes to reconstruct a larger slice of the original input.

To do this faithfully, however, it is necessary to preserve every character of the input string in exactly one of the chunks generated from it and to avoid duplicating any characters across chunks. In other words, the chunking process must be invertible.
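Invertibility is straightforward to check: concatenating the chunks in order must reproduce the original input exactly. A minimal sketch:

```python
def is_invertible(original: str, chunks: list[str]) -> bool:
    """A chunking is invertible if the chunks concatenate back to the original text."""
    return "".join(chunks) == original

text = "First paragraph.\n\nSecond paragraph."
print(is_invertible(text, text.split("\n\n")))                             # False: the "\n\n" delimiters are lost
print(is_invertible(text, ["First paragraph.\n\n", "Second paragraph."]))  # True
```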

This property is much more salient in GraphRAG than in conventional RAG. Currently, standard chunkers, such as RecursiveCharacterTextSplitter or SentenceSplitter in common configurations, aren’t invertible.

Cognee’s chunk_by_paragraph, on the other hand, has this property and is, therefore, well suited to GraphRAG applications.

To summarize the results, the following table shows the advantages and disadvantages of each chunking approach:

| Chunker | Fast | Memory-efficient | Invertible | Resilient to non-standard inputs |
| --- | --- | --- | --- | --- |
| cognee chunk_by_paragraph | 🟡 (slowest of the three) | ✅ (constant footprint when streamed) | ✅ | ✅ |
| LangChain RecursiveCharacterTextSplitter | ✅ (fastest by far) | 🟡 (~4x input size) | ❌ | ✅ |
| LlamaIndex SentenceSplitter | 🟡 (~30x slower than LangChain) | ❌ (~10x input size) | ❌ | ❌ |

Toward Smarter Information Retrieval

The rapid evolution of LLMs signals a need for smarter, more efficient ways to manage and retrieve relevant information. While traditional RAG methods rely on robust chunking to process data, the issues of scalability, memory efficiency, and handling unconventional inputs demand innovative solutions.

The benchmarking results from this test show us that each chunking method has its strengths and trade-offs. However, cognee’s approach, which integrates invertible chunking with graph-based retrieval, represents a significant step forward in addressing these complex challenges.

As the AI field further develops, GraphRAG will become integral in building more intelligent, scalable, and reliable systems for data-driven AI applications. The path forward will be paved by the integration of advanced retrieval methods with conventional chunking techniques.

This synergy will not only enhance the efficiency and accuracy of AI systems but also enable them to handle increasingly complex and diverse datasets, setting a new standard for intelligent data processing and retrieval.