
Memory Fragment Projection: Building a Personalized Knowledge Graph Layer with Cognee

Projecting entity layers, communities, or chunk neighborhoods during retrieval frees us from the pre-defined methods baked into conventional graph databases and libraries.

With cognee, we can define the relevant graph context as an in-memory fragment and focus on the most critical connections between the nodes. Keeping the algorithm flexible enables us to explore relevant local graph layers and subnetworks in a more personalized and refined way.

Cognee is an open-source library that supports creating your own AI memory engine in an easy and scalable way. One of the main ideas behind the library is to give you an efficient plug-and-play semantic layer between you and your knowledge base.

Cognee’s engine breaks down the knowledge graph into subsets of informational entities and their connections, then extracts a simple and lean graph layer that functions as a projected memory fragment, enabling more sophisticated subgraph exploration.

In this article, we will explore a use case where we identify communities in the network and project them into in-memory graph fragments to support our GraphRAG pipeline.

Strategies for Filtering Relevant Subgraphs

Once a knowledge graph reaches a certain size, non-parametric retriever methods that are usually based on heuristic rules and traditional graph search algorithms can become inefficient.

The intention of these standard approaches is to connect relevant network elements using strategies such as shortest paths or prize-collecting Steiner trees, or by extracting any other structure that represents the relevant connections between the nodes.

The size of the knowledge graph also limits the set of tools we can use during retrieval. As it becomes infeasible to run a computationally expensive algorithm on the whole network, we need to narrow its focus to smaller, relevant graph subsets. Applying a filtering method like this, of course, comes with certain tradeoffs.

In the table below, we provide an overview of some of the subgraph filtering approaches we can use to identify the relevant layers or subgraphs of our knowledge base and some of their potential disadvantages:

| Filtering Method | Function | Tradeoffs |
| --- | --- | --- |
| Community-based | Extracting only communities (dense subnetworks) that contain the relevant nodes | The projected graph can lose bridge nodes and connections outside a relevant community |
| Neighborhood-based | Projecting n-level-deep neighborhoods starting from relevant nodes | Connecting paths that are longer than 2n can be lost in the projected graph |
| Entity-based | Extracting given entity layers from the graph | Other non-relevant layers can be lost |

There are many other filtering techniques, depending on the developer's needs and the retrieval technique used. Our objective during this phase is to set a filtering rule that identifies query-relevant, medium-sized structures: small enough to be analyzed in memory, but too big to be handed to the LLM agent as retrieved context.

In the following section, we'll define a graph data structure, detect communities in Neo4j, and then project the subgraphs of our relevant chunks, an approach we have covered in more depth elsewhere.

Boosting Neo4j’s Capabilities with the Cognify Pipeline

First, we need a Neo4j database with the graph-data-science (GDS) plugin. Our main goal is to use cognee to create and enrich a knowledge graph from raw textual data with the help of an LLM, and then detect communities in the resulting structure with Neo4j.

The first step is to run a Neo4j Docker container on our local machine:

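A minimal local setup might look like the sketch below: the official Neo4j image started with the Graph Data Science plugin enabled, plus a Python driver connection we will reuse for the GDS calls later. The ports, credentials, and image tag are placeholders; adjust them to your environment.

```python
# Start Neo4j locally with the Graph Data Science (GDS) plugin, for example:
#   docker run --name neo4j-gds \
#     -p 7474:7474 -p 7687:7687 \
#     -e NEO4J_AUTH=neo4j/password \
#     -e NEO4J_PLUGINS='["graph-data-science"]' \
#     neo4j:5
from neo4j import GraphDatabase

# Placeholder connection details; adjust them to your own container.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
driver.verify_connectivity()  # raises if the container is not reachable yet
```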

We will use a Cognee-generated knowledge graph from one of the examples on our GitHub (Note: the knowledge graph structure and the example may change in the future).

As you can see in our GitHub notebook, while processing the raw documents, the Cognee library chunks the text, extracts the informational entities and their connections, and stores the enriched knowledge graph in Neo4j.
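At a high level, the notebook boils down to a couple of calls. The snippet below is a minimal sketch of that flow; the document texts are illustrative, and it assumes the LLM credentials and the Neo4j graph-store settings are already configured through cognee's environment.

```python
import asyncio
import cognee

async def build_graph():
    # Illustrative input; the GitHub example ingests several documents.
    documents = [
        "Raw text of the first document...",
        "Raw text of the second document...",
    ]

    await cognee.add(documents)   # ingest and chunk the raw text
    await cognee.cognify()        # extract entities and relations, build the graph

asyncio.run(build_graph())
```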

After processing the raw text with Cognee, the knowledge graph in Neo4j should look something like this:

(Figure: the Cognee-generated knowledge graph visualized in Neo4j.)

Community Detection in Neo4j [optional]

This step is optional. You can detect communities with Neo4j or use any other approach. The main condition before proceeding to the graph fragment projection is that the nodes contain their community IDs or relevant filter information as a property.

Before starting the detection, we have to create an in-memory graph instance for Neo4j. This can be done using the following command:

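With the Python driver from the setup above, this is a single GDS call; the graph name "communities" is just an illustrative label:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # '*' wildcards project every node and every relationship into memory.
    session.run("CALL gds.graph.project('communities', '*', '*')")
```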

Since Neo4j computes the global information for the whole network, we used a wildcard pattern to store all nodes and edges in memory. You can also try out different projections that keep only selected nodes and edges in memory.

The nature of this step, especially the structure of the projected graph, depends on your actual use case.

Our next step is to detect communities in the network. To do that, we will use the built-in Louvain modularity maximization method:

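Running Louvain in write mode stores a community ID on every node; the property name "community" is our choice here:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    record = session.run(
        "CALL gds.louvain.write('communities', { writeProperty: 'community' }) "
        "YIELD communityCount, modularity RETURN communityCount, modularity"
    ).single()
    print(record["communityCount"], "communities, modularity:", record["modularity"])
```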

The communities will appear on each node as a new property. We got five communities, which is not surprising given that we added five different documents to our knowledge graph.

This Cypher query will show us how many nodes we have in each community:

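A query along these lines groups the nodes by the newly written community property and counts the members of each group:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    result = session.run(
        "MATCH (n) WHERE n.community IS NOT NULL "
        "RETURN n.community AS community, count(n) AS members "
        "ORDER BY members DESC"
    )
    for record in result:
        print(record["community"], record["members"])
```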

In the following table, we can see that we have the most nodes in communities 4 and 6, while community 63 has the fewest nodes. Examining the graph further, we can find which community corresponds to which original document and identify which documents have stronger connections on the entity layer.

| Community | Members |
| --- | --- |
| 4 | 22 |
| 6 | 14 |
| 0 | 13 |
| 2 | 12 |
| 63 | 9 |

Before we proceed, we have to delete the Neo4j in-memory graph, as we are not using it anymore. The following call lets us do so:

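Dropping the named projection from the GDS catalog frees the memory it occupied:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    session.run("CALL gds.graph.drop('communities')")
```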

Graph Data Structure and Memory Fragment Projection

Let's summarize what we have so far: a Cognee-generated knowledge graph built from our raw documents and stored in Neo4j, with a community ID written to each node as a property.

The next step is to define our graph data structure and implement the memory fragment (subgraph) projection. The basic node and edge structures are as follows (due to the length of the implementation, we’ll only include the attributes of node and edge classes):

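The sketch below is a condensed, illustrative version of those building blocks; the attribute names and defaults may differ from the current implementation in the repository.

```python
import numpy as np

class Node:
    def __init__(self, node_id: str, attributes: dict = None, dimension: int = 1):
        self.id = node_id
        self.attributes = attributes or {}   # dynamic set of projected properties
        self.skeleton_edges = []             # Edge objects incident to this node
        self.status = np.ones(dimension, dtype=int)  # alive/dead state vector

class Edge:
    def __init__(self, node1: Node, node2: Node, attributes: dict = None,
                 directed: bool = True, dimension: int = 1):
        self.node1 = node1
        self.node2 = node2
        self.attributes = attributes or {}
        self.directed = directed
        self.status = np.ones(dimension, dtype=int)  # alive/dead state vector
```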

Each elementary building block contains additional methods, hash definitions, and other necessary implementations. Furthermore, the Node and Edge building blocks are encapsulated in a CogneeGraph class that implements the async memory-projection feature, which allows us to project a skeleton image of the graph from the datastore. See the current state of the full implementation on our GitHub.

The implementation itself is relatively simple. It handles a dynamic number of attributes, and both nodes and edges carry a one-dimensional vector representing their state (alive/dead), which prepares them to be extended into multidimensional entities.

The following code snippet shows the memory projection feature:

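(The version below is a simplified, illustrative sketch: the adapter method it calls and the data shapes it assumes are placeholders, and the authoritative implementation lives in the repository. It uses the Node and Edge classes sketched above.)

```python
class CogneeGraph:
    def __init__(self):
        self.nodes = {}   # node id -> Node
        self.edges = []   # list of Edge

    async def project_graph_from_db(
        self,
        adapter,                        # graph engine / database adapter
        node_properties_to_project,     # node attributes to keep on the fragment
        edge_properties_to_project,     # edge attributes to keep on the fragment
        directed=True,
        node_dimension=1,
        edge_dimension=1,
        memory_fragment_filter=None,    # e.g. ("community", [4, 6])
    ):
        # Hypothetical adapter call: fetch only the nodes and edges matching the filter.
        raw_nodes, raw_edges = await adapter.get_filtered_graph_data(memory_fragment_filter)

        for props in raw_nodes:  # assumed: list of property dicts with an "id" key
            attrs = {key: props.get(key) for key in node_properties_to_project}
            self.nodes[props["id"]] = Node(props["id"], attrs, dimension=node_dimension)

        for source_id, target_id, props in raw_edges:  # assumed: (source, target, properties)
            attrs = {key: props.get(key) for key in edge_properties_to_project}
            edge = Edge(
                self.nodes[source_id],
                self.nodes[target_id],
                attrs,
                directed=directed,
                dimension=edge_dimension,
            )
            self.edges.append(edge)
            self.nodes[source_id].skeleton_edges.append(edge)
            self.nodes[target_id].skeleton_edges.append(edge)
```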

The graph projection is an async method that collects the relevant nodes and their connections from the database based on its parameters. To understand how to use it, let’s project the two most significant communities from the network and keep only the node IDs and their community labels.

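Under those assumptions, the call might look like the sketch below, projecting community IDs 4 and 6 from the table above. The get_graph_engine helper, its import path, and the filter format are illustrative and may differ from the current API; it reuses the CogneeGraph sketch from the previous snippet.

```python
import asyncio

# Import path reflects cognee's infrastructure layer and may vary by version.
from cognee.infrastructure.databases.graph import get_graph_engine

async def project_top_communities():
    graph_engine = await get_graph_engine()  # adapter for our Neo4j instance

    fragment = CogneeGraph()
    await fragment.project_graph_from_db(
        graph_engine,
        node_properties_to_project=["community"],   # keep only the community label
        edge_properties_to_project=[],
        directed=False,
        node_dimension=1,
        edge_dimension=1,
        memory_fragment_filter=("community", [4, 6]),  # the two largest communities
    )

    print(len(fragment.nodes), "nodes and", len(fragment.edges), "edges projected")

asyncio.run(project_top_communities())
```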

When we project the graph, we have to pass the graph engine to the method, define which node and edge properties we need, specify whether we want a directed or undirected graph, set the dimensions of the node and edge entities, and, finally, give the property and the values we want to project on.

As a result, we get 36 nodes (the sum of the nodes from the first two communities in the previous table) and 63 in-community edges. Naturally, communities can be picked based on where our relevant chunks are, or on other decision criteria.

At the end of the process, we extracted a filtered memory fragment from our long-term Neo4j memory into the Cognee graph data structure. From this step, the path is clear to implement advanced explorative algorithms that were previously unviable due to the graph size or knowledge base limitations.

Takeaways: Knowledge Graph Projection with Cognee

In cases when we want to explore our knowledge graph in more detail, retrieving subgraphs as memory fragments can enable us to focus only on the relevant subsets of the graph.

By memorizing fragments and their local connections, we can limit the search space during the optimization of retrieved structures whenever we collect triplets, subgraphs, or hybrid solutions during retrieval.

Leveraging the power of GraphRAG, cognee can identify relevant subgraphs or graph layers and ingest them into a flexible structure that offers a limited, filtered view of a specific data fragment.

This approach allows us to explore the full potential of our knowledge base, enabling efficient, context-aware search of our knowledge graphs: focusing on what matters while leaving unimportant subgraphs untouched.