Memory Fragment Projection: Building a Personalized Knowledge Graph Layer with Cognee
Projecting entity layers, communities, or chunk neighborhoods during retrieval frees us from the pre-defined methods that conventional graph databases and libraries implement.
With cognee, we can define the relevant graph context as an in-memory fragment and focus on the most critical connections between the nodes. Keeping the algorithm flexible enables us to explore relevant local graph layers and subnetworks in a more personalized and refined way.
Cognee is an open-source library that supports creating your own AI memory engine in an easy and scalable way. One of the main ideas behind the library is to give you an efficient plug-and-play semantic layer between you and your knowledge base.
Cognee’s engine breaks down the knowledge graph into subsets of informational entities and their connections, then extracts a simple and lean graph layer that functions as a projected memory fragment, enabling more sophisticated subgraph exploration.
In this article, we will explore a use case where we identify communities in the network and project them into in-memory graph fragments to support our GraphRAG pipeline.
Strategies for Filtering Relevant Subgraphs
Once a knowledge graph reaches a certain size, non-parametric retriever methods that are usually based on heuristic rules and traditional graph search algorithms can become inefficient.
The goal of these standard approaches is to connect relevant network elements using strategies such as shortest paths or prize-collecting Steiner trees, or by extracting any other structure that represents the relevant connections between nodes.
The size of the knowledge graph also limits the set of tools we can use during retrieval. Since it becomes infeasible to run a computationally expensive algorithm on the whole network, we need to narrow our focus to smaller, relevant graph subsets. Applying a filtering method like this, of course, comes with certain tradeoffs.
In the table below, we provide an overview of some of the subgraph filtering approaches we can use to identify the relevant layers or subgraphs of our knowledge base and some of their potential disadvantages:
Filtering Method | Function | Tradeoffs |
---|---|---|
Community-based | Extracting only communities (dense subnetworks) that contain the relevant nodes | The projected graph can lose bridge nodes and connections outside a relevant community |
Neighborhood-based | Projecting n-level deep neighborhoods starting from relevant nodes | Connecting paths that are longer than 2n can be lost in the projected graph |
Entity-based | Extracting given entity layers from the graph | Layers deemed non-relevant, and the connections running through them, can be lost |
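To make these tradeoffs concrete, here is a minimal pure-Python sketch of the three filters on a toy graph. The node names, the `type` and `community` properties, and every helper function are illustrative assumptions; none of them are cognee or Neo4j APIs.

```python
from collections import deque

# Illustrative toy data: node -> properties, plus an undirected edge list.
NODES = {
    "doc1": {"type": "document", "community": 0},
    "alice": {"type": "entity", "community": 0},
    "acme": {"type": "entity", "community": 0},
    "doc2": {"type": "document", "community": 1},
    "bob": {"type": "entity", "community": 1},
    "summary2": {"type": "summary", "community": 1},
}
EDGES = [("doc1", "alice"), ("doc1", "acme"), ("alice", "bob"),
         ("doc2", "bob"), ("doc2", "summary2")]


def adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def community_filter(nodes, relevant):
    """Community-based: keep every node sharing a community with a relevant node."""
    wanted = {nodes[n]["community"] for n in relevant}
    return {n for n, props in nodes.items() if props["community"] in wanted}


def neighborhood_filter(edges, relevant, depth):
    """Neighborhood-based: keep nodes within `depth` hops of any relevant node (multi-source BFS)."""
    adj = adjacency(edges)
    seen = set(relevant)
    frontier = deque((n, 0) for n in relevant)
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            continue  # do not expand past the requested depth
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return seen


def entity_filter(nodes, layer_types):
    """Entity-based: keep only nodes whose type belongs to the requested layers."""
    return {n for n, props in nodes.items() if props["type"] in layer_types}


def induced_edges(edges, kept):
    """Edges of the projected subgraph: both endpoints must survive the filter."""
    return [(u, v) for u, v in edges if u in kept and v in kept]
```

With `alice` as the relevant node, the community filter drops the bridge edge between `alice` and `bob`, while the entity filter drops the document and summary nodes, matching the tradeoffs listed in the table.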
There are many other filtering techniques, depending on the developer’s needs and the retrieval technique used. Our objective in this phase is to set a filtering rule that identifies query-relevant, medium-sized structures: small enough to be analyzed in memory but too big to be passed to the LLM agent as retrieved context.
In the following sections, we’ll detect communities in Neo4j, define a graph data structure, and then project the subgraphs containing our relevant chunks (explained in more depth in a separate post).
Boosting Neo4j’s Capabilities with the Cognify Pipeline
First, we need a Neo4j database with the Graph Data Science (GDS) plugin installed. Our main goal is to use cognee to create and enrich a knowledge graph from raw text with the help of an LLM, and then detect communities in the resulting structure with Neo4j.
The first step is hosting a docker container on our local machine:
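As a hedged sketch, the helper below assembles a `docker run` invocation for the official `neo4j` image with the GDS plugin enabled through the `NEO4J_PLUGINS` environment variable. The image tag, container name, and password are assumptions; check the Neo4j Docker and GDS installation docs for the versions you use.

```python
import shlex


def neo4j_gds_docker_command(password, image="neo4j:5", name="neo4j-gds"):
    """Return a `docker run` argument list for Neo4j with the GDS plugin (assumed settings)."""
    return [
        "docker", "run", "-d",
        "--name", name,
        "-p", "7474:7474",  # HTTP port for Neo4j Browser
        "-p", "7687:7687",  # Bolt port used by drivers and cognee
        "-e", f"NEO4J_AUTH=neo4j/{password}",
        "-e", 'NEO4J_PLUGINS=["graph-data-science"]',  # ask the image to install GDS
        "-e", "NEO4J_dbms_security_procedures_unrestricted=gds.*",  # allow gds.* procedures
        image,
    ]


if __name__ == "__main__":
    # Print a shell-ready command; run it in a terminal to start the container.
    print(shlex.join(neo4j_gds_docker_command("change-me-please")))
```

Neo4j 5 rejects passwords shorter than eight characters, so pick a longer one than the default `neo4j`.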
We will use a Cognee-generated knowledge graph from one of the examples on our GitHub (Note: the knowledge graph structure and the example may change in the future).
As you can see in our GitHub notebook, while processing raw documents, the Cognee library:
- Populates vector and graph databases and prepares them to support RAG pipelines, and
- Uses an LLM to enrich the knowledge graph by creating new nodes corresponding to entities and summaries.
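A minimal sketch of this processing step, assuming cognee is installed and already configured (for example through its environment settings) with an LLM API key and Neo4j as its graph store. `cognee.add` and `cognee.cognify` are cognee's top-level entry points, but their signatures have changed between releases, so treat this as an outline rather than a pinned API.

```python
import asyncio


async def build_knowledge_graph(texts):
    """Add raw texts to cognee, then build and enrich the knowledge graph."""
    import cognee  # imported lazily; assumes `pip install cognee` and a configured environment

    for text in texts:
        await cognee.add(text)  # stage raw text for processing
    # LLM-driven step: extracts entities and summaries, populates graph and vector stores
    await cognee.cognify()


# Usage (requires a configured cognee setup):
#   asyncio.run(build_knowledge_graph(["First document text.", "Second document text."]))
```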
After processing the raw text with Cognee, the knowledge graph in Neo4j should look something like this:
Community Detection in Neo4j [optional]
This step is optional. You can detect communities with Neo4j or use any other approach. The main condition before proceeding to the graph fragment projection is that the nodes contain their community IDs or relevant filter information as a property.
Before starting the detection, we have to create an in-memory graph instance for Neo4j. This can be done using the following command:
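Assuming the official `neo4j` Python driver (5.x) and a GDS installation, the projection can be issued from Python as below. The graph name `cogneeGraph` is a hypothetical choice; the two `'*'` arguments are the wildcard node and relationship projections.

```python
# Assumes the official neo4j Python driver 5.x and a Neo4j instance with GDS.
# Example usage (connection details are placeholders):
#   from neo4j import GraphDatabase
#   with GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "<password>")) as driver:
#       print(project_whole_graph(driver))

GRAPH_NAME = "cogneeGraph"  # hypothetical name for the in-memory projection

# '*' as node and relationship projection loads every label and relationship type
PROJECT_ALL = (
    "CALL gds.graph.project($name, '*', '*') "
    "YIELD graphName, nodeCount, relationshipCount "
    "RETURN graphName, nodeCount, relationshipCount"
)


def project_whole_graph(driver, graph_name: str = GRAPH_NAME) -> dict:
    """Create the GDS in-memory graph and return its name and size."""
    records, _summary, _keys = driver.execute_query(PROJECT_ALL, {"name": graph_name})
    return records[0].data()
```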
Since the GDS algorithm computes global information over the whole network, we used a wildcard pattern to load all nodes and relationships into memory. You can also try other projections that keep only selected nodes and relationships in memory.
The nature of this step, especially the structure of the projected graph, depends on your actual use case.
Our next step is to detect communities in the network. To do that, we will use the built-in Louvain modularity maximization method:
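Under the same assumptions (neo4j Python driver 5.x, an existing GDS projection with the hypothetical name `cogneeGraph`), Louvain's write mode stores each node's community ID in a node property. The property name `community` is our assumption.

```python
# Assumes neo4j Python driver 5.x, GDS installed, and an existing in-memory projection.
LOUVAIN_WRITE = (
    "CALL gds.louvain.write($name, {writeProperty: $prop}) "
    "YIELD communityCount, modularity "
    "RETURN communityCount, modularity"
)


def detect_communities(driver, graph_name: str = "cogneeGraph", write_property: str = "community") -> dict:
    """Run Louvain and write each node's community ID to `write_property`."""
    records, _summary, _keys = driver.execute_query(
        LOUVAIN_WRITE, {"name": graph_name, "prop": write_property}
    )
    return records[0].data()  # keys: communityCount, modularity
```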
The communities will appear on each node as a new property. We got five communities, which is not surprising given that we added five different documents to our knowledge graph.
This Cypher query will show us how many nodes we have in each community:
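A sketch of that count from Python, assuming the neo4j Python driver 5.x and that the Louvain results were written to a node property named `community` (a hypothetical name):

```python
# Assumes community IDs were written to a node property named `community`.
COMMUNITY_SIZES = (
    "MATCH (n) WHERE n.community IS NOT NULL "
    "RETURN n.community AS community, count(*) AS members "
    "ORDER BY members DESC"
)


def community_sizes(driver) -> list:
    """Return (community, members) pairs, largest community first."""
    records, _summary, _keys = driver.execute_query(COMMUNITY_SIZES)
    return [(record["community"], record["members"]) for record in records]
```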
In the following table, we can see that we have the most nodes in communities 4 and 6, while community 63 has the fewest nodes. Examining the graph further, we can find which community corresponds to which original document and identify which documents have stronger connections on the entity layer.
community | members |
---|---|
4 | 22 |
6 | 14 |
0 | 13 |
2 | 12 |
63 | 9 |
Before we proceed, we have to drop the Neo4j in-memory graph, since we no longer need it. The following procedure does this:
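`gds.graph.drop` releases the projection. A minimal sketch with the neo4j Python driver 5.x and the hypothetical graph name `cogneeGraph`:

```python
DROP_GRAPH = "CALL gds.graph.drop($name) YIELD graphName RETURN graphName"


def drop_projection(driver, graph_name: str = "cogneeGraph") -> str:
    """Free the GDS in-memory graph; the stored Neo4j data is not touched."""
    records, _summary, _keys = driver.execute_query(DROP_GRAPH, {"name": graph_name})
    return records[0]["graphName"]
```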
Graph Data Structure and Memory Fragment Projection
Let’s summarize what we have so far:
- A knowledge graph based on our documents stored in Neo4j
- Properties on each node that separate them into different subsets (in this specific case, “communities”).
The next step is to define our graph data structure and implement the memory fragment (subgraph) projection. The basic node and edge structures are as follows (due to the length of the implementation, we’ll only include the attributes of node and edge classes):
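For illustration only, a stripped-down pair of hypothetical `Node` and `Edge` classes with the attributes discussed in this section (an ID, free-form attributes, and a per-dimension alive/dead status vector) might look like this. It is not cognee's source code; see the repository for the real classes.

```python
from typing import Any, Dict, List, Optional

# Hypothetical, simplified classes for illustration; not cognee's source code.


class Node:
    """Graph node: an ID, free-form attributes, and a per-dimension status vector."""

    def __init__(self, node_id: str, attributes: Optional[Dict[str, Any]] = None, dimension: int = 1):
        if dimension < 1:
            raise ValueError("dimension must be a positive integer")
        self.id = node_id
        self.attributes = dict(attributes or {})
        self.status = [1] * dimension  # 1 = alive, 0 = dead, one flag per dimension
        self.skeleton_neighbours: List["Node"] = []
        self.skeleton_edges: List["Edge"] = []

    def is_alive(self, dim: int = 0) -> bool:
        return self.status[dim] == 1

    def __hash__(self) -> int:
        return hash(self.id)  # nodes are identified by ID only

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Node) and self.id == other.id


class Edge:
    """Edge between two Nodes; undirected edges ignore endpoint order."""

    def __init__(self, node1: Node, node2: Node, attributes: Optional[Dict[str, Any]] = None,
                 directed: bool = True, dimension: int = 1):
        if dimension < 1:
            raise ValueError("dimension must be a positive integer")
        self.node1, self.node2 = node1, node2
        self.attributes = dict(attributes or {})
        self.directed = directed
        self.status = [1] * dimension

    def _key(self):
        ids = (self.node1.id, self.node2.id)
        return ids if self.directed else frozenset(ids)

    def __hash__(self) -> int:
        return hash(self._key())

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Edge) and self.directed == other.directed and self._key() == other._key()
```

Hashing on IDs (and on an unordered pair for undirected edges) lets the projected fragment deduplicate nodes and edges with plain sets.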
Each elementary building block contains additional methods, hash definitions, and other necessary implementations. Furthermore, the Node and Edge building blocks are encapsulated in a CogneeGraph class that implements the async memory projection feature, which lets us project a skeleton image of the graph from the datastore. See the current state of the full implementation on our GitHub.
The implementation itself is relatively simple. It handles a dynamic number of attributes, and both nodes and edges carry a status vector that represents their state (alive/dead); by default this vector is one-dimensional, but it is prepared to be extended so that nodes and edges become multidimensional entities.
The following code snippet shows the memory projection feature:
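The following is a hedged, self-contained sketch rather than cognee's actual code: `MemoryFragment.project_graph_from_db` mirrors the kinds of parameters described in this section (an adapter, node and edge properties to keep, directedness, dimensions, and a property filter). The adapter method `get_filtered_graph_data`, the row shapes, and the toy `InMemoryAdapter` are assumptions made so the example runs without a database.

```python
import asyncio
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical row shapes an adapter returns (assumptions, not a cognee contract):
#   nodes: [(node_id, {property: value}), ...]
#   edges: [(source_id, target_id, relationship_name, {property: value}), ...]


class InMemoryAdapter:
    """Toy stand-in for a graph-database adapter, so the sketch runs without Neo4j."""

    def __init__(self, nodes, edges):
        self._nodes, self._edges = nodes, edges

    async def get_filtered_graph_data(self, filters: Sequence[Dict[str, List[Any]]]):
        def keep(props):
            # a node passes if every filtered property has one of the allowed values
            return all(props.get(key) in allowed for f in filters for key, allowed in f.items())

        nodes = [(node_id, props) for node_id, props in self._nodes if keep(props)]
        ids = {node_id for node_id, _ in nodes}
        edges = [e for e in self._edges if e[0] in ids and e[1] in ids]
        return nodes, edges


class MemoryFragment:
    """Minimal in-memory graph fragment, loosely modeled on the CogneeGraph idea."""

    def __init__(self):
        self.directed = True
        self.nodes: Dict[str, Dict[str, Any]] = {}
        self.node_status: Dict[str, List[int]] = {}
        self.edges: List[Tuple[str, str, Dict[str, Any]]] = []
        self.edge_status: List[List[int]] = []

    async def project_graph_from_db(
        self,
        adapter,
        node_properties_to_project: List[str],
        edge_properties_to_project: List[str],
        directed: bool = True,
        node_dimension: int = 1,
        edge_dimension: int = 1,
        memory_fragment_filter: Sequence[Dict[str, List[Any]]] = (),
    ) -> None:
        if node_dimension < 1 or edge_dimension < 1:
            raise ValueError("dimensions must be positive integers")
        self.directed = directed  # recorded for downstream algorithms; edges keep (source, target) order
        nodes, edges = await adapter.get_filtered_graph_data(list(memory_fragment_filter))
        for node_id, props in nodes:
            # keep only the requested properties: a lean skeleton of each node
            self.nodes[node_id] = {k: props[k] for k in node_properties_to_project if k in props}
            self.node_status[node_id] = [1] * node_dimension  # every node starts alive
        for source, target, relationship, props in edges:
            if source not in self.nodes or target not in self.nodes:
                continue  # skip edges that leave the fragment
            kept = {k: props[k] for k in edge_properties_to_project if k in props}
            kept["relationship_name"] = relationship
            self.edges.append((source, target, kept))
            self.edge_status.append([1] * edge_dimension)


if __name__ == "__main__":
    adapter = InMemoryAdapter(
        nodes=[("a", {"community": 4, "name": "A"}), ("b", {"community": 6}), ("c", {"community": 0})],
        edges=[("a", "b", "mentions", {}), ("b", "c", "mentions", {})],
    )
    fragment = MemoryFragment()
    asyncio.run(fragment.project_graph_from_db(
        adapter, ["community"], [], directed=False, memory_fragment_filter=[{"community": [4, 6]}]))
    print(len(fragment.nodes), len(fragment.edges))  # 2 1
```

Projecting with the filter `{"community": [4, 6]}` keeps nodes `a` and `b`, strips the unrequested `name` property, and drops the edge between `b` and `c` because `c` falls outside the fragment.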
The graph projection is an async method that collects the relevant nodes and their connections from the database based on its parameters. To understand how to use it, let’s project the two largest communities from the network and keep only the node IDs and their community labels.
When we project the graph, we have to pass the graph engine to the method and define which node and edge properties we need, whether we want a directed or undirected graph, the dimensions of the node and edge entities, and, finally, the filter property and the values to project.
As a result, we get 36 nodes (the sum of the nodes in the two largest communities in the previous table) and 63 intra-community edges. Naturally, communities can also be picked based on where our relevant chunks are, or by other decision criteria.
At the end of the process, we extracted a filtered memory fragment from our long-term Neo4j memory into the Cognee graph data structure. From this step, the path is clear to implement advanced explorative algorithms that were previously unviable due to the graph size or knowledge base limitations.
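As one example, the shortest-path strategy from the filtering discussion can now run on the fragment alone. The breadth-first search below is a generic sketch over an edge list, such as the `(source, target)` pairs of a projected fragment; it is not a cognee API.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


def shortest_path(edges: Iterable[Tuple[Hashable, Hashable]], source, target,
                  directed: bool = False) -> Optional[List]:
    """Unweighted shortest path via BFS over an in-memory edge list; None if unreachable."""
    adj: Dict = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        if not directed:
            adj.setdefault(v, []).append(u)  # undirected: traverse both ways
    parents = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk parent links back to the source
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nb in adj.get(node, ()):
            if nb not in parents:
                parents[nb] = node
                queue.append(nb)
    return None
```

Because the search only sees the fragment's edges, its cost scales with the fragment rather than the full knowledge graph, at the price of missing paths that leave the fragment.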
Takeaways: Knowledge Graph Projection with Cognee
When we want to explore our knowledge graph in more detail, retrieving subgraphs as memory fragments lets us focus only on the relevant subsets of the graph.
By memorizing fragments and their local connections, we can limit the search space when optimizing retrieved structures, whether we collect triplets, subgraphs, or hybrid solutions during retrieval.
Leveraging the power of GraphRAG, cognee enables the identification of relevant subgraphs or graph layers and their ingestion into a structure that provides flexibility and a limited and filtered view of a specific data fragment.
This approach allows us to explore the full potential of our knowledge base, enabling efficient, context-aware search of our knowledge graphs that focuses on what matters while leaving unimportant subgraphs untouched.