Knowledge Graph Query Answering (KGQA) with Cognee

In today’s data-driven world, traditional databases often struggle to capture the complex, nuanced interactions hidden within unstructured data. Knowledge graphs, however, revolutionize data management by organizing information into interconnected networks of entities and relationships, unveiling hidden insights and enhancing the accuracy of LLM retrievals.

In this post, we’ll examine how modern AI systems harness the power of knowledge graphs to deliver highly personalized, context-aware responses. We’ll also share practical examples that demonstrate how cognee leverages graphs to supercharge advanced querying methods like Retrieval-Augmented Generation (RAG). Finally, we’ll show you how you can start using cognee to create and query knowledge graphs from your own data today.

Knowledge Graphs: the New Paradigm of Data Organization

A knowledge graph is a data structure that organizes information as a network of data points interconnected by the way they relate to each other. This flexible framework allows virtually any kind of information to be represented in a structured and intuitive manner.

The two key components of a knowledge graph are:

Nodes: These represent entities such as people, items, locations, or even abstract ideas.
Edges: These define the relationships or interactions between nodes.

Graph showing cast members from the “From Dusk Till Dawn” movie
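To make the node/edge split concrete, here is a minimal sketch that stores a small piece of the movie graph as (source, relationship, target) triples in plain Python. The ACTED_IN and DIRECTED relationship names and the cast_of helper are illustrative assumptions, not part of cognee.

```python
from collections import defaultdict

# Each edge is a (source node, relationship, target node) triple.
edges = [
    ("George Clooney", "ACTED_IN", "From Dusk Till Dawn"),
    ("Quentin Tarantino", "ACTED_IN", "From Dusk Till Dawn"),
    ("Harvey Keitel", "ACTED_IN", "From Dusk Till Dawn"),
    ("Juliette Lewis", "ACTED_IN", "From Dusk Till Dawn"),
    ("Robert Rodriguez", "DIRECTED", "From Dusk Till Dawn"),
]

# Nodes are every entity that appears at either end of an edge.
nodes = {name for src, _, dst in edges for name in (src, dst)}

# Index incoming edges so we can ask "who is connected to this movie, and how?"
incoming = defaultdict(list)
for src, rel, dst in edges:
    incoming[dst].append((src, rel))


def cast_of(movie):
    """Return the people linked to `movie` by an ACTED_IN edge (hypothetical helper)."""
    return sorted(src for src, rel in incoming[movie] if rel == "ACTED_IN")


print(len(nodes))  # → 6
print(cast_of("From Dusk Till Dawn"))
```

The relationship label on each edge is what lets the lookup distinguish actors from the director, something a flat list of names could not do.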

Knowledge graphs are not a new concept; their roots can be traced back decades. Their contemporary incarnation was popularized by Tim Berners-Lee, the inventor of the World Wide Web. In 1998, he conceptualized the Semantic Web, also known as Web 3.0, as an extension of the internet which would give computers access to structured sets of information and inference rules, enabling them to conduct automated reasoning.

For an in-depth look at how cognee leverages knowledge graphs to structure and connect data points, check out our blog post The Building Blocks of Knowledge Graphs. In it, we break down the core components of knowledge graphs in greater detail and show how they form the foundation for advanced data retrieval and intelligent search solutions.

Upleveling RAG with Knowledge Graphs

Retrieval-Augmented Generation (RAG) is an approach that first uses search techniques to gather relevant data from external sources and then passes that data to a language model, which uses it to craft coherent, contextually rich responses.

Traditional RAG systems typically rely on vector stores to retrieve data. This works well in many cases; however, vector-only approaches sometimes miss important details and lack the flexibility required for more complex and nuanced queries.
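As a rough illustration of the vector-only retrieval step, the sketch below ranks documents purely by vector similarity to the query and pastes the top hits into a prompt. It assumes toy bag-of-words vectors in place of a real embedding model and an in-memory list in place of a vector store; the product texts are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    # Toy "embedding": bag-of-words counts. Real systems use a learned embedding model.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical product descriptions standing in for a vector store.
documents = [
    "white regular sneakers priced at 80 dollars",
    "navy blue high-top sneakers priced at 120 dollars",
    "black leather boots priced at 150 dollars",
]


def retrieve(query, k=2):
    # Rank every document by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query):
    # The retrieved text becomes the context handed to the language model.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


print(build_prompt("white regular sneakers"))
```

Retrieval here only sees how similar each text is to the query; nothing connects a customer to other customers or to past purchases, which is the gap the following sections address.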

The Limitations of Traditional RAG

For example, if an AI agent is tasked with helping a customer choose their next pair of shoes, the conversation may go something like this:

Agent: How can I help you?

Customer: I want to buy new sneakers for everyday wear. I’m looking for something under $100.

Agent: What color do you like? Do you prefer high-top or regular shoes?

Customer: I would like white ones, or navy blue. I prefer regular shoes.

Agent: Ok, let me see what I can find for you. [PRODUCTS]

The new customer’s (Customer2) preferences match an existing customer’s preferences

If the agent is using only a vector store, this interaction will likely result in:

The system filtering products based solely on the provided preferences (e.g., white or navy blue regular sneakers).
A list of generic results, especially when there’s no previous purchase history to refine the recommendations.

The Knowledge Graph Advantage Example #1: Personalization

How can we overcome the challenge of limited data when trying to deliver meaningful recommendations? Or perhaps the better question is: do we really lack information, or does it only seem that way?

By leveraging knowledge graphs, cognee can provide enhanced personalization by matching a new customer with similar users based on shared preferences, thus uncovering patterns that enable more tailored recommendations.

Using Cypher—a query language which allows the retrieval, manipulation, and analysis of graph data using a syntax similar to SQL but optimized for nodes, relationships, and patterns—cognee can execute queries that match new customer preferences with similar user profiles. Here’s an example snippet that finds users with similar preferences:

// Step 1: Use the new customer's preferences from input
UNWIND ["White", "Navy Blue", "Regular Sneakers"] AS pref_input

// Step 2: Find other customers who have these preferences
MATCH (other_customer:Customer)-[:has_preference]->(preference:Preference)
  WHERE preference.value = pref_input

WITH other_customer, count(preference) AS similarity_score

// Step 3: Limit to the top-N most similar customers
ORDER BY similarity_score DESC
  LIMIT 5

// Step 4: Get products that these similar customers have purchased
MATCH (other_customer)-[:purchased]->(product:Product)

// Step 5: Rank products based on frequency
RETURN product, count(*) AS recommendation_score
  ORDER BY recommendation_score DESC
  LIMIT 10
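To see what the query computes without a running Neo4j instance, here is a minimal in-memory sketch of the same five steps. The dictionaries stand in for the has_preference and purchased edges and are hypothetical data; like the Cypher MATCH, customers sharing no preferences are excluded.

```python
from collections import Counter

# Hypothetical stand-ins for the graph's has_preference and purchased edges.
preferences = {
    "alice": {"White", "Regular Sneakers"},
    "bob": {"Navy Blue", "Regular Sneakers", "White"},
    "carol": {"Black", "High-top"},
}
purchases = {
    "alice": ["sneaker_a", "sneaker_b"],
    "bob": ["sneaker_a", "sneaker_c"],
    "carol": ["boot_x"],
}


def recommend(new_prefs, top_customers=5, top_products=10):
    # Steps 1-2: score each existing customer by how many preferences they share.
    similarity = Counter()
    for customer, prefs in preferences.items():
        shared = len(prefs & set(new_prefs))
        if shared:
            similarity[customer] = shared
    # Step 3: keep only the most similar customers.
    neighbours = [c for c, _ in similarity.most_common(top_customers)]
    # Steps 4-5: rank products by how often those customers bought them.
    scores = Counter(p for c in neighbours for p in purchases.get(c, []))
    return scores.most_common(top_products)


print(recommend(["White", "Navy Blue", "Regular Sneakers"]))
```

sneaker_a ranks first because both similar customers bought it, while carol's boots never appear since she shares no preferences with the new customer.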

This process is a simple form of what is known as link prediction: predicting missing or future connections in a graph based on its existing structure and node features. Link prediction is commonly used in social networks, recommendation systems, biological networks, and knowledge graphs.
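Link prediction in general scores pairs of nodes that are not yet connected. As a minimal sketch, the code below applies one classic structural heuristic, the Jaccard coefficient of two nodes' neighbour sets, to a small hypothetical undirected graph. This is a swapped-in textbook technique for illustration, not the Cypher query above and not cognee's implementation.

```python
from itertools import combinations

# Hypothetical undirected graph stored as symmetric adjacency sets.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"A", "C", "E"},
    "E": {"D"},
}


def jaccard(u, v):
    # Shared neighbours divided by all neighbours of either node.
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def predict_links(top=3):
    # Score only pairs that are not already connected.
    candidates = [
        (u, v, jaccard(u, v))
        for u, v in combinations(sorted(graph), 2)
        if v not in graph[u]
    ]
    return sorted(candidates, key=lambda c: c[2], reverse=True)[:top]


print(predict_links())
```

B and D come out on top because every neighbour of B is also a neighbour of D, which makes an edge between them the most plausible missing link.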

The Knowledge Graph Advantage Example #2: Anomaly Detection

Graphs allow the implementation of rules to maintain data consistency. For example, if a returning customer provides a new shoe size that conflicts with their previously recorded size, the system can detect this inconsistency and prompt for clarification.

Here is how this conversation and the reasoning behind it might go:

Agent: How can I help you today?

User: I need new sneakers in size 44, for less than $100.

Agent: (Saves the new preference and detects that it conflicts with the stored one) I see that you are asking for size 44, but from our previous conversations I see that you wear size 42. Are you sure that you want size 44 now?

Graph rule: a single user can’t have two shoe size preferences

In this scenario, we already have access to the customer’s data, allowing us to unlock additional, valuable insights. By imposing rules and constraints on our graph, we can ensure that its structure closely mirrors real-world relationships.

For example, since a person cannot wear two different shoe sizes simultaneously, the has_preference relationship between a customer and a shoe size should be unique. We can enforce this rule by checking for conflicting preferences:

// Match the customer and their stored shoe size preference
MATCH (customer:Customer {id: "customer_2"})
OPTIONAL MATCH (customer)-[:has_preference]->(preference:Preference {name: 'ShoeSize'})

// The new shoe size is passed in as the parameter $new_size
WITH customer, preference, $new_size AS new_size

// If a stored preference exists and it does not match the new value,
// raise an error using APOC's utility procedure.
CALL apoc.util.validate(
  preference IS NOT NULL AND preference.value <> new_size, 
  "Conflicting shoe size preference: existing size is " + preference.value + " and new size is " + new_size, 
  []
)

// If no conflict, continue with the update or further processing
// ...
RETURN customer
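The same uniqueness rule can be expressed outside the database. Below is a minimal in-memory sketch, assuming a hypothetical stored_preferences dictionary, a hypothetical SINGLE_VALUED set of preference names, and a custom ConflictingPreferenceError in place of the APOC error.

```python
class ConflictingPreferenceError(ValueError):
    """Raised when a new value contradicts a stored single-valued preference."""


# Hypothetical stored state: customer id -> {preference name: value}.
stored_preferences = {"customer_2": {"ShoeSize": "42", "Color": "White"}}

# Preference names that may hold only one value per customer.
SINGLE_VALUED = {"ShoeSize"}


def check_preference(customer_id, name, new_value):
    # Mirror the Cypher check: fail only if a stored value exists and differs.
    existing = stored_preferences.get(customer_id, {}).get(name)
    if name in SINGLE_VALUED and existing is not None and existing != new_value:
        raise ConflictingPreferenceError(
            f"Conflicting {name} preference: existing is {existing}, new is {new_value}"
        )
    return True


try:
    check_preference("customer_2", "ShoeSize", "44")
except ConflictingPreferenceError as error:
    print(f"Anomaly detected: {error}")
```

Multi-valued preferences such as colors pass through unchanged; only names listed as single-valued trigger the check.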

This is a simple, rule-based form of anomaly detection, the broader task (often tackled with machine learning) of identifying nodes, edges, or subgraphs that deviate significantly from normal patterns within a graph. While this example is as simple as they get, in more complex settings, detecting harmful outliers might reveal issues like fraud, cyber attacks, fake accounts, or biological irregularities.

Anomaly detection can be taken further with Graph Neural Networks (GNNs), which we’ll explore in a future blog post.
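For the statistical side of anomaly detection, a minimal sketch is to flag nodes whose degree is far from the average, measured as a z-score. The edge lists and the 2.5 threshold are hypothetical, and this degree heuristic is a swapped-in simple technique, not a GNN and not something cognee ships.

```python
from collections import Counter
from statistics import mean, pstdev


def degree_outliers(edges, threshold=2.5):
    # Count how many edges touch each node (undirected degree).
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    values = list(degree.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # every node looks the same, so nothing stands out
    # Flag nodes whose degree lies more than `threshold` standard deviations from the mean.
    return sorted(n for n, d in degree.items() if abs(d - mu) / sigma > threshold)


# Ten accounts linked in a ring, plus one device shared by all of them.
ring = [(f"acct_{i}", f"acct_{(i + 1) % 10}") for i in range(10)]
star = [("shared_device", f"acct_{i}") for i in range(10)]
print(degree_outliers(ring + star))  # → ['shared_device']
```

A single device linked to every account is the kind of structural outlier that might hint at fake accounts.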

How Cognee Leverages Graphs for Enhanced Query Answering

At cognee, we use knowledge graphs to store all entities and their connections, whether derived from structured data (e.g., SQL tables) or unstructured data (e.g., text). Graphs not only enable our system to preserve data integrity by enforcing rules—like ensuring a user can’t have two conflicting shoe size preferences—but also let it traverse the information network flexibly, uncovering hidden patterns and meaningful relationships between the extracted data points.
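As a rough picture of how structured data maps onto a graph, the sketch below turns rows of two hypothetical SQL tables into triples: each customer row becomes a node and each foreign-key row in purchases becomes a purchased edge. It uses Python's built-in sqlite3 module and is an illustration only, not cognee's relational migration code.

```python
import sqlite3


def rows_to_triples(conn):
    """Turn a customers/purchases schema into (subject, relation, object) triples."""
    triples = []
    for cid, name in conn.execute("SELECT id, name FROM customers ORDER BY id"):
        triples.append((cid, "is_type", "Customer"))
        triples.append((cid, "has_name", name))
    # Each foreign-key row becomes an edge between two nodes.
    for cid, pid in conn.execute(
        "SELECT customer_id, product_id FROM purchases ORDER BY customer_id, product_id"
    ):
        triples.append((cid, "purchased", pid))
    return triples


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE purchases (customer_id TEXT REFERENCES customers(id), product_id TEXT);
    INSERT INTO customers VALUES ('customer_1', 'Ana'), ('customer_2', 'Ben');
    INSERT INTO purchases VALUES ('customer_1', 'sneaker_a'), ('customer_2', 'sneaker_a');
    """
)
for triple in rows_to_triples(conn):
    print(triple)
```

Both customers now point at the same sneaker_a node, so a traversal can connect them even though the tables only relate them through a join.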

Creating Graphs with Cognee

At cognee, we aim to simplify the creation of graph nodes and edges by abstracting the process with our DataPoint class. Each node in the graph is represented as a DataPoint, making it easy to build, update, and query your knowledge graph. Here’s a quick example demonstrating this approach:

import asyncio

from cognee.low_level import setup, DataPoint
from cognee.tasks.storage import add_data_points

class Products(DataPoint):
    name: str = "Products"

products_aggregator_node = Products()

class Product(DataPoint):
    id: str
    name: str
    type: str
    price: float
    colors: list[str]
    is_type: Products = products_aggregator_node

class Preferences(DataPoint):
    name: str = "Preferences"

preferences_aggregator_node = Preferences()

class Preference(DataPoint):
    id: str
    name: str
    value: str
    is_type: Preferences = preferences_aggregator_node

class Customers(DataPoint):
    name: str = "Customers"

customers_aggregator_node = Customers()

class Customer(DataPoint):
    id: str
    name: str
    has_preference: list[Preference]
    purchased: list[Product]
    liked: list[Product]
    is_type: Customers = customers_aggregator_node

async def main():
    # Prepare cognee's storage before writing any data points
    await setup()

    data_points = []

    # Create data points from your data
    # Full example here: https://github.com/topoteretes/cognee/blob/dev/examples/low_level/product_recommendation.py
    ...

    # Store the data points in cognee's databases
    await add_data_points(data_points)

asyncio.run(main())
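The Cypher queries in this post traverse has_preference and purchased relationships, the same names as the Customer fields above. A minimal sketch of that mapping, assuming each field that holds other nodes becomes an edge named after the field, is shown below. It uses plain dataclasses and a hypothetical to_edges helper, not cognee's own conversion code, and it does no cycle detection.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Node:
    id: str


@dataclass
class Preference(Node):
    name: str = ""
    value: str = ""


@dataclass
class Product(Node):
    name: str = ""


@dataclass
class Customer(Node):
    has_preference: list = field(default_factory=list)
    purchased: list = field(default_factory=list)


def to_edges(node):
    """Emit (source_id, field_name, target_id) for every field that references other nodes."""
    edges = []
    for f in fields(node):
        value = getattr(node, f.name)
        targets = value if isinstance(value, list) else [value]
        for target in targets:
            if isinstance(target, Node):
                edges.append((node.id, f.name, target.id))
                edges.extend(to_edges(target))  # follow nested nodes (no cycle check)
    return edges


customer = Customer(
    "customer_1",
    has_preference=[Preference("pref_1", "ShoeSize", "42")],
    purchased=[Product("sneaker_a", "Runner")],
)
print(to_edges(customer))
```

Naming edges after fields keeps the Python model and the graph queries in sync: the field you declare is the relationship you later MATCH on.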

Querying a Graph with Cognee

Before running a Cypher query, first make sure cognee is configured to use a Neo4j graph database; the anomaly check below also requires the APOC plugin to be installed in Neo4j. Once that’s set up, you can query your knowledge graph like this:

import asyncio

from cognee.infrastructure.databases.graph import get_graph_engine

async def main():
    # Get the graph engine from cognee (Neo4j)
    graph_engine = await get_graph_engine()

    # Query products from similar customers
    products_results = await graph_engine.query(
        """
        // Step 1: Use the new customer's preferences from input
        UNWIND $preferences AS pref_input

        // Step 2: Find other customers who have these preferences
        MATCH (other_customer:Customer)-[:has_preference]->(preference:Preference)
          WHERE preference.value = pref_input

        WITH other_customer, count(preference) AS similarity_score

        // Step 3: Limit to the top-N most similar customers
        ORDER BY similarity_score DESC
          LIMIT 5

        // Step 4: Get products that these similar customers have purchased
        MATCH (other_customer)-[:purchased]->(product:Product)

        // Step 5: Rank products based on frequency
        RETURN product, count(*) AS recommendation_score
          ORDER BY recommendation_score DESC
          LIMIT 10
        """,
        {
            "preferences": ["White", "Navy Blue", "Regular Sneakers"],
        },
    )

    print("Top 10 recommended products:")
    for result in products_results:
        print(f"{result['product']['id']}: {result['product']['name']}")

    # Detect the anomaly in new data points
    try:
        await graph_engine.query(
            """
            // Match the customer and their stored shoe size preference
            MATCH (customer:Customer {id: $customer_id})
            OPTIONAL MATCH (customer)-[:has_preference]->(preference:Preference {name: 'ShoeSize'})

            // The new shoe size is passed in as the parameter $new_size
            WITH customer, preference, $new_size AS new_size

            // If a stored preference exists and it does not match the new value,
            // raise an error using APOC's utility procedure.
            CALL apoc.util.validate(
              preference IS NOT NULL AND preference.value <> new_size,
              "Conflicting shoe size preference: existing size is " + preference.value + " and new size is " + new_size,
              []
            )

            // If no conflict, continue with the update or further processing
            // ...
            RETURN customer
            """,
            {
                "customer_id": "customer_1",
                "new_size": "42",
            },
        )
    except Exception as error:
        # Python 3 exceptions have no .message attribute; str(error) carries the message
        print(f"Anomaly detected: {error}")

asyncio.run(main())

From DataPoints to Dynamic Insights: The Future Is Graph-Based

As you can probably tell by now, knowledge graphs are more than just a novel way to organize data—they are a powerful tool for uncovering hidden insights and driving intelligent decision-making. By leveraging pipelines and tasks, cognee makes it possible to integrate diverse data sources, enforce real-world constraints, and deliver highly personalized, context-aware recommendations.

However, the two examples we’ve covered in this post—personalized recommendations and anomaly detection—are just the tip of the iceberg. There’s immense potential in harnessing graph technology to transform the way AI interacts with data. We’re continuously refining our approach and developing new features to make graph-based querying even more intuitive and scalable.

If you have a use case or idea you’d like to share, we’d love to hear from you! Book a call with us or join our Discord community to be part of the conversation and see how cognee is paving the way for the next generation of intelligent data solutions.


9 mins read

Hande Kafkas

Feb 10, 2025