Going beyond Langchain + Weaviate: Level 2 towards Production

1.1. The problem of putting code into production

This post is a part of a series of texts aiming to discover and understand patterns and practices that would enable building a production-ready AI data infrastructure. The main focus is on how to evolve data modeling and retrieval in order to enable Large Language Model (LLM) apps and Agents to serve millions of users concurrently.

For a broad overview of the problem and our understanding of the current state of the LLM landscape, check out our previous post.


In this text, we continue our inquiry into what would constitute:

  1. Proper data engineering methods for LLMs
  2. A production-ready generative AI data platform that unlocks AI assistants/Agent Networks

To explore these points, we at prometh.ai have partnered with dlthub to productionize a common use case, complex PDF processing, progressing level by level.

In the previous text, we wrote a simple script that relies on the Weaviate vector database to turn unstructured data into structured data and help us make sense of it.

In this post, we address some of the shortcomings of the previous level, including:

  1. Containerization
  2. Data model
  3. Data contract
  4. Vector Database retrieval strategies
  5. LLM context and task generation
  6. Dynamic Agent behavior and Agent tooling

3. Level 2: Memory Layer + FastAPI + Langchain + Weaviate

3.1. Developer Intent at Level 2

This phase enhances the basic script by incorporating the improvements listed above: containerization, an explicit data model and data contract, vector database retrieval strategies, LLM context and task generation, and dynamic Agent behavior and tooling.

3.2. Toward the memory layer - POC at level 2


At this stage, our proof of concept (POC) allows uploading a PDF document and requesting specific actions on it, such as "load to database", "translate to German", or "convert to JSON". The Context Manager and Task Manager services assess prior task resolutions and the potential operations.
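As a rough illustration, the entry point can be a FastAPI endpoint that accepts the PDF and the requested operation. This is a minimal sketch, not the POC's actual API; the route, the parameter names, and the process_document helper are illustrative assumptions:

```python
from fastapi import FastAPI, File, Form, UploadFile

app = FastAPI()

async def process_document(content: bytes, task: str) -> str:
    # Stub: in the POC this hands off to the Context Manager and
    # Task Manager services described below.
    return "job-1"

@app.post("/upload")
async def upload_pdf(file: UploadFile = File(...), task: str = Form(...)):
    """Accept a PDF and a requested action, e.g. 'translate to German'."""
    content = await file.read()
    job_id = await process_document(content, task)
    return {"job_id": job_id, "status": "accepted"}
```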

At a high level, the workflow of the POC at level 2 consists of four steps:

  1. Fetch the relevant memories from the Semantic Memory bank, ranking them with attention modulators.
  2. Process the retrieved data with OpenAI functions and store the results.
  3. Let the Task Manager convert the user input into a set of actionable steps based on the available tools.
  4. Hand the assembled context to the Agent, which executes the steps with its tools.
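Attention modulators are small scoring heuristics that decide which stored memories deserve the Agent's attention. As an illustration of step 1, here is a sketch of two such modulators; the names, the memory schema, and the weighting are assumptions for this example, not the repository's exact implementations:

```python
from datetime import datetime, timezone

def recency_modulator(memory: dict, half_life_hours: float = 24.0) -> float:
    """Boost memories accessed recently, with exponential decay.

    Assumes `last_accessed` is a timezone-aware ISO-8601 timestamp.
    """
    last_accessed = datetime.fromisoformat(memory["last_accessed"])
    age_hours = (datetime.now(timezone.utc) - last_accessed).total_seconds() / 3600
    return 0.5 ** (age_hours / half_life_hours)

def frequency_modulator(memory: dict) -> float:
    """Boost memories that have been retrieved often, capped at 1.0."""
    return min(1.0, memory.get("retrieval_count", 0) / 10.0)
```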

We have implemented many more modulators, and you can find them in our repository. More are still needed, and contributions are more than welcome.

Let's see the modulators in action. We fetch the memories from the Semantic Memory bank, where our knowledge of the world (the PDFs) is stored, and select the relevant documents using the handle_modulator function.
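A minimal sketch of that selection step, assuming handle_modulator takes a modulator name and a memory and returns a relevance score (the actual signature in the repository may differ):

```python
def select_relevant_documents(
    memories: list[dict], modulators: list[str], top_k: int = 5
) -> list[dict]:
    """Rank semantic memories by the combined score of all modulators."""
    scored = []
    for memory in memories:
        # Sum the modulator scores; summing is an illustrative choice,
        # a weighted combination would work just as well.
        score = sum(handle_modulator(name, memory) for name in modulators)
        scored.append((score, memory))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [memory for _, memory in scored[:top_k]]
```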

We then process the retrieved data with OpenAI functions and store the results, so that the Task Manager can determine which actions the Agent should take.
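A sketch of that processing step with the OpenAI v1 Python client; the function schema, model choice, and field names are illustrative, not the post's exact prompts:

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A function schema that forces the model to return structured output.
extract_facts = {
    "name": "extract_facts",
    "description": "Summarize a document and classify its contents.",
    "parameters": {
        "type": "object",
        "properties": {
            "category": {"type": "string"},
            "summary": {"type": "string"},
        },
        "required": ["category", "summary"],
    },
}

def process_with_openai(document_text: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": document_text}],
        functions=[extract_facts],
        function_call={"name": "extract_facts"},
    )
    # The model returns the function arguments as a JSON string.
    return json.loads(response.choices[0].message.function_call.arguments)
```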

The Task Manager then sorts and converts user input into a set of actionable steps based on the tools available.

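A sketch of how the Task Manager could do that; the prompt, model, and output handling are assumptions for this example:

```python
from openai import OpenAI

client = OpenAI()

def plan_steps(user_input: str, available_tools: list[str]) -> list[str]:
    """Ask the LLM for a numbered plan restricted to the available tools."""
    prompt = (
        "Convert the following request into a numbered list of steps, "
        f"using only these tools: {', '.join(available_tools)}.\n"
        f"Request: {user_input}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    content = response.choices[0].message.content
    return [line.strip() for line in content.splitlines() if line.strip()]
```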

Finally, the Agent interprets the context and performs the steps using the tools it has available. We see this as the point where the Agent takes over the task, executing it in its own way.
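A minimal Langchain wiring for that hand-over, with stub tools standing in for the real PDF operations:

```python
from langchain.agents import AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.tools import Tool

def translate_to_german(text: str) -> str:
    return "..."  # stub: the real platform would call a translation chain

def convert_to_json(text: str) -> str:
    return "{}"  # stub: the real platform would call an extraction chain

tools = [
    Tool(name="translate_to_german", func=translate_to_german,
         description="Translate the loaded PDF contents to German."),
    Tool(name="convert_to_json", func=convert_to_json,
         description="Convert the loaded PDF contents to JSON."),
]

llm = ChatOpenAI(model="gpt-4", temperature=0)
agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS)

# The agent receives the steps produced by the Task Manager.
agent.run("Translate the uploaded PDF to German.")
```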

Now, let's look back at what constitutes the Data Platform:

| Memory type | State | Description |
| --- | --- | --- |
| Sensory Memory | API | In this context, the interface used for human input |
| STM (short-term memory) | Weaviate class with a hardcoded contract | The processing layer and storage for the session/user context |
| LTM (long-term memory) | Weaviate class with a hardcoded contract | The information storage |
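To make the "hardcoded contract" concrete, a Weaviate class for the STM layer could be declared as below (using the v3 weaviate-client; the class and property names are assumptions, not the repository's exact schema):

```python
import weaviate

client = weaviate.Client("http://localhost:8080")

# A hardcoded data contract for the short-term memory store.
short_term_memory = {
    "class": "ShortTermMemory",
    "description": "Session/user context for the processing layer.",
    "properties": [
        {"name": "user_id", "dataType": ["text"]},
        {"name": "session_id", "dataType": ["text"]},
        {"name": "content", "dataType": ["text"]},
        {"name": "created_at", "dataType": ["date"]},
    ],
}
client.schema.create_class(short_term_memory)
```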

What this level still lacks maps directly onto our next steps:

  1. Implement different strategies for vector search (one option is sketched after this list)
  2. Add more tools to process PDFs
  3. Add more attention modulators
  4. Add a solid test framework
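As an example of such a retrieval strategy, a hybrid query mixes vector similarity with keyword (BM25) matching. A minimal sketch with the v3 weaviate-client; the class name, properties, and alpha weighting are assumptions:

```python
import weaviate

client = weaviate.Client("http://localhost:8080")

# alpha=0.5 weighs vector and keyword scores equally;
# alpha=1.0 is pure vector search, alpha=0.0 is pure BM25.
results = (
    client.query
    .get("ShortTermMemory", ["content"])
    .with_hybrid(query="translate the PDF to German", alpha=0.5)
    .with_limit(5)
    .do()
)
print(results)
```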

Conclusion

If you enjoyed the content or want to try out cognee, please check out our GitHub repository and give us a star!