Conductor: From RAG to Riches

Stephen Meriwether
3 min read · Nov 14, 2024

There’s been a lot of conversation around building effective Retrieval-Augmented Generation (RAG) systems. While most discussions of RAG focus on the strategies and limitations of pulling text from documents, with Stride Conductor we’re tackling a different challenge: locating relevant code snippets based on user stories.

Here’s how we’ve approached this problem and found a solution that works.

Our First Attempt: Vectorize Everything

We started by borrowing a common strategy from document search solutions (a minimal version is sketched after the list):

  1. Vectorize the User Story: We converted the user’s requirements into numerical vectors.
  2. Vectorize the Codebase: We did the same with our entire code repository.
  3. Search for Matches: We compared these vectors to find code that aligned with the user story.
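
In code, that first pass looked something like the sketch below. Here embed() is a hypothetical stand-in for whatever embedding model is in use, and the ranking logic is illustrative rather than our production implementation:

```python
# Minimal sketch of vectorize-and-search. embed() is a hypothetical
# helper standing in for an embedding model; assume it returns unit vectors.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: call an embedding model and return a normalized vector."""
    raise NotImplementedError

def top_snippets(story: str, snippets: list[str], k: int = 5) -> list[str]:
    """Rank code snippets by cosine similarity to the user story."""
    story_vec = embed(story)
    # For unit vectors, the dot product equals the cosine similarity.
    scored = [(float(np.dot(story_vec, embed(s))), s) for s in snippets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [snippet for _, snippet in scored[:k]]
```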

On paper, this seemed like a solid plan. But in practice, the results were mediocre. More often than not, we didn’t find the relevant code we were looking for. Even when we tried more advanced embedding models, the improvements were minimal.

A New Direction: Leveraging LLMs

Realizing we needed a different approach, we turned to the pattern-matching strengths of Large Language Models (LLMs). Here’s what we did:

  • Abstracting the Codebase: Using tools like tree-sitter, we created an abstract representation of the codebase, focusing on class and method signatures, structures, and patterns rather than implementation details (see the sketch after this list).
  • Consulting the LLM: We asked the LLM to determine the correct entry points in the code for a given user story. Since LLMs are great at understanding patterns and context, this played to their strengths.
  • Self-Grading Mechanism: After the LLM made its suggestions, we took those abstract code representations and replaced them with the actual code from the files it pointed out. Then, we asked the LLM to “grade” its own suggestions. This grading helped us decide which code snippets to keep and which to ignore.
  • Assessing Sufficiency: The grading also told us if we had enough relevant code context to proceed or if we needed to dig deeper.
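
To make the abstraction step concrete, here is a minimal sketch of stripping a Python file down to its class and method signatures with tree-sitter. It assumes the tree_sitter and tree_sitter_python packages, and the signatures() helper is our illustration, not Conductor’s actual code:

```python
# Sketch: reduce a Python file to its class and method signatures
# with tree-sitter. The binding API varies slightly between versions;
# this follows the current py-tree-sitter pattern.
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

parser = Parser(Language(tspython.language()))

def signatures(source: bytes) -> list[str]:
    """Collect 'class ...' and 'def ...' header lines, dropping the bodies."""
    tree = parser.parse(source)
    sigs, stack = [], [tree.root_node]
    while stack:
        node = stack.pop()
        if node.type in ("class_definition", "function_definition"):
            # Keep only the header line, e.g. "def charge(user, amount):".
            header = source[node.start_byte:node.end_byte].split(b"\n")[0]
            sigs.append(header.decode())
        stack.extend(node.children)
    return sigs
```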

This process is known as “Corrective RAG” because it allows the system to correct and refine its own outputs.
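
A stripped-down version of that corrective loop might look like the following. llm() is a hypothetical wrapper around a chat model; the prompts, the 0–10 grading scale, and the keep threshold are illustrative assumptions, not Conductor’s implementation:

```python
from pathlib import Path

def llm(prompt: str) -> str:
    """Placeholder: send the prompt to a chat model and return its reply."""
    raise NotImplementedError

def corrective_retrieve(story: str, abstract_view: str,
                        max_rounds: int = 3) -> list[str]:
    """Suggest entry points, self-grade them, and stop when context suffices."""
    kept: list[str] = []
    for _ in range(max_rounds):
        # 1. Entry points: ask for relevant file paths, one per line.
        reply = llm(
            f"Given this abstract view of the codebase:\n{abstract_view}\n"
            f"List the files most relevant to this story, one per line:\n{story}"
        )
        candidates = [ln.strip() for ln in reply.splitlines() if ln.strip()]
        # 2. Swap the abstractions for real code and have the LLM grade itself.
        for path in candidates:
            code = Path(path).read_text()
            grade = llm(
                f"On a scale of 0-10, how relevant is this code to the story "
                f"'{story}'? Reply with a number only.\n{code}"
            )
            if int(grade) >= 7:  # keep only confidently relevant snippets
                kept.append(code)
        # 3. Sufficiency check: do we have enough context to proceed?
        verdict = llm(
            f"Answer yes or no: is this context sufficient to implement "
            f"'{story}'?\n" + "\n".join(kept)
        )
        if verdict.strip().lower().startswith("yes"):
            break
    return kept
```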

The Results: A Significant Improvement

By using this new method, we saw a dramatic improvement over our initial approach. Our internal evaluations showed that we were finding the right code context much more consistently.

Why This Matters

Stride Conductor takes care of writing tests so teams can spend more time delivering user value. To write great tests, Conductor needs the right context, so every improvement to Conductor’s RAG pipeline translates directly into better tests for our customers.

Curious about how Stride Conductor can help you accelerate delivery by automating high-quality test generation? Book a personalized demo here.
