Retrieval-augmented coding assistants
AI coding assistants like GitHub Copilot, Cursor, Replit Ghostwriter, or ChatGPT can significantly boost productivity for certain coding tasks. When I'm writing TypeScript, they'll often suggest small helper functions I need and implement them inline. `pickBy`, `deeplyMapValues`, or `slugify` are examples where coding assistants sometimes generate the whole function implementation, even though a library is available.
What’s the issue with that code? If it works and gets the job done, why bother?
- **Lack of Context and Documentation:** I'm unsure why the code was written like that. It lacks proper documentation.
- **Unreliable Testing:** Is it covering the edge cases? Does it come from a well-tested library or from a beginner project with many mistakes? Does it cause unwanted crashes? (See the example after this list.)
- **Performance Uncertainty:** It's unclear how the code performs and whether further improvements are possible.
- **Security Concerns:** The code might be insecure and invite new or old exploits.
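To make the testing concern concrete, here are inputs where the hand-rolled `slugify` above quietly misbehaves:

```typescript
// Edge cases the sketch above silently gets wrong:
slugify("Café Crème"); // "caf-cr-me": diacritics are dropped, not transliterated
slugify("---");        // "": an empty slug the caller probably doesn't expect
```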
All of these issues are mitigated by using well-tested libraries as much as possible. The millions of libraries across the various package managers are constantly tested for correctness, security, and performance by the community. Coding assistants still need to write some glue code to connect a library to the existing codebase, but if they focused on glue code instead of rewriting libraries, it would dramatically reduce the attack surface and the mental overhead of incorporating generated code.
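As a sketch of what "glue code over rewriting" looks like in practice, the assistant could lean on lodash's well-tested `pickBy` instead of generating its own:

```typescript
// Glue code: one import from a battle-tested library instead of a
// hand-rolled reimplementation the team now has to own.
import { pickBy } from "lodash";

const config = { retries: 3, timeout: undefined, verbose: false };

// Drop undefined entries before passing the config along.
const cleaned = pickBy(config, (value) => value !== undefined);
// { retries: 3, verbose: false }
```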
There are challenges with that, though: coding-assistant LLMs only know the libraries that existed before their training cutoff. In GPT-4's case, that is 2021, which is ages in software development. It has happened to me multiple times that GPT-4 suggested an outdated library or an API that has since seen breaking changes.
This can usually be fixed by feeding it the latest docs and debugging together through chain-of-thought reasoning, but it is a cumbersome journey.
In addition, I'd like to have a say in which library gets used. Choosing the right library is an art and a science. Assessing the health of an open source project is hard on its own: GitHub stars, the number of open issues, the release cadence, and the quality of the documentation are all useful heuristics, but ultimately the question is whether the library's code is high quality, safe, fast, and correct. I still want to look at it and decide whether it belongs in the codebase before the code is generated.
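Those heuristics are easy to automate, even if the final judgment stays manual. A minimal sketch, assuming the metadata returned by GitHub's REST API (`GET /repos/{owner}/{repo}`) and entirely made-up weights:

```typescript
// Illustrative only: a crude health score from public GitHub repo metadata.
// Field names match the GitHub REST API; the weights are arbitrary and no
// substitute for actually reading the code.
interface RepoStats {
  stargazers_count: number;
  open_issues_count: number;
  pushed_at: string; // ISO timestamp of the last push
}

function healthScore(repo: RepoStats): number {
  const daysSincePush =
    (Date.now() - new Date(repo.pushed_at).getTime()) / 86_400_000;
  const popularity = Math.log10(repo.stargazers_count + 1); // diminishing returns
  const freshness = Math.max(0, 1 - daysSincePush / 365);   // stale after a year
  const issuePressure = Math.log10(repo.open_issues_count + 1);
  return popularity + 2 * freshness - 0.5 * issuePressure;
}
```

Even with a score like this, I'd still want to read the code before committing to a dependency.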
What if we solved this by augmenting the LLM with retrieval over all the latest packages across languages, indexed into a vector store? It would let us always use the latest libraries and choose the right ones for the job. That would require a huge vector index and some innovation on the coding-assistant UX front: the developer should be able to pick between implementation options with different tradeoffs before the code is added to the codebase. Who would build this? An API from the package managers? Coding assistant providers?
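A minimal sketch of what that retrieval step could look like; `embed` and `vectorIndex` are hypothetical stand-ins for an embedding model and a vector store, not real APIs:

```typescript
// What gets indexed: one embedded docs chunk per published package version.
interface PackageDoc {
  name: string;    // e.g. "lodash"
  version: string; // latest published version at index time
  readme: string;  // the docs chunk that was embedded
}

// Hypothetical stand-ins; any real system would swap in concrete services.
declare function embed(text: string): Promise<number[]>;
declare const vectorIndex: {
  search(vector: number[], opts: { topK: number }): Promise<PackageDoc[]>;
};

// Retrieve the freshest matching packages for a task description, so the
// assistant can generate glue code against current docs instead of whatever
// existed at its training cutoff.
async function suggestLibraries(task: string): Promise<PackageDoc[]> {
  const queryVector = await embed(task);
  return vectorIndex.search(queryVector, { topK: 5 });
}
```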
While a lot of money and effort goes into training the latest code LLMs (StarCoder, Code Llama, replit-code), more work needs to go into augmenting coding assistants through retrieval. Some problems simply can't be solved by building a better model.