Programming with LLMs

Start Embracing the Effectiveness of LLMs for Programming

The shift towards LLM-assisted coding in our industry is now undeniable. Yet in many circles, it's still dismissed as merely a "probabilistic boilerplate machine." This perspective seems remarkably conservative, as if we've already perfected software development and have no need to explore new approaches. While I don't really care about credentials, I've spent 25 years as a developer (most of that time, obviously, without LLMs), and after two years of experience with LLM-assisted coding, I'm certain that this technology represents something far more profound than randomly inserting AI-generated code into existing codebases.

Documentation-First Development

Here's my workshop about programming with LLMs, which hopefully gives some more concrete insights into the techniques I'm talking about:

What I've learned is that effectively leveraging LLMs requires a fundamental shift in how we approach software development. A crystal-clear understanding of what you want to build, expressed in detailed written form, is pretty much a requirement for using LLMs to write bigger pieces of software. This documentation needs to be comprehensive and precise enough to guide the LLM in generating the formal artifacts (usually code) needed for implementation.

In other words, we're talking about high-quality specifications and documentation - something that has always been crucial but often neglected in traditional development workflows.
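As a sketch of what this looks like in practice (the function and its spec are invented for illustration, not taken from any real project), a documentation-first artifact can be a doc comment detailed enough, with rules and examples, that an LLM or a new teammate could implement it unambiguously:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Slugify converts a title into a URL-safe slug. The spec below is the
// kind of exhaustive, example-laden documentation an LLM works from:
//
//   - Lowercase all letters.
//   - Replace every run of non-alphanumeric characters with a single "-".
//   - Strip leading and trailing "-".
//
// Examples:
//
//   "Hello, World!"       -> "hello-world"
//   "  Go  Go  Golems  "  -> "go-go-golems"
func Slugify(title string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(title) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
		} else {
			b.WriteRune('-')
		}
	}
	// Collapse runs of '-' and trim the ends by splitting and rejoining.
	parts := strings.FieldsFunc(b.String(), func(r rune) bool { return r == '-' })
	return strings.Join(parts, "-")
}

func main() {
	fmt.Println(Slugify("Hello, World!"))
}
```

Note how the examples double as acceptance tests: the spec, not the reader's intuition, carries the edge cases.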

A New Development Paradigm

One of the hardest adjustments for me and others on my team has been accepting that writing clear documentation, creating unit tests, and spending time thinking about architecture and goals trump the "I need to hack this for tomorrow ASAP" mentality. The ten minutes you'd spend reviewing a log file can be all the time an LLM needs to write the entire solution.

This documentation needs to be stellar: concise, unambiguous, exhaustive, filled with examples - you can't assume the reader will reason things out or know about other parts of the system. The result is, coincidentally, also perfect for quickly onboarding new developers. Furthermore, these specs and documentation must stay current: when they diverge from the actual code, you're quickly in for a world of hurt.

The easiest way to validate and correct LLM output is to have extensive linting with readable error messages. Any tooling (validation, code generation, DSLs, solid APIs, unit tests) not only improves the quality and effectiveness of LLM output and reduces token count; it also makes it harder to write the wrong thing. These advantages apply equally to traditional human workflows.
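To make the "readable error messages" point concrete, here is a minimal sketch (the config struct and field names are hypothetical): a validator that reports every problem at once, naming the field, the bad value, and the expected range, so that an LLM - or a human - can fix the output from the message alone:

```go
package main

import (
	"fmt"
	"strings"
)

// ServerConfig is a hypothetical config struct; the interesting part is
// the validator's error messages, not the fields themselves.
type ServerConfig struct {
	Host string
	Port int
}

// Validate collects all problems instead of stopping at the first one,
// and spells out the fix in each message.
func Validate(c ServerConfig) []error {
	var errs []error
	if strings.TrimSpace(c.Host) == "" {
		errs = append(errs, fmt.Errorf("Host: must be non-empty, e.g. \"localhost\""))
	}
	if c.Port < 1 || c.Port > 65535 {
		errs = append(errs, fmt.Errorf("Port: got %d, must be in 1..65535", c.Port))
	}
	return errs
}

func main() {
	for _, err := range Validate(ServerConfig{Port: 0}) {
		fmt.Println(err)
	}
}
```

Feeding messages like these back into the model closes the loop: the lint output itself becomes the correction prompt.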

The most surprising insight? Doing things the right way enables the 10x velocity that LLMs offer. Proper preparation and structure are an order of magnitude faster than traditional corner-cutting. All of my heuristics regarding deadlines have gone out the window, and I'm having to relearn them in light of continuously improving tools and models.

A Personal Case Study

Let me share a concrete example. I recently developed an Elasticsearch clone on top of Bleve, with embeddings support. The entire project, including writing documentation and learning the Bleve API and Ollama embeddings, took just one hour. One fricking hour!

For another example, I created a complex Flipper Zero app in about three 5-hour blocks, including porting the UI library to C++.

In both cases, I generated intermediate documentation that IMO trumps the original project's documentation by far. You can see examples of this approach in my repositories:

For a bigger, messier codebase, my open-source ecosystem is approaching 350k lines of code, with a mix of incredibly abstract frameworks, maximalist tooling, and sprawling one-offs: GO GO GOLEMS ECOSYSTEM.

Learning Curve and Prerequisites

Achieving this level of productivity isn't automatic. It requires:

However, it's not rocket science. I consider myself a decent architect but not a super great programmer. The key is learning to leverage these tools effectively.

Implications for Open Source and Personal Projects

This paradigm shift has profound implications for open source development. Projects often lack comprehensive documentation, tooling, and frontends because these elements are time-consuming to create and maintain. LLM-assisted development offers a path to higher-quality open source software with better documentation and tooling.

For personal projects, the impact is equally significant. I can now develop complex software while being less intensely focused - even while watching Netflix. This makes it feasible to create sophisticated applications for friends and family, or to develop educational software, in time frames that would have been impossible before.

What's even more exciting is seeing how many non-software professionals in software-adjacent fields (genomics, data science, library sciences) now feel empowered to start their own businesses because they can write "real" software without much assistance. This democratization of software development is opening new doors for domain experts to create specialized tools and solutions that might never have existed otherwise.

Looking Forward

This isn't empty technological hype - it's a fundamental transformation in how we can approach software development. The ability to rapidly produce well-documented, maintainable code changes what's possible for both personal and professional projects.

Dismissing LLM-assisted development as mere "probabilistic boilerplate" misses the bigger picture. When properly leveraged, these tools enable us to build better software faster, with higher quality documentation and more comprehensive testing than many of us could practically achieve before.

The challenge now is adapting our development practices and mental models to fully utilize these capabilities. Our traditional heuristics about development time, effort, and best practices need to be reconsidered in light of these continuously improving tools and models.

PS

I am pretty proud that these aged quite well: