The problem with most people's code is that it's full of unnecessary complexity and creates a ton of work. I swear at least 90% of projects from 'top' companies, by 'top' engineers, are full of unnecessary complexity which slows everything down significantly. They literally need a team of 20+ engineers to do work which could have been done more effectively by 1 good engineer.
Based on modern metrics for code quality, almost nobody will realize that they're looking at bad code. I've seen a lot of horrible codebases which looked pretty good superficially: good linting, consistent naming, functional programming, static typing, etc. But architecturally, it's just shockingly bad; it's designed such that you need to refactor the code constantly; there is no clear business layer; business logic traverses all components, including all the supposedly generic ones.
With bad code, any business requirement change requires a deep refactoring... And people will be like "so glad we use TypeScript so that I don't accidentally forget to update a reference across the 20 different files touched by this refactoring." Newsflash: your tiny business requirement change requires you to update 20+ files because your code sucks! Sure, TypeScript helps in this case, but type safety should be the least of your concerns. If code is well architected, complex abstractions don't generally end up stretching across more than one or two files.
There's a reason we say "leaky abstraction" - if a complex abstraction leaks through many file boundaries, it's an abstraction and it's leaky!
I fully agree with your sentiment, and it also drives me crazy sometimes.
I wonder if the main problem was all the min-maxing of interview patterns that rewarded algorithm problem solvers from the 2010s onward.
People applied for software engineering jobs because they wanted to play with tech, not because they wanted to solve product problems (which should have a direct correlation with revenue impact).
Then you have the ego-boosting blog post era, where everyone wanted to explain how they used Kafka and DDD and functional programming to solve a problem. If you start reading some of those posts, you'll realize that the underlying problem was actually not well understood (especially the big picture).
This led developers down a wild goose chase (willingly), where they ended up burning through tons of engineering time which arguably could have been better spent understanding the domain.
This is not the case for everyone, but the exceptions are few.
It makes me wonder if the incentives are misaligned, and engineering contributing to revenue ends up not translating to hard cash, promos and bonuses.
In this new AI era, you can see the craftsman-style devs going full Luddite mode, IMO due to what I've mentioned above. As a craftsman-style dev myself, I can only set up the same async job queue pattern so many times. I'm actually enjoying the rubber-ducking with the AI more and more, mostly for digging into the domain and potential approaches for simplification (or even product refinement).
> If code is well architected, complex abstractions don't generally end up stretching across more than one or two files.
This is a naive metric since it's satisfied by putting the entire code base into a single file.
Part of the reason that business requirement changes to modern web dev code bases require changes to so many files is because web devs are encouraged to restrict the scope of any one file as much as possible.
I can't tell if you're complaining about that specifically or if you think it's possible to have both a highly modularized code base & still restrict business requirement changes to only a couple files.
If the latter, then I'd love to know guidelines for doing so.
You just described literally all modern web development.
Almost all, yes.
I said 90% in my comment but that's from my professional experience which is probably biased towards complex projects where maintainability is more important.
This is an interesting approach. I think, in a way, it mirrors what I do. Having contracted for much of my career, I've had to get up to speed on a number of codebases quickly. When I have a choice of how to do this, I find a recently closed issue and try to write a unit test for it. If nothing else, you learn where the tests live, assuming they exist, and how much of a safety net you have if you start hacking away at things.

Once I know how to add tests and run them (which is a really good way to deal with the codebase setup problem mentioned in the article, because a lot of onboarding docs only get you to the codebase running without all the plumbing you need), I feel like I can get by without a full understanding of the code, as I can throw in a couple of tests to prove what I want to get to and then hope the tests or CI or hooks prevent me from doing A Bad Thing.

Not perfect, and it varies depending on how well the project is built and maintained, but if I can break things easily, people are probably used to things breaking, and then I have an avenue to my first meaningful contribution: making things break less.
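A toy sketch of what this looks like in practice; the slugify helper and the issue number below are invented purely for illustration:

```python
# Hypothetical: closed issue #123 reported that titles with surrounding
# whitespace produced broken slugs. Step one is encoding the report as a test.

def slugify(title: str) -> str:
    """Imagined utility under test: lowercase and hyphenate a title."""
    # str.split() with no argument also drops leading/trailing whitespace,
    # which is the behavior the (invented) issue asked for.
    return "-".join(title.lower().split())

def test_issue_123_strips_surrounding_whitespace():
    assert slugify("  Hello World  ") == "hello-world"

test_issue_123_strips_surrounding_whitespace()
```

Even when the test itself is trivial, writing it tells you where the tests live, how to run them, and whether CI will catch you when you break something.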
I am quite skeptical and reserved when it comes to AI, particularly as it relates to impacts of the next generation of engineers. But using AI to learn a code base has been life-changing. Using a crutch to feel your way around. Then ditching the crutch when things are familiar, like using a map until you learn the road yourself.
I'm about to start a new role. What have you found most effective in using it to learn a new code base? Just asking questions like "what is this class doing"? Drawing architecture diagrams?
Just ask it what naturally draws your curiosity and use it to build your mental model. I may add that our company got us enterprise subscription (so models aren't trained on our IP) so I can just point it at the entire codebase, rather than copying/pasting snippets into a chat window.
What does this program accomplish? How does it accomplish it? Walk me through the boot sequence. Where does it do ABC?
I work in a company where I frequently interact with adjacent teams' code bases. When working on a ticket that touches another system, I'll typically tell it what I'm working on and ask it to point me to areas in the code that are responsible for that capability and which tests exercise that code. This is a great head start for me. I then start "in the ball park".
I would not recommend having it make diagrams for you. I don't know what it is, but LLMs just aren't great at converting information into diagram form. I've had one explain, quite impressively, parts of code, and when I ask it to turn that into a diagram it comes up short. Must be low on training data expressing itself in that medium. It's an okay way to get the syntax for a diagram started, however.
I wish you an auspicious time in your new role!
Your visualizer looks great! I really like that it queues up tasks to run instead of only operating on the code during runtime attachment. I haven't seen that kind of thing before.
I built my own node graph utility to do this for my code, after using Unreal's blueprints for the first time. Once it clicked for me that the two are different views of the same codebase, I was in love. It's so much easier for me to reason about node graphs, and so much easier for me to write code as plain text (with an IDE/language server). I share your wish that there were a more general utility for it, so I could use it for languages other than js/ts.
Anyway, great job on this!
Is this similar to what you can get with Doxygen?
https://en.wikipedia.org/wiki/Doxygen#/media/File:Doxygen-1....
That would be my question too...
GitHub Next comes to mind
Not very useful, is it?
> "...how I learn an unfamiliar codebase"
There should be more writing and discussion in this area, for several reasons. The simplest is that we're curious about how others do this. But it's also an interesting topic, IMHO, because layers of abstraction (code designed to run other code) can be difficult to talk about; the referents get messy. How do you rhetorically step through layers of abstraction?
In reverse engineering we often use Graph View to see execution flow as well. Glad to see it being used elsewhere
Do you automate that? If so what tooling do you use?
IDA does it by default, for example.
Do you guys remember the Smalltalk toolkit posted here a while ago that its creators made specifically to help with understanding new codebases?
Woah, that Glamorous Toolkit environment looks amazing. Thanks for the pointer.
The building of the visualiser was less interesting to me than the result and your conclusion. I agree that finding new ways to ingest the structure and logic of software would be very useful, and I like your solution. Is there a way to test it out?
I always thought to do this visualization in 3D, maybe with VR. Not sure how useful or pleasing an experience it would be. Kudos to the author of the project for getting this done!
I got Minority Report vibes.
This kind of approach might be what (finally) unlocks visual programming?
I feel like most good programmers are like good chess players. They don't need to see the board (code). But for inputting the code transformation into the system this might be a good programmer's chessboard.
Though to make it work concretely for arbitrary codebases I feel like a coding agent behind the scenes is 100% required.
A 3D environment (VR headset with Tom Cruise-style swiping, or Doom-style WASD navigation) would be cool: one could be "in orbit", observing the system, watching the nodes and their interactions, and pause to see what messages they're passing to each other. How about time-travel debugging to allow rewinds too!
As a bonus, porting Doom to it should be "trivial".
> I feel like most good programmers are like good chess players.
A specific type or area of developers, I'd say. There are many types and not all of them require understanding sizeable code bases to do their work well.
Understanding your large codebase is a few prompts away. You can ask a model to trace through and provide reports on the project's design, architecture, and implementation. From there, you can drill in with follow-ups.
Done right, you may not know specific lines or chunks of code by heart, but much like a tuned-in company CEO, you have eyes and ears on the ground and retain global oversight and insight of the project itself. For specifics, you can learn what you need as you need it. If that means knowing how every single module works, that's just a conversation with your agent.
- But I'll admit, this isn't precisely how I would do it today
How would you do it today?
I try to explain what I mean in the next few sentences of the post. I have spent a good amount of my career jumping into fairly large code bases. I don't need to take it quite so step by step; I have seen enough code to take shortcuts, to guess at what is there.
But telling people that isn't helpful. At the beginning, I try to give a more step-by-step account of how I would get into understanding the code base if I didn't already know these kinds of shortcuts. (I'm not sure I could write those down; they are just know-how and heuristics, like how, when you are starting to code, a missing ; can take much longer to spot than after you've been programming for a while.)
I thought that was curious. He says this isn’t how he would do it today then goes on to do it today (or presumably the same day he wrote that he wouldn’t do it this way today).
Doesn’t anyone use debuggers anymore?
When I have a codebase I don't know or haven't touched in some time and there's a bug, the first step is to reproduce it and then set a breakpoint early on somewhere, grab some coffee, and spend some time stepping through it, looking at state, until I know what's happening. From there it's usually kind of obvious.
Why would one need a graph view to learn a codebase when you can just slap a red dot next to the route and step a few times?
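For what it's worth, that loop is tiny. A minimal Python sketch (the function, the bug, and the values are invented; the commented-out breakpoint() call is where you'd drop into pdb):

```python
# Invented example: a bug report says discounted totals come out wrong.

def apply_discount(total: float, percent: float) -> float:
    # Suspected area: uncomment the breakpoint, reproduce the bug, and
    # step with pdb (n = next line, s = step in, p total = print a value).
    # breakpoint()
    return total * (1 - percent / 100)

# Step 1 of the workflow: reproduce the reported case deterministically.
print(apply_discount(200.0, 15.0))
```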
I have found that interactive visualizations are a great way to understand code and systems in general. Now that you can have an AI make one in under a minute, it's a very useful tool.
https://heyes-jones.com/externalsort/treeofwinners.html
Take this example. I can step through the algorithm, view the data structure and see a narration of each step.
A debugger is useful for debugging edge cases but it is very difficult to learn a complex system by stepping through it.
Very cool! For all its faults, seeing control and value change flows through execution is one of the things I really liked about Unreal's Blueprint viz scripting system. This looks like a better take on that.
And for huge git repos I always like to generate a Gource animation to understand how the repo grew, when big rearrangements and refactors happened, what the most active parts of the codebase are, etc.
This may be where AI coding tools unlock us. Being able to build tooling against novel concepts that change how we approach reading and writing code. I like it!
The unit test approach from the contractor in the thread is gold: "find a recently closed issue and try to write a unit test for it." This forces you to understand the test infrastructure, the module boundaries, and the actual behavior — not just the code structure.
I'd add one more technique that's worked well for me: trace a single request from HTTP endpoint to database and back. In a FastAPI app, that means starting at the route handler, following the dependency injection chain, seeing how the ORM/query layer works, and understanding the response serialization. You touch every layer of the stack by following one real path instead of trying to understand the whole codebase at once.
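As a plain-Python sketch of that single-path trace (every name here is invented; in a real FastAPI app the service would arrive through a Depends() chain and the repo would wrap an ORM session):

```python
# One request traced through three layers: handler -> service -> repo.

class UserRepo:  # data layer (stands in for the ORM/query layer)
    _rows = {1: {"id": 1, "name": "Ada"}}

    def fetch(self, user_id: int):
        return self._rows.get(user_id)

class UserService:  # business layer
    def __init__(self, repo: UserRepo):
        self.repo = repo

    def get_user(self, user_id: int) -> dict:
        row = self.repo.fetch(user_id)
        if row is None:
            raise KeyError(user_id)
        return row

def get_user_handler(user_id: int) -> dict:  # the "route handler"
    # In FastAPI this wiring would be dependency injection; it's manual
    # here so the whole request path fits in one readable sketch.
    return UserService(UserRepo()).get_user(user_id)

print(get_user_handler(1))
```

Following one real request like this touches every layer once, which is usually enough to orient yourself before reading anything else.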
Visualizers are nice for the "big picture" but they rarely help you understand why the code works the way it does. The why is in the git history and the closed issues, not in a dependency graph.
A use case that interests me is dynamic visualization for debugging, when there are interacting systems.
To flesh this out, let me see the volume of calls and data from one place to another. Help diagnose back-pressure, drops, rejections, and any other irregularities.
Think of an on-caller who wants to quickly pinpoint a problem. Visualization could help one understand the nature of the problem before reading the code. Then you could select a part of the visualization and ask the computer to tell you what that part does, if there are any recent changes to it, etc.
This is the first thing I used LLMs for. Not code generation, but parsing and tooling to gain understanding. It also saves resources in the long run.
One of my favorite uses for Claude Code is to point it at a section of seriously badly written code with undecipherable symbol names, over the top cyclomatic complexity etc and just ask it to make the code readable.
Cool project! Would you be willing to share the source code?
Warning: Blatant Self-Promotion
I created Intraview for VS Code, Cursor, etc., which makes it easy to create code tours with your coding agent by simply saying, "Create a tour that helps me understand how to get started with this repository."
It has other features, but it was designed for the problem of getting into new code bases, and it allows the tours to be saved in the repo as flat JSON files. You can re-open or share tours with new folks, and if the code changes, the system tells you how to ask your agent to update the tour.
Just a thought.
You are so lucky to have git history and issues to work from!
Where's the visualizer the blog post talks about?
How is it different from regular code browser/indexers?