Code spelunking: How to dive into unfamiliar code (part 1)
Feb 9th, 2024
Part of working as a software engineer, and specifically as a contractor like myself, is the reality of investigating code you didn't write. Whether you need to understand code a teammate wrote, are onboarding to a new project, or want to learn how a dependency works, you'll need to master maneuvering in unfamiliar territory. Let's discuss a few tricks that can help.
Categorize the project
First, understand the kind of project you're investigating. JavaScript frameworks are structured differently than personal websites or Android apps. Knowing this gives you a rough lay of the land: a coarse understanding of things like how the build system may be set up, or where the "main" method is, if there is one.
Let's take Deno as an example. Looking at its website, we can tell it is a JavaScript runtime written in Rust. What does this tell us? We can start to make sense of it as a Rust project built and managed with Cargo, the Rust package manager.
We can see evidence of that in the names of the root files. You don't need to know what they're for, or even be able to read Rust, to see that we should think of it as a Cargo/Rust project. This means simple Google searches like "Where are Rust dependencies declared?" or "Rust tutorials" are likely to answer the questions we may have.
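To make the "look at the root files" step concrete, here's a minimal Rust sketch that lists a directory's top-level entries and matches them against a few well-known marker files. The marker table is my own hypothetical and deliberately incomplete sample, not any standard list, and a match is only a hint about the project's category, not proof.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical, deliberately incomplete table of marker files and what they usually suggest.
const MARKERS: &[(&str, &str)] = &[
    ("Cargo.toml", "Rust project managed with Cargo"),
    ("package.json", "JavaScript/TypeScript project managed with npm or a similar tool"),
    ("go.mod", "Go module"),
    ("pyproject.toml", "Python project"),
    ("build.gradle", "JVM or Android project built with Gradle"),
];

/// Returns a description for every marker file present in `names`, in table order.
fn categorize(names: &[&str]) -> Vec<&'static str> {
    MARKERS
        .iter()
        .filter(|(marker, _)| names.contains(marker))
        .map(|(_, description)| *description)
        .collect()
}

/// Lists the names of the entries directly inside `dir`.
fn root_file_names(dir: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(dir)? {
        names.push(entry?.file_name().to_string_lossy().into_owned());
    }
    Ok(names)
}

fn main() -> io::Result<()> {
    let owned = root_file_names(Path::new("."))?;
    let names: Vec<&str> = owned.iter().map(String::as_str).collect();
    for description in categorize(&names) {
        println!("Looks like a {description}");
    }
    Ok(())
}
```

Run from the root of a Deno checkout, it should at least report a Cargo project, since the repo keeps a Cargo.toml at its root.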
Find the main method
Once you find the main method of a program, you can follow function calls and if statements until you get what you need. To run a Deno program, you'd write something like this in your terminal:
deno run main.ts
If our goal is to find the first line of code in Deno that runs main.ts, we should look for CLI (Command Line Interface) code.
Looking at the root-level files again, look what we find: a folder named cli! Let's look in there. Behold! A main.rs file!
There we have it: pub fn main().
pub fn main() {
    setup_panic_hook();
    util::unix::raise_fd_limit();
    util::windows::ensure_stdio_open();
    #[cfg(windows)]
    colors::enable_ansi(); // For Windows 10
    deno_runtime::permissions::set_prompt_callbacks(
        Box::new(util::draw_thread::DrawThread::hide),
        Box::new(util::draw_thread::DrawThread::show),
    );
    let args: Vec<String> = env::args().collect();
    // .... and so on
}
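If a project's root isn't as telling as Deno's, you can let the computer hunt for entry points. The sketch below walks a directory tree and lists .rs files whose text contains fn main(. It's a plain-text heuristic that I'm assuming is good enough for a first pass. It skips target (Cargo's build output) and hidden directories, and it will also pick up main functions in examples, tests, or build scripts. Your editor's project-wide search or a tool like ripgrep does the same job faster.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects `.rs` files under `dir` whose text contains `fn main(`.
/// Skips `target` (Cargo's build output) and hidden directories such as `.git`.
fn find_main_candidates(dir: &Path, found: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        let file_name = entry.file_name();
        let name = file_name.to_string_lossy();
        // `file_type` doesn't follow symlinks, so a symlinked directory can't cause a loop.
        if entry.file_type()?.is_dir() {
            if name != "target" && !name.starts_with('.') {
                find_main_candidates(&path, found)?;
            }
        } else if path.extension().map_or(false, |ext| ext == "rs") {
            // Skip unreadable or non-UTF-8 files instead of aborting the whole walk.
            if let Ok(source) = fs::read_to_string(&path) {
                if source.contains("fn main(") {
                    found.push(path);
                }
            }
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut found = Vec::new();
    find_main_candidates(Path::new("."), &mut found)?;
    found.sort();
    for path in &found {
        println!("{}", path.display());
    }
    Ok(())
}
```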
Trust the naming
It's tempting to think that unfamiliar projects are written with cryptic, domain-specific naming schemes you have no hope of understanding. This is true for some projects, and unfortunately it's more common when those projects sit behind corporate walls and are built by dysfunctional engineering teams. That said, the code had to be understandable to someone.
Once you've found the main method, trust the naming to lead you to what you're looking for until you have reason not to. For open source projects this almost always pays off, since they have an incentive to be easy to understand; otherwise, they won't get contributions.
From here, it's just a matter of reading the code to find the path it takes to execute deno run main.ts. There's a run_subcommand(flags).await call that seems helpful. It turns out to be defined in the same file, and it contains a match expression that looks like this:
let handle = match flags.subcommand.clone() {
    // ..... other options .....
    DenoSubcommand::Run(run_flags) => spawn_subcommand(async move {
        if run_flags.is_stdin() {
            tools::run::run_from_stdin(flags).await
        } else {
            tools::run::run_script(flags, run_flags).await
        }
    }),
    // ..... other options .....
};
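Here's a stripped-down, self-contained sketch of that same shape: parse the arguments from env::args() into an enum, then match on it to route to a handler. RunFlags, Subcommand, parse, and dispatch are hypothetical stand-ins, not Deno's real types. Treating - as "read from stdin" is an assumption based on a common CLI convention, not something taken from Deno's source. The takeaway is that following a program from main mostly means following matches like this one to the arm you care about.

```rust
use std::env;

/// Hypothetical stand-in for Deno's run flags; the real type carries far more fields.
#[derive(Debug, PartialEq)]
struct RunFlags {
    script: String,
}

impl RunFlags {
    /// Assumption: "-" means "read the program from stdin", a common CLI convention.
    fn is_stdin(&self) -> bool {
        self.script == "-"
    }
}

/// Hypothetical subcommand enum with just enough variants for the example.
#[derive(Debug, PartialEq)]
enum Subcommand {
    Run(RunFlags),
    Help,
}

/// Turns `["deno", "run", "main.ts"]`-style arguments into a `Subcommand`.
fn parse(args: &[String]) -> Subcommand {
    match args.get(1).map(String::as_str) {
        Some("run") => match args.get(2) {
            Some(script) => Subcommand::Run(RunFlags { script: script.clone() }),
            None => Subcommand::Help,
        },
        _ => Subcommand::Help,
    }
}

/// Mirrors the shape of the real `match`: each variant routes to a handler.
/// Here a "handler" just describes which path would run.
fn dispatch(subcommand: Subcommand) -> String {
    match subcommand {
        Subcommand::Run(run_flags) => {
            if run_flags.is_stdin() {
                "run_from_stdin".to_string()
            } else {
                format!("run_script({})", run_flags.script)
            }
        }
        Subcommand::Help => "help".to_string(),
    }
}

fn main() {
    let args: Vec<String> = env::args().collect();
    println!("{}", dispatch(parse(&args)));
}
```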
Following the execution further, we can dive deeper:
1. Let's assume we don't know what run_flags.is_stdin() means. We can follow both branches to get to deno/cli/tools/run/mod.rs, and it turns out both of those functions are defined there and do similar things.
2. They both seem to do a bunch of setup and then call let exit_code = worker.run().await?;, which sounds a lot like what we're looking for.
3. The worker is created a few lines above with a call to create_main_worker(main_module, permissions), and a search for that function name turns up one definition.
4. It internally calls create_custom_worker, defined just below it.
5. It returns a CliMainWorker, which, after a search, turns out to be defined here.
6. CliMainWorker has a run method defined for it, which is probably what we're looking for.
7. The body of run seems to run the event loop until an error is received or whatever this is.
8. Finally, it returns the exit code.
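Several of the steps above boil down to "search for where a function is defined." Your editor's Go to Definition or a plain text search usually handles this. Still, here's a minimal sketch of the idea over in-memory (path, source) pairs. find_fn_definitions and the file contents are hypothetical. The search is purely textual: it looks for fn <name> followed by ( or <, so comments, strings, or macro-generated functions can fool it.

```rust
/// Returns `(file, line_number)` for every line that looks like a definition of `fn <name>`.
/// Purely textual: comments, strings, and macros can produce false hits or misses.
fn find_fn_definitions<'a>(files: &[(&'a str, &str)], name: &str) -> Vec<(&'a str, usize)> {
    let needle = format!("fn {name}");
    let mut hits = Vec::new();
    for (path, source) in files {
        for (index, line) in source.lines().enumerate() {
            if let Some(pos) = line.find(&needle) {
                // Require the name to end here, so `create_main_worker`
                // doesn't match `create_main_worker_factory`.
                let rest = &line[pos + needle.len()..];
                if rest.starts_with('(') || rest.starts_with('<') {
                    hits.push((*path, index + 1)); // 1-based line numbers
                }
            }
        }
    }
    hits
}

fn main() {
    // Hypothetical file contents standing in for a real repository.
    let files = [
        ("worker.rs", "pub async fn create_main_worker(\n    /* ... */\n) {}"),
        ("factory.rs", "fn create_main_worker_factory() {}"),
        ("run.rs", "let worker = create_main_worker(main_module, permissions);"),
    ];
    for (path, line) in find_fn_definitions(&files, "create_main_worker") {
        println!("{path}:{line}");
    }
}
```

Running it prints worker.rs:1. The call site in run.rs and the similarly named function in factory.rs are both ignored.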
We now have a basic understanding of the execution flow, so let's see how accurate it is.
Set a breakpoint
Reading the code is a great way to get started, but nothing tops running the program to understand how it actually works.
Deno's Contributing page explains how to clone, build, and run the program. Unfortunately, it doesn't mention how to debug the binary, but that's why we categorized it in the beginning!!! It's just a Rust program!
I used VS Code's guide to debugging Rust to figure out how to set a breakpoint, and eventually ended up with the following .vscode/launch.json, the file responsible for debugging configurations:
{
  "version": "0.2.0",
  "configurations": [
    {
      "type": "lldb",
      "request": "launch",
      "name": "Debug deno",
      "cargo": {
        "args": ["build", "--package", "deno", "--bin", "deno"]
      },
      "args": ["run", "main.ts"],
      "cwd": "${workspaceFolder}"
    }
  ]
}
The only other setup I did was to create a main.ts file at the root of the repo with a simple console.log("Hello, World!") statement in it.
With everything set up, I set a breakpoint and ..... drumroll ..... found one other key line.
pub async fn run(&mut self) -> Result<i32, AnyError> {
    let mut maybe_coverage_collector =
        self.maybe_setup_coverage_collector().await?;
    let mut maybe_hmr_runner = self.maybe_setup_hmr_runner().await?;
    log::debug!("main_module {}", self.main_module);
    if self.is_main_cjs {
        deno_node::load_cjs_module(
            &mut self.worker.js_runtime,
            &self.main_module.to_file_path().unwrap().to_string_lossy(),
            true,
            self.shared.options.inspect_brk,
        )?;
    } else {
        self.execute_main_module_possibly_with_npm().await?; // <- this one
    }
    // ... and so on
}
This line was the one actually responsible for executing the code in main.ts. If you follow the flow of execution, it eventually hands off to the deno_core crate to evaluate the module:
/// Executes specified JavaScript module.
pub async fn evaluate_module(
    &mut self,
    id: ModuleId,
) -> Result<(), AnyError> {
    self.wait_for_inspector_session();
    let mut receiver = self.js_runtime.mod_evaluate(id); // <- Right here
    tokio::select! {
        // Not using biased mode leads to non-determinism for relatively simple
        // programs.
        biased;
        maybe_result = &mut receiver => {
            debug!("received module evaluate {:#?}", maybe_result);
            maybe_result
        }
        event_loop_result = self.run_event_loop(false) => {
            event_loop_result?;
            receiver.await
        }
    }
}
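To get a feel for what tokio::select! with biased; does here, here's a simplified, single-threaded analogy built only on the standard library. It doesn't use tokio and isn't how deno_core is actually implemented. On every iteration it checks for the evaluation result first, mirroring the priority biased; gives the first branch. Only when no result is ready does it run one tick of a toy event loop. If the event loop runs out of work, it falls back to waiting on the result, much like the receiver.await in the second branch. ToyEventLoop, evaluate, and dropped are hypothetical names.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{self, Receiver, TryRecvError};

/// A toy event loop: a FIFO queue of pending tasks (hypothetical, not deno_core's API).
struct ToyEventLoop {
    pending: VecDeque<Box<dyn FnOnce()>>,
}

impl ToyEventLoop {
    /// Runs the oldest pending task; returns false once there is nothing left to do.
    fn tick(&mut self) -> bool {
        match self.pending.pop_front() {
            Some(task) => {
                task();
                true
            }
            None => false,
        }
    }
}

/// The error returned when the sending side went away without producing a result.
fn dropped() -> Result<(), String> {
    Err("evaluation was dropped".to_string())
}

/// Checks for the module result first on every iteration (like `biased;` favoring the
/// first branch) and only drives the event loop when no result is ready yet.
fn evaluate(
    receiver: Receiver<Result<(), String>>,
    event_loop: &mut ToyEventLoop,
) -> Result<(), String> {
    loop {
        match receiver.try_recv() {
            Ok(result) => return result,
            Err(TryRecvError::Disconnected) => return dropped(),
            Err(TryRecvError::Empty) => {
                if !event_loop.tick() {
                    // Out of work: block for the result, like `receiver.await` in the second branch.
                    return receiver.recv().unwrap_or_else(|_| dropped());
                }
            }
        }
    }
}

fn main() {
    let (sender, receiver) = mpsc::channel();
    let mut event_loop = ToyEventLoop { pending: VecDeque::new() };
    event_loop.pending.push_back(Box::new(|| println!("Hello, World!")));
    event_loop.pending.push_back(Box::new(move || {
        sender.send(Ok(())).unwrap();
    }));
    println!("{:?}", evaluate(receiver, &mut event_loop));
}
```

Running it prints Hello, World! followed by Ok(()). One caveat of the toy: if the event loop runs dry while a Sender is still alive elsewhere and never sends, recv() blocks forever.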
Looking back at my original trace, the code I described in point #7 above seems to handle cleanup of some sort.
I could continue, but I think I've made my point.
Conclusion
Diving into unknown code isn't as scary as others might make it out to be. I've shown that simple techniques can help you tremendously.
I demonstrated a simple example in this post, but these techniques can also answer harder questions. For example, I used them to find where the event loop is ticked in Deno. More specifically, it probably happens on this line.
These techniques work best for open source projects or projects that are otherwise somewhat intelligible. For corporate projects that have no incentive to make their code understandable, you'll have to resort to other techniques, but more on that in a different post.
In the next post, I'll go over even more techniques, like the one that helped me make storybook-addon-next, the precursor of @storybook/nextjs.
Did you enjoy the post? Consider supporting me and my tea addiction.