My Workflow: Unfamiliar C and C++ codebases
Improving and automating my workflow is something I have put considerable investment into, and I would like to share what I can. This is part of a series of posts about my development workflow. You can read the other installments here:
I have a curse: I know how to program and I run Linux, so I naturally tend to fix things when I see them, which results in significant yak shaving, but also fixes things permanently. I also know that one of the most effective ways to learn things about libraries or programs that are misbehaving is reading their source code.
Acquiring source code
Actually getting the source to things is surprisingly annoying: every project has its own Web site where they might link their source, and I would have to find those. I don't want to navigate websites as they're distracting and too often don't have the links to the code anywhere easy to find.
Just grab it off GitHub
Because of the monoculture in open source, very few projects don't at least have a mirror of their source on GitHub. This makes it very convenient to acquire source for things, as I only have to look in one place that's also quite uniform and machine accessible.
The gh GitHub command line tool is probably one of the better ways to interact with the Web site without distractions. Unfortunately, there is an old feature request for adding search, which has not been implemented yet. Thankfully, you can add aliases. I've done so with a fancy alias for searching repos, which emits nice coloured output for finding the name of the repo for the thing I want.
aliases:
    search: 'api -X GET search/repositories -f q=''$1'' --template ''{{range .items}}{{ .full_name | color "white" }}: {{ .description }}{{"\n"}}{{ end }}'''
I can then do something like gh search 'github cli', for example, and it will print out the name and description of cli/cli, which is the one I want. It can then be cloned with gh repo clone cli/cli.
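The search and clone steps can also be chained. This is a minimal sketch, assuming the first search hit is the repository you want; clone_first_hit is a hypothetical helper name (not part of gh), and it uses the --jq option of gh api to pull out the first full_name.

```shell
# Hypothetical helper (not part of gh): clone the top search result for a query.
# Assumes the first hit is the right repository, which is not always true.
clone_first_hit() {
    repo=$(gh api -X GET search/repositories -f q="$1" --jq '.items[0].full_name') || return 1
    if [ -z "$repo" ] || [ "$repo" = "null" ]; then
        echo "no results for: $1" >&2
        return 1
    fi
    gh repo clone "$repo"
}
```

Calling clone_first_hit 'github cli' would then search and clone in one go. Checking the search output by eye first is still safer for ambiguous names.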
Can't find it on GitHub
At this point I would either Google it on DuckDuckGo, or look at where the distro package got it from (this is important in the case of things that have been forked or have multiple versions). I use Arch, so that would entail either looking at the package info with pacman -Si PACKAGE-NAME or grabbing the package source with asp checkout PACKAGE-NAME, then reading the PKGBUILD.
Equivalent things exist for other distros, for example, reading nixpkgs source for NixOS.
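For the Arch case, the upstream link can be pulled straight out of the package info. This is a rough sketch, assuming pacman -Si prints a line starting with URL (as it does in the C locale); upstream_url is a hypothetical helper name.

```shell
# Hypothetical helper: print the upstream URL recorded for an Arch package.
# LC_ALL=C keeps pacman's field names in English so the "URL" match works.
upstream_url() {
    LC_ALL=C pacman -Si "$1" | sed -n 's/^URL[[:space:]]*:[[:space:]]*//p'
}
```

Something like upstream_url PACKAGE-NAME then prints the project home page, which is usually one click away from the source.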
Dealing with C or C++
C or C++ projects often have a lot of latitude to do creative things with their build processes, and I want an IDE to work on them. I use nvim and clangd for my IDE, so working on codebases with arbitrary build systems is a question of generating a compilation database (compile_commands.json).
Using various build systems
These are the build systems I have dealt with the most while working on random C or C++ projects. Sometimes a project doesn't document how to use its build system, or I don't want to read the README.
GNU autotools
Identifier: .in files or a configure script at the root of the repo.
Notes: If configure is missing, there may be a bootstrap or bootstrap.sh script that will generate one; if that's not there either, you may have to run autoreconf --install.
Usage: ./configure, then make -jN where N is the number of build jobs. ./configure may need some options; its --help option lists the possible ones.
CMake
Identifier: CMakeLists.txt at the root of the repo.
Usage: cmake -G Ninja -B build, then ninja -C build.
Meson
Identifier: meson.build at the root of the repo.
Usage: meson setup build (the bare meson ./build form also works but is deprecated in newer Meson releases), then ninja -C build.
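The identifier files listed for each build system lend themselves to a quick check. This is a minimal sketch with hypothetical helper names; the order of the checks is an arbitrary choice for repos that ship more than one build system, and the configure.ac test is my addition (it is the usual autoconf input file).

```shell
# Hypothetical helper: true if the directory contains at least one *.in file.
has_in_file() {
    for f in "$1"/*.in; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# Hypothetical helper: guess the build system from the identifier files.
detect_build_system() {
    dir=${1:-.}
    if [ -f "$dir/meson.build" ]; then
        echo meson
    elif [ -f "$dir/CMakeLists.txt" ]; then
        echo cmake
    elif [ -f "$dir/configure" ] || [ -f "$dir/configure.ac" ] || has_in_file "$dir"; then
        echo autotools
    else
        echo unknown
    fi
}
```

Run it as detect_build_system path/to/repo; an answer of unknown means it is time to read the README after all.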
Compilation databases
Unusual and obsolete build systems such as GNU autotools/GNU make
Use Bear: run configure, then bear -- make [make options]. This will do LD_PRELOAD magic to intercept the calls to the compiler and save them. This tool works on basically any build system, even silly shell scripts. Just remember to run it on a clean build, or else it will miss the files that were already built!
Sometimes compiledb works better: compiledb make -- [make options].
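Since a build that was not clean silently produces a short database, a quick count of the recorded entries can catch it. This is a rough sketch, assuming the JSON is pretty-printed with one "file" key per line; compdb_entries is a hypothetical helper name.

```shell
# Hypothetical helper: count entries in a compilation database.
# Assumes the JSON is pretty-printed with one "file" key per line.
compdb_entries() {
    grep -c '"file"[[:space:]]*:' "${1:-compile_commands.json}"
}
```

If the number is far below the number of source files in the tree, the build probably skipped files that were already compiled.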
Linux kernel
Pretty high up on the list of unusual build systems. Build the kernel with clang: make CC=clang defconfig, then make CC=clang -jN, then run scripts/clang-tools/gen_compile_commands.py.
CMake
CMake is nice because it can generate Ninja. You can invoke it with -G Ninja, build, then ask Ninja for compile commands with ninja -C build -t compdb > compile_commands.json.
Meson
Like CMake, after building the software, you can use Ninja to get a compilation database with ninja -C build -t compdb > compile_commands.json.
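As an alternative to redirecting Ninja's output, a database that already sits in the build directory can be symlinked to the repo root. As far as I know, Meson writes compile_commands.json into the build directory on its own and CMake does the same when configured with -DCMAKE_EXPORT_COMPILE_COMMANDS=ON; treat both as assumptions to verify on your versions. link_compdb is a hypothetical helper name, and the build directory name matches the one used above.

```shell
# Hypothetical helper: expose build/compile_commands.json at the repo root.
link_compdb() {
    root=${1:-.}
    if [ ! -f "$root/build/compile_commands.json" ]; then
        echo "no compile_commands.json in $root/build" >&2
        return 1
    fi
    # Relative target, so the link keeps working if the repo is moved.
    ln -sf build/compile_commands.json "$root/compile_commands.json"
}
```

A symlink stays up to date when the build directory is reconfigured, which a redirected copy does not.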
My IDE got confused because they're doing cursed stuff
This has happened a couple of times, especially when reading the source code of glibc, for instance, where there are definitions in headers and definitions in unrelated .c files, among other things. Fortunately, ctags is not smart enough to get confused by cursed stuff and works fine in parallel with an LSP server. Run ctags -R . at the root of the repo and use nvim to navigate with the tags:
- CTRL-] jump to the identifier under the cursor.
- CTRL-W g CTRL-] jump to the identifier under the cursor in a new split.
- :tj TAG_NAME selects from the tags called TAG_NAME or jumps there directly if there's only one. Useful if there are multiple definitions of the same identifier.
- CTRL-O goes back in the jump history to the previous position.
- CTRL-I goes forward again in the jump history.
Finding things
I use ripgrep as it has good defaults and is extremely fast. Usually the way I find things is I look for a unique word related to the thing I want in the documentation, for instance, a long command line option or an error message, then I search for it case insensitively (-i) and start browsing code from there.
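The search described above can be wrapped up as follows. This is a small sketch with a hypothetical helper name; the -F (fixed string) flag is my addition so that error messages containing regex characters match literally, and the grep fallback is only there for machines without ripgrep.

```shell
# Hypothetical helper: case-insensitive fixed-string search of a source tree.
# Uses ripgrep when installed and falls back to grep otherwise.
find_string() {
    if command -v rg >/dev/null 2>&1; then
        rg -i -n -F -- "$1" "${2:-.}"
    else
        grep -r -i -n -F -- "$1" "${2:-.}"
    fi
}
```

For example, find_string 'cannot open display' path/to/repo lists every file and line number where that message appears, whatever its capitalisation.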
Sometimes it's not that easy, and I have to use some more tricks as I can't find it by searching. I often pull out a debugger after stracing the program to try to find an interesting system call I can set a breakpoint on to track down the code path. Or, for instance, I know that a program opens two dialogs before the interesting behaviour, so I set a breakpoint on XCreateWindow. I then take a backtrace and have somewhere to start looking in the codebase. Be creative!
Usually my debugger of choice is either rr or gdb.
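The breakpoint-then-backtrace trick can be done non-interactively with gdb. This is a minimal sketch, assuming the symbol lives in a shared library that is not loaded yet (hence the pending breakpoint setting); bt_at is a hypothetical helper name, and it only prints the backtrace for the first hit.

```shell
# Hypothetical helper: run a program under gdb, stop at the first hit of a
# symbol, print a backtrace, and exit.
bt_at() {
    sym=$1
    shift
    gdb -batch \
        -ex 'set breakpoint pending on' \
        -ex "break $sym" \
        -ex run \
        -ex bt \
        --args "$@"
}
```

Usage would look like bt_at XCreateWindow ./some-program --its-options, where the program name and options are placeholders. For the second or third hit, an interactive session with continue is easier.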