Nick Desaulniers

The enemy's gate is down

Finding Compiler Bugs With C-Reduce

| Comments

Support for a long-awaited GNU C extension, asm goto, is in the midst of landing in Clang and LLVM. We want to release a high quality implementation, so it’s important to test the new patches on real code, not just small test cases. When we hit compiler bugs in large source files, it can be tricky to find exactly which part of a potentially large translation unit is problematic. In this post, we’ll take a look at using C-Reduce, a multithreaded code bisection utility for C/C++, to help narrow down a reproducer for a real compiler bug (potentially; it was in a patch that was posted, and will be fixed before it can ship in production) from a real code base (the Linux kernel). It’s mostly a post to my future self, so that I can remind myself how to run C-Reduce on the Linux kernel again, since this is now the third real compiler bug it’s helped me track down.

The bug I’m focusing on appears when trying to compile the Linux kernel with Clang: a linkage error, all the way at the end of the build.

drivers/spi/spidev.o:(__jump_table+0x74): undefined reference to `.Ltmp4'

Hmm…looks like the object file (drivers/spi/spidev.o) has a section (__jump_table) that references a non-existent symbol (.Ltmp4), which looks like a temporary label that should have been cleaned up by the compiler. Maybe it was accidentally left behind by an optimization pass?

To run C-Reduce, we need a shell script that returns 0 when it should keep reducing, and an input file. For the input file, it’s simplest to preprocess the source; this helps cut down on the compiler flags that typically require paths (-I, -L).

Preprocess

First, let’s preprocess the source. For the kernel, if the file compiles correctly, the kernel’s KBuild build process will create a file named in the form path/to/.file.o.cmd, in our case drivers/spi/.spidev.o.cmd. (If the file doesn’t compile, then I’ve had success hooking make path/to/file.o with bear then getting the compile_commands.json for the file.) I find it easiest to copy this file to a new shell script, then strip out everything but the first line. I then replace the -c -o <output>.o with -E. chmod +x that new shell script, then run it (outputting to stdout) to eyeball that it looks preprocessed, then redirect the output to a .i file. Now that we have our preprocessed input, let’s create the C-reduce shell script.
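As a sketch of that transformation (the .cmd contents here are hypothetical; real ones are far longer), turning the first line of the .cmd file into a preprocessing command looks something like:

```shell
# Hypothetical first line of drivers/spi/.spidev.o.cmd.
cmd='cmd_drivers/spi/spidev.o := gcc -Wall -I./include -c -o drivers/spi/spidev.o drivers/spi/spidev.c'

# Drop the KBuild variable-assignment prefix, then swap `-c -o <output>.o` for `-E`.
printf '%s\n' "$cmd" | sed -e 's/^[^:]*:= //' -e 's/-c -o [^ ]*\.o/-E/'
# → gcc -Wall -I./include -E drivers/spi/spidev.c
```

Redirecting that command’s output to a .i file gives us the preprocessed input.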

Reproducer

I find it helpful to have a shell script in the form:

  1. remove previous object files
  2. rebuild object files
  3. disassemble object files and pipe to grep

For you, it might be some different steps. As the docs show, you just need the shell script to return 0 when it should keep reducing. Starting from our previous shell script that preprocessed the source and dumped a .i file, let’s change it back to stopping before linking rather than after preprocessing (s/-E/-c/), and change the input to our new .i file. Finally, let’s add the test for what we want. Since I want C-Reduce to keep reducing until the disassembled object file no longer references anything Ltmp related, I write:

$ objdump -Dr -j __jump_table spidev.o | grep Ltmp > /dev/null

Now I can run the reproducer to check that it at least returns 0, which C-Reduce needs to get started:

$ ./spidev_asm_goto.sh
$ echo $?
0
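Under the hood, the interestingness test is nothing more than an exit status. Here’s a self-contained simulation of that logic (the fake_objdump function and its relocation line are made up, standing in for the real `objdump -Dr -j __jump_table spidev.o` invocation):

```shell
# Stand-in for `objdump -Dr -j __jump_table spidev.o`; this relocation line is made up.
fake_objdump() {
  printf '0000000000000074 R_X86_64_64  .Ltmp4\n'
}

# C-Reduce treats exit status 0 as "still interesting: keep reducing".
if fake_objdump | grep Ltmp > /dev/null; then
  echo "interesting"
else
  echo "not interesting"
fi
# → interesting
```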

Running C-Reduce

Now that we have a reproducer script and input file, let’s run C-Reduce.

$ time creduce --n 40 spidev_asm_goto.sh spidev.i
===< 144926 >===
running 40 interestingness tests in parallel
===< pass_includes :: 0 >===
===< pass_unifdef :: 0 >===
===< pass_comments :: 0 >===
===< pass_blank :: 0 >===
(0.7 %, 2393679 bytes)
(5.3 %, 2282207 bytes)
===< pass_clang_binsrch :: replace-function-def-with-decl >===
(12.6 %, 2107372 bytes)
...
===< pass_indent :: final >===
(100.0 %, 156 bytes)
===================== done ====================

pass statistics:
  method pass_clang_binsrch :: remove-unused-function worked 1 times and failed 0 times
...
  method pass_lines :: 0 worked 427 times and failed 998 times
            ******** /android0/kernel-all/spidev.i ********

a() {
  int b;
  c();
  if (c < 2)
    b = d();
  else {
    asm goto("1:.long b - ., %l[l_yes] - . \n\t" : : : : l_yes);
  l_yes:;
  }
  if (b)
    e();
}
creduce --n 40 spidev_asm_goto.sh spidev.i  1892.35s user 1186.10s system 817% cpu 6:16.76 total
$ wc -l spidev.i.orig
56160 spidev.i.orig
$ wc -l spidev.i
12 spidev.i

So it took C-reduce just over 6 minutes to turn >56k lines of mostly irrelevant code into 12 when running 40 threads on my 48 core workstation.

It’s also highly entertaining to watch C-Reduce work its magic. In another terminal, I highly recommend running watch -n1 cat <input_file_to_creduce.i> to watch the input get pared down before your eyes.

Jump to 4:24 in the asciicast recording to see where things really pick up.

Finally, we still want to bisect our compiler flags (the kernel uses a lot). I still do this process manually, and it’s not too bad. Having proper and minimal steps to reproduce compiler bugs is critical.

That’s enough for a great bug report for now. In a future episode, we’ll see how to start pulling apart llvm to see where compilation is going amiss.

Booting a Custom Linux Kernel in QEMU and Debugging It With GDB

| Comments

Typically, when we modify a program, we’d like to run it to verify our changes. Before booting a compiled Linux kernel image on actual hardware, it can save us time and potential headache to do a quick boot in a virtual machine like QEMU as a sanity check. If your kernel boots in QEMU, it’s not a guarantee it will boot on metal, but it is a quick assurance that the kernel image is not completely busted. Since I finally got this working, I figured I’d post the built up command line arguments (and error messages observed) for future travelers. Also, QEMU has more flags than virtually any other binary I’ve ever seen (other than a google3 binary; shots fired), and simply getting it to print to the terminal is ¾ the battle. If I don’t write it down now, or lose my shell history, I’ll probably forget how to do this.

TL;DR:

# one time setup
$ mkinitramfs -o ramdisk.img
$ echo "add-auto-load-safe-path path/to/linux/scripts/gdb/vmlinux-gdb.py" >> ~/.gdbinit

# one time kernel setup
$ cd linux
$ ./scripts/config -e DEBUG_INFO -e GDB_SCRIPTS
$ <make kernel image>

# each debug session run
$ qemu-system-x86_64 \
  -kernel arch/x86_64/boot/bzImage \
  -nographic \
  -append "console=ttyS0 nokaslr" \
  -initrd ramdisk.img \
  -m 512 \
  --enable-kvm \
  -cpu host \
  -s -S &
$ gdb vmlinux
(gdb) target remote :1234
(gdb) hbreak start_kernel
(gdb) c
(gdb) lx-dmesg

Booting in QEMU

We’ll play stupid and see what errors we hit, and how to fix them. First, let’s try just our new kernel:

$ qemu-system-x86_64 -kernel arch/x86_64/boot/bzImage

A new window should open, and we should observe some dmesg output, a panic, and your fans might spin up. I find this window relatively hard to see, so let’s get the output (and input) to a terminal:

$ qemu-system-x86_64 -kernel arch/x86_64/boot/bzImage -nographic

This is missing an important flag, but it’s important to see what happens when we forget it. It will seem that there’s no output, and QEMU isn’t responding to ctrl+c. And my fans are spinning again. Try ctrl+a, then c, to get a (qemu) prompt. A simple q will exit.

Next, we’re going to pass a kernel command line argument. The kernel accepts command line args just like userspace binaries do, though usually the bootloader sets these up.

$ qemu-system-x86_64 -kernel arch/x86_64/boot/bzImage -nographic -append "console=ttyS0"
...
[    1.144348] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[    1.144759] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc6 #10
[    1.144949] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[    1.145269] Call Trace:
[    1.146027]  dump_stack+0x5c/0x7b
[    1.146162]  panic+0xe4/0x252
[    1.146286]  mount_block_root+0x1f1/0x2d6
[    1.146445]  prepare_namespace+0x13b/0x171
[    1.146579]  kernel_init_freeable+0x227/0x254
[    1.146721]  ? rest_init+0xb0/0xb0
[    1.146826]  kernel_init+0xa/0x110
[    1.146931]  ret_from_fork+0x35/0x40
[    1.147412] Kernel Offset: 0x1c200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    1.147901] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---
(qemu) q

Well, at least we’re no longer in the dark (remember: ctrl+a, c, then q to exit). We’re now panicking because there’s no root filesystem, so there’s no init binary to run. We could create a custom filesystem image with the bare essentials (definitely a post for another day), but creating a ramdisk is the most straightforward approach, IMO. Ok, let’s create the ramdisk, then add it to QEMU’s parameters:

$ mkinitramfs -o ramdisk.img
$ qemu-system-x86_64 -kernel arch/x86_64/boot/bzImage -nographic -append "console=ttyS0" -initrd ramdisk.img

Unfortunately, we’ll likely hit the same panic. The panic message doesn’t make it obvious, but the problem is that the default amount of memory QEMU gives the guest is too limited. -m 512 will give our virtual machine enough memory to boot to a busybox-based shell prompt:

$ qemu-system-x86_64 -kernel arch/x86_64/boot/bzImage -nographic -append "console=ttyS0" -initrd ramdisk.img -m 512
...
(initramfs) cat /proc/version
Linux version 4.19.0-rc7+ (nick@nick-Blade-Stealth) (clang version 8.0.0 (https://git.llvm.org/git/clang.git/ 60c8c0cc0786c7f6b8dc5c1e3acd7ec98f0a7b6d) (https://git.llvm.org/git/llvm.git/ 64c3a57bec9dbe3762fa1d80055ba850d6658f5b)) #18 SMP Wed Oct 24 19:29:53 PDT 2018

Enabling kvm seems to help with those fans spinning up:

$ qemu-system-x86_64 -kernel arch/x86_64/boot/bzImage -nographic -append "console=ttyS0" -initrd ramdisk.img -m 512 --enable-kvm

Finally, we might be seeing a warning when we start QEMU:

qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]

We just need to add -cpu host to our QEMU invocation. When debugging, it can also be helpful to disable KASLR, either via nokaslr in the appended kernel command line parameters or by leaving CONFIG_RANDOMIZE_BASE unset in our kernel configs.

We can add -s to start a gdbserver on port 1234, and -S to pause the kernel until we continue in gdb.

Attaching GDB to QEMU

Now that we can boot this kernel image in QEMU, let’s attach gdb to it.

$ gdb vmlinux

If you see this on your first run:

warning: File "/home/nick/linux/scripts/gdb/vmlinux-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
  add-auto-load-safe-path /path/to/linux/scripts/gdb/vmlinux-gdb.py
line to your configuration file "/home/<username>/.gdbinit".

Then you can do this one time fix in order to load the gdb scripts each run:

$ cd linux
$ echo "add-auto-load-safe-path `pwd`/scripts/gdb/vmlinux-gdb.py" >> ~/.gdbinit

Now that QEMU is listening on port 1234 (via -s), let’s connect to it and set a breakpoint early in the kernel’s initialization. Note the use of hbreak (I lost a lot of time using just b start_kernel, only for the kernel to continue booting right past that function).

(gdb) target remote :1234
Remote debugging using :1234
0x000000000000fff0 in cpu_hw_events ()

(gdb) hbreak start_kernel
Hardware assisted breakpoint 2 at 0xffffffff82904a1d: file init/main.c, line 536.
(gdb) c
Continuing.

Breakpoint 2, start_kernel () at init/main.c:536
536       set_task_stack_end_magic(&init_task);
(gdb) bt
#0  start_kernel () at init/main.c:536
#1  0xffffffff810000d4 in secondary_startup_64 () at arch/x86/kernel/head_64.S:243
#2  0x0000000000000000 in ?? ()

We can start/resume the kernel with c, and pause it with ctrl+c. The gdb scripts provided by the kernel via CONFIG_GDB_SCRIPTS can be listed with apropos lx. lx-dmesg is incredibly handy for viewing the kernel’s dmesg buffer, particularly in the case of a kernel panic before the serial driver has been brought up (in which case there’s no output from QEMU to stdout, which is just as much fun as debugging graphics code, i.e. a black screen).

(gdb) apropos lx
function lx_current -- Return current task
function lx_module -- Find module by name and return the module variable
function lx_per_cpu -- Return per-cpu variable
function lx_task_by_pid -- Find Linux task by PID and return the task_struct variable
function lx_thread_info -- Calculate Linux thread_info from task variable
function lx_thread_info_by_pid -- Calculate Linux thread_info from task variable found by pid
lx-cmdline --  Report the Linux Commandline used in the current kernel
lx-cpus -- List CPU status arrays
lx-dmesg -- Print Linux kernel log buffer
lx-fdtdump -- Output Flattened Device Tree header and dump FDT blob to the filename
lx-iomem -- Identify the IO memory resource locations defined by the kernel
lx-ioports -- Identify the IO port resource locations defined by the kernel
lx-list-check -- Verify a list consistency
lx-lsmod -- List currently loaded modules
lx-mounts -- Report the VFS mounts of the current process namespace
lx-ps -- Dump Linux tasks
lx-symbols -- (Re-)load symbols of Linux kernel and currently loaded modules
lx-version --  Report the Linux Version of the current kernel

(gdb) lx-version
Linux version 4.19.0-rc7+ (nick@nick-Blade-Stealth) (clang version 8.0.0
(https://git.llvm.org/git/clang.git/ 60c8c0cc0786c7f6b8dc5c1e3acd7ec98f0a7b6d)
(https://git.llvm.org/git/llvm.git/ 64c3a57bec9dbe3762fa1d80055ba850d6658f5b))
#18 SMP Wed Oct 24 19:29:53 PDT 2018

Where to Go From Here

Maybe try cross compiling a kernel (you’ll need a cross compiler/assembler/linker/debugger and likely a different console=ttyXXX kernel command line parameter), building your own root filesystem with buildroot, or exploring the rest of QEMU’s command line options.

Speeding Up Linux Kernel Builds With Ccache

| Comments

ccache, the compiler cache, is a fantastic way to speed up build times for C and C++ code that I previously recommended. Recently, I was playing around with trying to get it to speed up my Linux kernel builds, but wasn’t seeing any benefit. Usually when this happens with ccache, there’s something non-deterministic about the builds that prevents cache hits.

Turns out someone asked this exact question on the ccache mailing list back in 2014, and a teammate from my Android days supposed a timestamp was the culprit. That, and this LKML post from the kbuild maintainer in 2011 about determinism, helped me find commit 87c94bfb8ad35 (“kbuild: override build timestamp & version”), which introduced a way to manually override the part of the version string that contains the build timestamp, as seen in:

$ cat /proc/version
Linux version 4.15.0-13-generic (buildd@lgw01-amd64-028)
(gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9))
#14~16.04.1-Ubuntu SMP
Sat Mar 17 03:04:59 UTC 2018

With ccache, we can check the cache hit/miss stats with -s, clear the cache with -C, and clear the stats with -z. We can explicitly tell ccache which compiler to fall back to by passing that compiler as its first argument (not strictly necessary). For KBuild, we can swap in our compiler wrapper with the CC= argument.

Let’s see what happens to our build time for subsequent builds with a hot cache:

No Cache

$ make clean
$ time make -j4
...
make -j4  2008.93s user 231.69s system 346% cpu 10:47.07 total

Cold Cache

$ ccache -Cz
Cleared cache
Statistics cleared
$ ccache -s
cache directory                     /home/nick/.ccache
primary config                      /home/nick/.ccache/ccache.conf
secondary config      (readonly)    /etc/ccache.conf
cache hit (direct)                     0
cache hit (preprocessed)               0
cache miss                             0
cache hit rate                      0.00 %
cleanups performed                     0
files in cache                         0
cache size                           0.0 kB
max cache size                       5.0 GB

$ make clean
$ time KBUILD_BUILD_TIMESTAMP='' make CC="ccache gcc" -j4
...
KBUILD_BUILD_TIMESTAMP='' make CC="ccache gcc" -j4  2426.79s user 312.08s system 372% cpu 12:15.22 total
$ ccache -s
cache directory                     /home/nick/.ccache
primary config                      /home/nick/.ccache/ccache.conf
secondary config      (readonly)    /etc/ccache.conf
cache hit (direct)                     0
cache hit (preprocessed)               0
cache miss                          3242
called for link                        6
called for preprocessing             538
unsupported source language           66
no input file                        108
files in cache                      9720
cache size                         432.6 MB
max cache size                       5.0 GB

Hot Cache

$ ccache -z
Statistics cleared
$ make clean
$ time KBUILD_BUILD_TIMESTAMP='' make CC="ccache gcc" -j4
...
KBUILD_BUILD_TIMESTAMP='' make CC="ccache gcc" -j4  151.85s user 132.98s system 288% cpu 1:38.90 total
$ ccache -s
cache directory                     /home/nick/.ccache
primary config                      /home/nick/.ccache/ccache.conf
secondary config      (readonly)    /etc/ccache.conf
cache hit (direct)                  3232
cache hit (preprocessed)               7
cache miss                             3
called for link                        6
called for preprocessing             538
unsupported source language           66
no input file                        108
files in cache                      9734
cache size                         432.8 MB
max cache size                       5.0 GB

The initial cold cache build will be slower than not using ccache at all, but it’s a one time cost that’s not significant relative to the savings. No caching took 647.07s, initial cold cache build took 735.22s (13.62% slower), and subsequent hot cache builds took 98.9s (6.54x faster). YMMV based on CPU and disk performance. Also, it’s not the most common workflow to do clean builds, but we do this for Linux kernel builds for Android/Pixel at work, and this helps me significantly for local development.

Now, if you really need that date string in there, you could theoretically put a garbage value in its place (so the cache stays warm) that’s long enough to reserve space for a date string, then patch your vmlinux binary after the fact. I don’t recommend that, but I’d imagine it might look something like:

$ KBUILD_BUILD_TIMESTAMP=$(printf "y%0.s" $(seq 1 $(date | wc -c)))
$ make CC="ccache gcc" vmlinux
$ sed -i "s/${KBUILD_BUILD_TIMESTAMP}/$(date)/g" vmlinux
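One wrinkle with the snippet above: `date | wc -c` counts the trailing newline, so the placeholder ends up one byte longer than the date string itself, and an in-place binary patch needs the lengths to match exactly. Measuring the string with printf avoids that; here's a quick sanity check:

```shell
# Build a placeholder the same length as the date string, measuring the
# string itself with printf rather than `date | wc -c` (which counts the newline).
real=$(date)
len=$(printf '%s' "$real" | wc -c)
placeholder=$(printf 'y%0.s' $(seq 1 "$len"))
[ "${#placeholder}" -eq "${#real}" ] && echo "lengths match"
# → lengths match
```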

Deterministic builds make build caching easier, but I’m not sure that build timestamp strings and build determinism are reconcilable.

GCC vs LLVM Q3 2017: Active Developer Counts

| Comments

A blog post from a few years ago that really stuck with me was Martin Olsson’s Browser Engines 2015: Commit Rates and Active Developer Counts, where he shows information about the number of authors and commits to popular web browsers. The graphs and analysis had interesting takeaways, like the obvious split in blink and webkit, and the relative number of contributors to each project. Martin had data comparing gcc to llvm from Q4 2015, but I wanted to see what the data looked like now, in Q3 2017, and share my findings from simply rerunning the numbers. Luckily, Martin open sourced the scripts he used for measurements, so they could be rerun.

Commit count and active authors in the previous 60 days are rough estimates of project health; the scripts don’t/can’t account for unique authors (the same author using different git commit info), and commit frequency is a poor basis for comparing developers who commit early and commit often. Still, let’s take a look. Counting active contributors over 60 days cuts out folks who do commit to either code base, just not as often. Lies, damned lies, and statistics, right? Or: torture the data enough, and it will confess to anything…

Note that LLVM is split into a few repositories: llvm (the common base), clang (the C/C++ frontend), libc++ (the C++ runtime), compiler-rt (the sanitizers/built-ins/profiler lib), lld (the linker), clang-tools-extra (the utility belt), and lldb (the debugger); there are more, but these are the most active LLVM projects. Later, I refer to LLVM as the grouping of these repos.

There are a lot of caveats with this data. I suspect the separate LLVM repos have a lot of contributor overlap and would show fewer active contributors when looked at in aggregate; that is to say, you can’t simply add them up, else you’d double count a bunch. Also, the comparison is not quite fair, since the front-end-language and back-end-target support of these two massive projects does not overlap in a lot of places.

LLVM’s 60 day active contributor count is roughly 3x-5x GCC’s and growing, while GCC’s count of ~100 hasn’t changed much since ‘04. It’s safe to say GCC is not dying; it’s going steady and chugging away as it has been, but LLVM has been very strong in attracting active contributors. Either way, I’m thankful to have not one, but two high quality open source C/C++ compilers.

Running Clang-Tidy on the Linux Kernel

| Comments

Clang-Tidy is a linter from the LLVM ecosystem. I wanted to try to run it on the Linux kernel to see what kind of bugs it would find. The false positive rate seems pretty high (a persistent bane to static analysis), but some patching in both the tooling and the source can likely help bring this rate down.

The most straightforward way to invoke Clang-Tidy is with a compilation database: a JSON file that records, for each translation unit,

  1. The source file of the translation unit.
  2. The top level directory of the source.
  3. The exact arguments passed to the compiler.

The exact arguments are required because -D and -I flags are necessary to reproduce the exact Abstract Syntax Tree (AST) used to compile your code. Given a compilation database, it’s trivial to parse and recreate a build. For the kernel’s KBuild, it’s a lot like encoding the output of make V=1.
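To make the format concrete, here’s a minimal, made-up database entry and a quick query for the translation units it covers (a real consumer should use a proper JSON parser; the grep is just a sketch that relies on this known layout):

```shell
# Write a minimal, made-up compilation database with one entry.
cat > compile_commands.json <<'EOF'
[
    {
        "arguments": ["cc", "-c", "-o", "hello.o", "hello.c"],
        "directory": "/tmp",
        "file": "hello.c"
    }
]
EOF

# List the translation units it covers.
grep -o '"file": "[^"]*"' compile_commands.json | cut -d'"' -f4
# → hello.c
```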

In order to generate a compilation database, we can use an awesome tool called BEAR. BEAR will hook calls to exec and family, then write out the compilation database (compile_commands.json).

With BEAR installed, we can invoke the kernel’s build with bear make -j. When we’re done:

➜  linux git:(nick) ✗ du -h compile_commands.json
11M compile_commands.json
➜  linux git:(nick) ✗ wc -l compile_commands.json
330296 compile_commands.json
➜  linux git:(nick) ✗ head -n 26 compile_commands.json
[
    {
        "arguments": [
            "cc",
            "-c",
            "-Wp,-MD,arch/x86/boot/tools/.build.d",
            "-Wall",
            "-Wmissing-prototypes",
            "-Wstrict-prototypes",
            "-O2",
            "-fomit-frame-pointer",
            "-std=gnu89",
            "-Wno-unused-value",
            "-Wno-unused-parameter",
            "-Wno-missing-field-initializers",
            "-I./tools/include",
            "-include",
            "include/generated/autoconf.h",
            "-D__EXPORTED_HEADERS__",
            "-o",
            "arch/x86/boot/tools/build",
            "arch/x86/boot/tools/build.c"
        ],
        "directory": "/home/nick/linux",
        "file": "arch/x86/boot/tools/build.c"
    },

Now, with Clang-Tidy (probably worthwhile to build from source, but it’s also available off apt), we want to grab the helper script run-clang-tidy.py to help analyze all this code.

curl -O https://raw.githubusercontent.com/llvm-mirror/clang-tools-extra/master/clang-tidy/tool/run-clang-tidy.py

Then we can run it from the same directory as compile_commands.json:

python run-clang-tidy.py \
  -clang-tidy-binary /usr/bin/clang-tidy-4.0 \
  > clang_tidy_output.txt

This took about 1hr12min on my box. Let’s see what the damage is:

➜  linux git:(nick) ✗ cat clang_tidy_output.txt \
  | grep warning: | grep -oE '[^ ]+$' | sort | uniq -c

     76 [clang-analyzer-core.CallAndMessage]
     15 [clang-analyzer-core.DivideZero]
      1 [clang-analyzer-core.NonNullParamChecker]
    316 [clang-analyzer-core.NullDereference]
     90 [clang-analyzer-core.UndefinedBinaryOperatorResult]
      1 [clang-analyzer-core.uninitialized.ArraySubscript]
   1410 [clang-analyzer-core.uninitialized.Assign]
     10 [clang-analyzer-core.uninitialized.Branch]
      5 [clang-analyzer-core.uninitialized.UndefReturn]
     11 [clang-analyzer-cplusplus.NewDeleteLeaks]
    694 [clang-analyzer-deadcode.DeadStores]
    342 [clang-analyzer-security.insecureAPI.strcpy]
      2 [clang-analyzer-unix.API]
     11 [clang-analyzer-unix.Malloc]
      4 [clang-diagnostic-address-of-packed-member]
      2 [clang-diagnostic-duplicate-decl-specifier]
     98 [clang-diagnostic-implicit-int]

Looking through the output, there seems to be almost nothing but false positives, but who knows, maybe there’s an actual bug or two in there. Patches to LLVM, its checkers, or the Linux kernel could likely lower the false positive ratio.

If you’re interested in seeing the kinds of warnings/outputs, I’ve uploaded my results run on a 4.12-rc3 based kernel that may or may not have been compiled with Clang to my clang_tidy branch of the kernel on GitHub. As in my sorted output, I find it handy to grep for warning:. Maybe you can find yourself a good first bug to contribute a fix to the kernel?

There’s likely also some checks that make sense to disable or enable. Clang-Tidy also allows you to write and use your own checkers. Who knows, someone may just end up writing static analyses tailored to the Linux kernel.

Submitting Your First Patch to the Linux Kernel and Responding to Feedback

| Comments

After working on the Linux kernel for Nexus and Pixel phones for nearly a year, and messing around with the excellent Eudyptula challenge, I finally wanted to take a crack at submitting patches upstream to the Linux kernel.

This post is woefully inadequate compared to the existing documentation, which should be preferred.

I figure I’d document my workflow, now that I’ve gotten a few patches accepted (and so I can refer to this post rather than my shell history…). Feedback welcome (open an issue or email me).

Step 1: Setting up an email client

I mostly use git send-email for sending patch files. In my ~/.gitconfig I have added:

[sendemail]
  ; setup for using git send-email; prompts for password
  smtpuser = myemailaddr@gmail.com
  smtpserver = smtp.googlemail.com
  smtpencryption = tls
  smtpserverport = 587

This sends patches through my Gmail account. I don’t add my password to the config, so I don’t have to worry about it when I publish my dotfiles; I simply get prompted every time I want to send an email.

I use mutt to respond to threads when I don’t have a patch to send.

Step 2: Make fixes

How do you find a bug to fix? My general approach to finding bugs in open source C/C++ code bases has been using static analysis, a different compiler, and/or more compiler warnings turned on. The kernel also has an instance of bugzilla running as an issue tracker. Work out of a new branch, in case you choose to abandon it later. Rebase your branch before submitting (pull early, pull often).

Step 3: Thoughtful commit messages

I always run git log <file I modified> to see some of the previous commit messages on the file I modified.

$ git log arch/x86/Makefile

commit a5859c6d7b6114fc0e52be40f7b0f5451c4aba93
...
    x86/build: convert function graph '-Os' error to warning
commit 3f135e57a4f76d24ae8d8a490314331f0ced40c5
...
    x86/build: Mostly disable '-maccumulate-outgoing-args'

The first words of commit messages in Linux are usually <subsystem>/<sub-subsystem>: <descriptive comment>.

Let’s commit: git commit <files> -s. We use the -s flag to add our signoff. Signing off your patches is standard practice and notes your agreement to the Linux Kernel Certificate of Origin.

Step 4: Generate Patch file

git format-patch HEAD~ generates a patch file from the latest commit. You can use git format-patch HEAD~<number of commits to convert to patches> to turn multiple commits into patch files. These patch files will be emailed to the Linux Kernel Mailing List (lkml). They can be applied with git am <patchfile>. I like to back these files up in another directory for future reference, and because I still make a lot of mistakes with git.

Step 5: checkpatch

You’re going to want to run the kernel’s linter before submitting. It will catch style issues and other potential issues.

$ ./scripts/checkpatch.pl 0001-x86-build-don-t-add-maccumulate-outgoing-args-w-o-co.patch
total: 0 errors, 0 warnings, 9 lines checked

0001-x86-build-don-t-add-maccumulate-outgoing-args-w-o-co.patch has no obvious style problems and is ready for submission.

If you hit issues here, fix up your changes, update your commit with git commit --amend <files updated>, rerun format-patch, then rerun checkpatch until you’re good to go.

Step 6: email the patch to yourself

This is good to do when you’re starting off. While I use mutt for responding to email, I use git send-email for sending patches. Once you’ve gotten the hang of the workflow, this step is optional, more of a sanity check.

$ git send-email \
0001-x86-build-require-only-gcc-use-maccumulate-outgoing-.patch

You don’t need to use command line arguments to cc yourself, assuming you set up git correctly, git send-email should add you to the cc line as the author of the patch. Send the patch just to yourself and make sure everything looks ok.

Step 7: fire off the patch

Linux is huge, and has a trusted set of maintainers for various subsystems. The MAINTAINERS file keeps track of these, but Linux has a tool to help you figure out where to send your patch:

$ ./scripts/get_maintainer.pl 0001-x86-build-don-t-add-maccumulate-outgoing-args-w-o-co.patch
Person A <person@a.com> (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
Person B <person@b.com> (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
Person C <person@c.com> (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
linux-kernel@vger.kernel.org (open list:X86 ARCHITECTURE (32-BIT AND 64-BIT))

With some additional flags, we can feed this output directly into git send-email.

$ git send-email \
--cc-cmd='./scripts/get_maintainer.pl --norolestats 0001-my.patch' \
--cc person@a.com \
0001-my.patch

Make sure to cc yourself when prompted. Otherwise if you don’t subscribe to LKML, then it will be difficult to reply to feedback. It’s also a good idea to cc any other author that has touched this functionality recently.

Step 8: monitor feedback

Patchwork for the LKML is a great tool for tracking the progress of patches. You should register an account there. I highly recommend bookmarking your submitter link. In Patchwork, click any submitter, then Filters (hidden in the top left), change submitter to your name, click apply, then bookmark it. Here’s what mine looks like. Not much today, and mostly trivial patches, but hopefully this post won’t age well in that regard.

Feedback may or may not be swift. I think my first patch I had to ping a couple of times, but eventually got a response.

Step 9: responding to feedback

Update your file, git commit <changed files> --amend to update your latest commit, then git format-patch -v2 HEAD~. Edit the patch file to put the v2 change notes below the dashes under the signed-off lines (example), rerun checkpatch, and rerun get_maintainer if the set of files you modified has changed since v1. Next, you need to find the Message-ID in order to respond to the thread properly.

In gmail, when viewing the message I want to respond to, you can click “Show Original” from the dropdown near the reply button. From there, copy the MessageID from the top (everything in the angle brackets, but not the brackets themselves). Finally, we send the patch:

$ git send-email \
--cc-cmd='./scripts/get_maintainer.pl --norolestats 0001-my.patch' \
--cc person@a.com \
--in-reply-to 2017datesandletters@somehostname \
0001-my.patch

We make sure to add anyone who may have commented on the patch from the mailing list to keep them in the loop. Rinse and repeat steps 2 through 9 until the patch is acked/signed off or rejected.

I’ve added this handy shell function to my ~/.zshrc:

function kpatch () {
  patch=$1
  shift
  git send-email \
    --cc-cmd="./scripts/get_maintainer.pl --norolestats $patch" \
    $@ $patch
}

I can then invoke it like:

kpatch 0001-Input-mousedev-fix-implicit-conversion-warning.patch --cc anyone@else.com

where anyone@else.com is anyone I want to add in addition to whomever get_maintainer suggests.

Finding out when your patch gets merged is a little tricky; each subsystem maintainer seems to do things differently. My first patch, I didn’t know it went in until a bot at Google notified me. The maintainers for the second and third patches had bots notify me that it got merged into their trees, but when they send Linus a PR and when that gets merged isn’t immediately obvious.

It’s not like Github, where everyone involved gets an email that a PR got merged and the UI changes. While there are pros and cons to this fairly decentralized process, and while it is the use case git was originally designed for, I’d be remiss not to mention that I really miss Github. Getting your first patch acknowledged and even merged is intoxicating and makes you want to contribute more; radio silence has the opposite effect.

Happy hacking!

(Thanks to Reddit user /u/EliteTK for pointing out that -v2 was more concise than --subject-prefix="Patch vX").

Static and Dynamic Libraries

| Comments

This is the second post in a series on memory segmentation. It covers working with static and dynamic libraries in Linux and OSX. Make sure to check out the first on object files and symbols.

Let’s say we wanted to reuse some of the code from our previous project in our next one. We could continue to copy around object files, but let’s say we have a bunch and it’s hard to keep track of all of them. Let’s combine multiple object files into an archive or static library. Similar to a more conventional zip file or “compressed archive,” our static library will be an uncompressed archive.

We can use the ar command to create and manipulate a static archive.

$ clang -c x.c y.c
$ ar -rv libhello.a x.o y.o

The -r flag will create the archive named libhello.a and add the files x.o and y.o to its index. I like to add the -v flag for verbose output. Then we can use the familiar nm tool I introduced in the previous post to examine the content of the archives and their symbols.

$ file libhello.a
libhello.a: current ar archive random library
$ nm libhello.a
libhello.a(x.o):
                 U _puts
0000000000000000 T _x


libhello.a(y.o):
                 U _puts
0000000000000000 T _y

Some other useful flags for ar are -d to delete an object file (ex. ar -d libhello.a y.o) and -u to update existing members of the archive when their source and object files are updated. Not only can we run nm on our archive; otool and objdump work on it, too.
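ar also has a -t flag to list an archive’s members, and it doesn’t actually care whether those members are object files, so we can watch -t and -d at work with stand-in files (the names below are made up):

```shell
# ar is agnostic about member contents; use stand-in files to see -t and -d.
printf 'stand-in' > x.o
printf 'stand-in' > y.o
ar -rv libdemo.a x.o y.o   # create the archive with both members
ar -t libdemo.a            # lists the members: x.o, y.o
ar -d libdemo.a y.o        # delete one member
ar -t libdemo.a            # now lists only: x.o
```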

Now that we have our static library, we can statically link it to our program and see the resulting symbols. The .a suffix is typical on both OSX and Linux for archive files.

$ clang main.o libhello.a
$ nm a.out
0000000100000f30 T _main
                 U _puts
0000000100000f50 T _x
0000000100000f70 T _y

Our compiler driver hands the archive to the linker, which understands how to index into archive files and pull out only the object files it needs to combine into the final executable. If we use a static library to statically link all required functions, we can have one binary with no dependencies. This can make deployment of binaries simple, but it also greatly increases their size. Upgrading large binaries incrementally becomes more costly in terms of space.

While static libraries allowed us to reuse source code, static linkage does not allow us to reuse memory for executable code between different processes. I really want to put off talking about memory benefits until the next post, but know that the solution to this problem lies in “dynamic libraries.”

While having a single binary file keeps things simple, it can really hamper memory sharing and incremental relinking. For example, if you have multiple executables that are all built with the same static library, unless your OS is really smart about copy-on-write page sharing, then you’re likely loading multiple copies of the same exact code into memory! What a waste! Also, when you want to rebuild or update your binary, you spend time performing relocation again and again with static libraries. What if we could set aside object files that we could share amongst multiple instances of the same or even different processes, and perform relocation at runtime?

The solution is known as dynamic libraries. If static libraries and static linkage were Atari controllers, dynamic libraries and dynamic linkage are Steel Battalion controllers. We’ll show how to work with them in the rest of this post, but I’ll prove how memory is saved in a later post.

Let’s say we want to create a shared version of libhello. Dynamic libraries typically have different suffixes per OS since each OS has its preferred object file format. On Linux the .so suffix is common, .dylib on OSX, and .dll on Windows.

$ clang -shared -fpic x.c y.c -o libhello.dylib
$ file libhello.dylib
libhello.dylib: Mach-O 64-bit dynamically linked shared library x86_64
$ nm libhello.dylib
                 U _puts
0000000000000f50 T _x
0000000000000f70 T _y

The -shared flag tells the linker to create a special file called a shared library. The -fpic option generates position independent code, which uses relative rather than absolute addresses; that allows different processes to load the library at different virtual addresses and share memory.

Now that we have our shared library, let’s dynamically link it into our executable.

$ clang main.c libhello.dylib
$ ./a.out
x
y

Dynamic linkage essentially produces an incomplete binary; you can verify this with nm. At runtime, startup is slightly delayed while the dynamic linker performs some memory mapping early in process start, and we pay slight costs for trampolining into position independent code.

Let’s say we want to know what dynamic libraries a binary is using. You can either query the executable (most executable object file formats contain a header the dynamic linker will parse and pull in libs) or observe the executable while running it. Because each major OS has its own object file format, they each have their own tools for these two checks. Note that statically linked libraries won’t show up here, since their object code has already been linked in and thus we’re not able to differentiate between object code that came from our first party code vs third party static libraries.

On OSX, we can use otool -L <bin> to check which .dylibs will get pulled in.

$ otool -L a.out
a.out:
           libhello.dylib (compatibility version 0.0.0, current version 0.0.0)
           /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1226.10.1)

So we can see that a.out depends on libhello.dylib (and expects to find it in the same directory as a.out). It also depends on a shared library called libSystem.B.dylib. If you run otool -L on libSystem itself, you’ll see it depends on a bunch of other libraries, including a C runtime, malloc implementation, pthreads implementation, and more. If you want to find the final resting place of where a symbol is defined, without digging with nm and otool, you can fire up your trusty debugger and ask it.

$ lldb a.out
...
(lldb) image lookup -r -s puts
...
        Summary: libsystem_c.dylib`puts        Address: libsystem_c.dylib[0x0000000000085c30] (libsystem_c.dylib.__TEXT.__stubs + 3216)

You’ll see a lot of output since puts is treated as a regex. You’re looking for the Summary line that has an address and is not a symbol stub. You can then check your work with otool and nm.

If we want to observe the dynamic linker in action on OSX, we can use dtruss:

$ sudo dtruss ./a.out
...
stat64("libhello.dylib\0", 0x7FFF50CEAC68, 0x1)         = 0 0
open("libhello.dylib\0", 0x0, 0x0)              = 3 0
...
mmap(0x10EF27000, 0x1000, 0x5, 0x12, 0x3, 0x0)          = 0x10EF27000 0
mmap(0x10EF28000, 0x1000, 0x3, 0x12, 0x3, 0x1000)               = 0x10EF28000 0
mmap(0x10EF29000, 0xC0, 0x1, 0x12, 0x3, 0x2000)         = 0x10EF29000 0
...
close(0x3)              = 0 0
...

On Linux, we can simply use ldd or readelf -d to query an executable for a list of its dynamic libraries.

$ clang -shared -fpic x.c y.c -o libhello.so
$ clang main.c libhello.so
$ ldd a.out
           linux-vdso.so.1 =>  (0x00007fff95d43000)
           libhello.so => not found
           libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fcc98c5f000)
           /lib64/ld-linux-x86-64.so.2 (0x0000555993852000)
$ readelf -d a.out
Dynamic section at offset 0xe18 contains 25 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libhello.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
...

We can then use strace to observe the dynamic linker in action on Linux:

$ LD_LIBRARY_PATH=. strace ./a.out
...
open("./libhello.so", O_RDONLY|O_CLOEXEC) = 3
...
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\5\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=8216, ...}) = 0
close(3)                                = 0
...

What’s this LD_LIBRARY_PATH thing? That’s shell syntax for setting an environment variable just for the duration of that command (as opposed to exporting it, which keeps it set for subsequent commands). Unlike OSX’s dynamic linker, which was happy to look in the cwd for libhello.dylib, on Linux we must supply the cwd if the dynamic library we want to link in is not in the standard search path.
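The same prefix syntax works for any variable, which makes the scoping easy to see (DEMO_VAR is just a made-up name):

```shell
# The variable is set only for the child command:
DEMO_VAR=bar sh -c 'echo "$DEMO_VAR"'   # prints: bar
# Back in our own shell, it was never set:
echo "${DEMO_VAR:-unset}"               # prints: unset
```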

But what is the standard search path? Well, there’s another environment variable we can set to see it, LD_DEBUG. For example, on Linux:

$ LD_DEBUG=libs LD_LIBRARY_PATH=. ./a.out
     15828:        find library=libhello.so [0]; searching
     15828:         search path=./tls/x86_64:./tls:./x86_64:.             (LD_LIBRARY_PATH)
     15828:          trying file=./tls/x86_64/libhello.so
     15828:          trying file=./tls/libhello.so
     15828:          trying file=./x86_64/libhello.so
     15828:          trying file=./libhello.so
     15828:
     15828:        find library=libc.so.6 [0]; searching
     15828:         search path=./tls/x86_64:./tls:./x86_64:.             (LD_LIBRARY_PATH)
     15828:          trying file=./tls/x86_64/libc.so.6
     15828:          trying file=./tls/libc.so.6
     15828:          trying file=./x86_64/libc.so.6
     15828:          trying file=./libc.so.6
     15828:         search cache=/etc/ld.so.cache
     15828:          trying file=/lib/x86_64-linux-gnu/libc.so.6
     15828:        calling init: /lib/x86_64-linux-gnu/libc.so.6
     15828:        calling init: ./libhello.so
     15828:        initialize program: ./a.out
     15828:        transferring control: ./a.out
x
y
     15828:        calling fini: ./a.out [0]
     15828:        calling fini: ./libhello.so [0]

LD_DEBUG is pretty useful. Try:

$ LD_DEBUG=help ./a.out
Valid options for the LD_DEBUG environment variable are:


  libs        display library search paths
  reloc       display relocation processing
  files       display progress for input file
  symbols     display symbol table processing
  bindings    display information about symbol binding
  versions    display version dependencies
  scopes      display scope information
  all         all previous options combined
  statistics  display relocation statistics
  unused      determined unused DSOs
  help        display this help message and exit


To direct the debugging output into a file instead of standard output
a filename can be specified using the LD_DEBUG_OUTPUT environment variable.

For some cool stuff, I recommend checking out LD_DEBUG=symbols and LD_DEBUG=statistics.

Going back to LD_LIBRARY_PATH, usually libraries you create and want to reuse between projects go into /usr/local/lib and the headers into /usr/local/include. I think of the convention as:

$ tree -L 2 /usr/
/usr
├── bin # system installed binaries like nm, gcc
├── include # system installed headers like stdio.h
├── lib # system installed libraries, both static and dynamic
└── local
    ├── bin # user installed binaries like rustc
    ├── include # user installed headers
    └── lib # user installed libraries

Unfortunately, it’s a loose convention that’s broken down over the years, and things are scattered all over the place. You can also run into dependency and versioning issues by placing libraries here instead of keeping them in-tree or out-of-tree of a project’s source code; I don’t want to get into those here. Just know when you see a library like libc.so.6 that the numeric suffix is a major version number that follows semantic versioning. For more information, you should read Michael Kerrisk’s excellent book The Linux Programming Interface. This post is based on his chapters 41 & 42 (but with more info on tooling and OSX).

If we were to place our libhello.so into /usr/local/lib (on Linux you need to then run sudo ldconfig) and move x.h and y.h to /usr/local/include, then we could then compile with:

$ clang main.c -lhello

Note that rather than give a full path to our library, we can use the -l flag followed by the name of our library with the lib prefix and .so suffix removed.

When working with shared libraries and external code, there are three flags I use pretty often:

* -l<libname to link, no lib prefix or file extension; ex: -lnanomsg to link libnanomsg.so>
* -L <path to search for lib if in non standard directory>
* -I <path to headers for that library, if in non standard directory>

For finding the specific flags needed to compile and dynamically link against a library, a tool called pkg-config can help. I’ve had less than stellar experiences with it, as it puts the onus on the library author to maintain the .pc files, and on the user to have them installed where pkg-config looks. When they do exist and are installed properly, the tool works well:

$ sudo apt-get install libpng12-dev
$ pkg-config --libs --cflags libpng12
-I/usr/include/libpng12  -lpng12
$ clang program.c `!!`

Using another neat environment variable, we can hook into the dynamic linkage process and inject our own shared libraries to be linked instead of the expected ones. Let’s say libgood.so and libmalicious.so both define a symbol for a function (the same symbol name and function signature). We can get a binary that links in libgood.so’s function to instead call libmalicious.so’s version:

$ ./a.out
hello from libgood
$ LD_PRELOAD=./libmalicious.so ./a.out
hello from libmalicious

LD_PRELOAD is not available on OSX, instead you can use DYLD_INSERT_LIBRARIES, DYLD_LIBRARY_PATH, and recompile the original executable with -flat_namespace. Having to recompile the original executable is less than ideal for hooking an existing binary, but I could not hook libc as in the previous libmalicious example. I would be interested to know if you can though!

By manually invoking the dynamic linker from our code, we can even man-in-the-middle library calls (call our hooked function first, then invoke the original target). We’ll see more of this in the next post on using the dynamic linker.

As you can guess, readjusting the search paths for dynamic libraries is a security concern, as it lets good and bad folks change the expected execution paths. Guarding against the use of these env vars becomes a rabbit hole that gets pretty tricky to solve without heavy-handed static linking of dependencies.

In the previous post, I alluded to undefined symbols like puts. puts is part of libc, which is probably the most shared dynamic library on most computing devices, since nearly every program makes use of the C runtime. (I think of a “runtime” as implicit code that runs in your program that you didn’t necessarily write yourself. Usually a runtime is provided as a library that gets implicitly linked into your executable when you compile.) You can statically link against libc with the -static flag, on Linux at least (OSX makes this difficult: “Apple does not support statically linked binaries on Mac OS X”).

I’m not sure what the benefit would be to mixing static and dynamic linking, but when the linker searches its paths for a library requested with -l, it prefers a shared version; if only a static archive is found, it will get linked in.

There’s also an interesting form of library called a “virtual dynamic shared object” on Linux. I haven’t covered memory mapping yet, but know it exists, is usually hidden for libc, and that you can read more about it via man 7 vdso.

One thing I find interesting and don’t quite understand how to recreate is that somehow glibc on Linux is also executable:

$ /lib/x86_64-linux-gnu/libc.so.6
GNU C Library (Ubuntu GLIBC 2.24-3ubuntu1) stable release version 2.24, by Roland McGrath et al.
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 6.2.0 20161005.
Available extensions:
    crypt add-on version 2.1 by Michael Glad and others
    GNU Libidn by Simon Josefsson
    Native POSIX Threads Library by Ulrich Drepper et al
    BIND-8.2.3-T5B
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<https://bugs.launchpad.net/ubuntu/+source/glibc/+bugs>.

Also, note that linking against third party code has licensing implications (of course) of particular interest when it’s GPL or LGPL. Here is a good overview which I’d summarize as: code that statically links against LGPL code must also be LGPL, while any form of linkage against GPL code must be GPL’d.

Ok, that was a lot. In the previous post, we covered Object Files and Symbols. In this post we covered hacking around with static and dynamic linkage. In the next post, I hope to talk about manually invoking the dynamic linker at runtime.

Object Files and Symbols

| Comments

What was supposed to be one blog post about memory segmentation turned into what will be a series of posts. As the first in the series, we cover the extreme basics of object files and symbols. In follow up posts, I plan to talk about static libraries, dynamic libraries, dynamic linkage, memory segments, and finally memory usage accounting. I also cover command line tools for working with these notions, both in Linux and OSX.

A quick review of the compilation+execution pipeline (for terminology):

  1. Lexing produces tokens
  2. Parsing produces an abstract syntax tree
  3. Analysis produces a code flow graph
  4. Optimization produces a reduced code flow graph
  5. Code gen produces object code
  6. Linkage produces a complete executable
  7. Loader instructs the OS how to start running the executable

This series will focus on part #6.

Let’s say you have some amazing C/C++ code, but for separation of concerns, you want to start moving it out into separate source files. Whereas previously in one file you had:

// main.c
#include <stdio.h>
void helper () {
  puts("helper");
}
int main () {
  helper();
}

You now have two source files and maybe a header:

// main.c
#include "helper.h"
int main () {
  helper();
}

// helper.h
void helper();

//helper.c
#include <stdio.h>
#include "helper.h"
void helper () {
  puts("helper");
}

In the single source version, we would have compiled and linked that with clang main.c and had an executable file. In the multiple source version, we first compile our source files to object files, then link them all together. That can be done separately:

$ clang -c helper.c     # produces helper.o
$ clang -c main.c       # produces main.o
$ clang main.o helper.o # produces a.out

We can also do the compilation and linkage in one step:

$ clang helper.c main.c # produces a.out

Nothing special thus far; C/C++ 101. In the first case of separate compilation and linkage steps, we were left with intermediate object files (.o). What exactly are these?

Object files are almost full executables. They contain machine code, but that code still requires a relocation step. They also contain metadata about the addresses of their variables and functions (called symbols) in an associative data structure called a symbol table. The addresses may not be the final addresses of the symbols in the final executable. Object files also contain some information for the loader, and probably some other stuff.

Remember that if we fail to specify the helper object file, we’ll get an undefined symbol error.

$ clang main.c
Undefined symbols for architecture x86_64:
  "_helper", referenced from:
      _main in main-459dde.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

The problem is that main.o refers to some symbol called helper, but on its own doesn’t contain any more information about it. Let’s say we want to know what symbols an object file contains, or expects to find elsewhere. Let’s introduce our first tool, nm. nm will print the name list or symbol table for a given object or executable file. On OSX, symbols are prefixed with an underscore.

$ nm helper.o
0000000000000000 T _helper
                 U _puts

$ nm main.o
                 U _helper
0000000000000000 T _main

$ nm a.out
...
0000000100000f50 T _helper
0000000100000f70 T _main
                 U _puts
...

Let’s dissect what’s going on here. The output (as understood by man 1 nm) is a space separated list of address, type, and symbol name. We can see that the addresses are placeholders in object files, and final in executables. The name should make sense; it’s the name of the function or variable. While I’d love to get in depth on the various symbol types and talk about sections, I don’t think I could do as great a job as Peter Van Der Linden in his book “Expert C Programming: Deep C Secrets.”

For our case, we just care about whether the symbol in a given object file is defined or not. The type U (undefined) means that this symbol is referenced or used in this object code/executable, but its value wasn’t defined here. It should now make sense why we got the undefined symbol error for helper when we compiled main.c alone: main.o contains a symbol for main and references helper; helper.o contains a symbol for helper and references puts; the final executable contains symbols for main and helper and a reference to puts.

You might be wondering where puts comes from then, and why didn’t we get an undefined symbol error for puts like we did earlier for helper. The answer is the C runtime. libc is implicitly dynamically linked to all executables created by the C compiler. We’ll cover dynamic linkage in a later post in this series.

When the linker performs relocation on the object files, combining them into a final executable, it goes through placeholders of addresses and fills them in. We did this manually in our post on JIT compilers.

While nm gave us a look into our symbol table, two other tools I use frequently are objdump on Linux and otool on OSX. Both of these provide disassembled assembly instructions and their addresses. Note how the symbols for functions get translated into labels of the disassembled functions, and that their address points to the first instruction in that label. Since I’ve shown objdump numerous times in previous posts, here’s otool.

$ otool -tV helper.o
helper.o:
(__TEXT,__text) section
_helper:
0000000000000000    pushq    %rbp
0000000000000001    movq    %rsp, %rbp
0000000000000004    subq    $0x10, %rsp
0000000000000008    leaq    0xe(%rip), %rdi         ## literal pool for: "helper"
000000000000000f    callq    _puts
0000000000000014    movl    %eax, -0x4(%rbp)
0000000000000017    addq    $0x10, %rsp
000000000000001b    popq    %rbp
000000000000001c    retq
$ otool -tV main.o
main.o:
(__TEXT,__text) section
_main:
0000000000000000    pushq    %rbp
0000000000000001    movq    %rsp, %rbp
0000000000000004    movb    $0x0, %al
0000000000000006    callq    _helper
000000000000000b    xorl    %eax, %eax
000000000000000d    popq    %rbp
000000000000000e    retq
$ otool -tV a.out
a.out:
(__TEXT,__text) section
_helper:
0000000100000f50    pushq    %rbp
0000000100000f51    movq    %rsp, %rbp
0000000100000f54    subq    $0x10, %rsp
0000000100000f58    leaq    0x43(%rip), %rdi        ## literal pool for: "helper"
0000000100000f5f    callq    0x100000f80             ## symbol stub for: _puts
0000000100000f64    movl    %eax, -0x4(%rbp)
0000000100000f67    addq    $0x10, %rsp
0000000100000f6b    popq    %rbp
0000000100000f6c    retq
0000000100000f6d    nop
0000000100000f6e    nop
0000000100000f6f    nop
_main:
0000000100000f70    pushq    %rbp
0000000100000f71    movq    %rsp, %rbp
0000000100000f74    movb    $0x0, %al
0000000100000f76    callq    _helper
0000000100000f7b    xorl    %eax, %eax
0000000100000f7d    popq    %rbp
0000000100000f7e    retq

readelf -s <object file> will give us a list of symbols on Linux. ELF is the file format used by the loader on Linux, while OSX uses Mach-O. Thus readelf and otool, respectively.

Also note that for static linkage, symbols need to be unique*, as they refer to memory locations: locations to read from or write to in the case of variables, and locations to jump to in the case of functions.

$ cat double_define.c
void a () {}
void a () {}
int main () {}
$ clang double_define.c
double_define.c:2:6: error: redefinition of 'a'
void a () {}
     ^
double_define.c:1:6: note: previous definition is here
void a () {}
     ^
1 error generated.

*: there’s a notion of weak symbols, and some special things for dynamic libraries we’ll see in a follow up post.

Languages like C++ that support function overloading (functions with the same name but different arguments, namespaces, or classes) must mangle their function names to make them unique.

Code like:

namespace util {
  class Widget {
    public:
      void doSomething (bool save);
      void doSomething (int n);
  };
}

Will produce symbols like:

$ clang class.cpp -std=c++11
$ nm a.out
0000000100000f70 T __ZN4util6Widget11doSomethingEb
0000000100000f60 T __ZN4util6Widget11doSomethingEi
...

Note: GNU nm on Linux distros will have a --demangle option:

$ nm --demangle a.out
...
00000000004006d0 T util::Widget::doSomething(bool)
00000000004006a0 T util::Widget::doSomething(int)
...

On OSX, we can pipe nm into c++filt:

$ nm a.out | c++filt
0000000100000f70 T util::Widget::doSomething(bool)
0000000100000f60 T util::Widget::doSomething(int)
...

Finally, if you don’t have an object file, but instead a backtrace that needs demangling, you can either invoke c++filt manually or use demangler.com.
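For a single mangled name (like those from the nm output above, minus OSX’s extra leading underscore), c++filt also reads names from its arguments or stdin. This assumes GNU c++filt from binutils is installed:

```shell
# Demangle one Itanium-ABI name by piping it through c++filt.
echo _ZN4util6Widget11doSomethingEb | c++filt   # prints: util::Widget::doSomething(bool)
```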

Rust also mangles its function names. For FFI or interfacing with C functions, other languages usually have to look for or expose symbols in a manner suited to C, the lowest common denominator. C++ has extern "C" blocks and Rust has extern blocks.

We can use strip to remove symbols from a binary. This can slim down a binary at the cost of making stack traces unreadable. If you’re following along at home, try comparing the output from your disassembler and nm before and after running strip on the executable. Luckily, you can’t strip the symbols out of object files, otherwise they’d be useless as you’d no longer be able to link them.

If we compile with the -g flag, we can create a different kind of symbol: debug symbols. Depending on your compiler and host OS, you’ll get another file you can run through nm to see an entry per symbol. You’ll get more info by using dwarfdump on this file. Debug symbols retain source information such as filename and line number for all symbols.

This post should have been a simple refresher of some of the basics of working with C code. Finding symbols to be placed into a final executable and relocating addresses are the main job of the linker, and will be the main theme of the posts in this series. Keep your eyes out for more in this series on memory segmentation.

Cross Compiling C/C++ for Android

| Comments

Let’s say you want to build a hello world command line application in C or C++ and run it on your Android phone. How would you go about it? It’s not super practical; apps visible and distributable to end users must use the framework (AFAIK), but for folks looking to get into developing on ARM it’s likely they have an ARM device in their pocket.

This post is for folks who typically invoke their compiler from the command line, either explicitly, from build scripts, or other forms of automation.

At work, when working on Android, we typically check out the entire Android source code (which is huge), use lunch to configure a ton of environment variables, then use Makefiles with lots of includes and special vars. We don’t want to spend the time and disk space checking out the Android source code just to have a working cross compiler. Luckily, the Android tools team has an excellent utility to grab a prebuilt cross compiler.

This assumes you’re building from a Linux host. Android is a distribution of Linux, which is much easier to target from a Linux host. At home, I’ll usually develop on my OSX machine, ssh’d into my headless Linux box. (iTerm2 and tmux both have exclusive features, but I currently prefer iTerm2.)

The first thing we want to do is fetch the Android NDK. Not the SDK, the NDK.

➜  ~ curl -O \
  http://dl.google.com/android/repository/android-ndk-r12b-linux-x86_64.zip
➜  ~ unzip android-ndk-r12b-linux-x86_64.zip

It would be helpful to install adb and fastboot, too. This might be different for your distro’s package manager. Better yet may be to just build from source.

➜  ~ sudo apt-get install android-tools-adb android-tools-fastboot

Now for the Android device you want to target, you’ll want to know the ISA. Let’s say I want to target my Nexus 6P, which has an ARMv8-A ISA (the first 64b ARM ISA).

➜  ~ ./android-ndk-r12b/build/tools/make_standalone_toolchain.py --arch arm64 \
  --install-dir ~/arm

This will create a nice standalone bundle in ~/arm. It will contain our cross compiler, linker, headers, libs, and sysroot (crt.o and friends). Most Android devices are ARMv7-A, so you’d use --arch arm. See the other supported architectures for cross compiling under table 4.

You might also want to change your install-dir, and possibly add it to your $PATH, or set $CC and $CXX.

Now we can compile hello_world.c.

➜  ~ cat hello_world.c
#include <stdio.h>
int main () {
  puts("hello world");
}

➜  ~ ~/arm/bin/clang -pie hello_world.c
➜  ~ file a.out
a.out: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically
linked, interpreter /system/bin/linker64,
BuildID[sha1]=ecc46648cf2c873253b3b522c0d14b91cf17c70f, not stripped

Since Android Lollipop, Android has required that executables be linked as position independent (-pie) to help provide ASLR.

<install-dir>/bin/ also contains shell scripts with more fully featured names, like aarch64-linux-android-clang, if you prefer clearer-named executables in your $PATH.

Connect your phone, enable USB debugging, and accept the debugging prompt on the device.

➜  ~ adb push a.out /data/local/tmp/.
➜  ~ adb shell "./data/local/tmp/a.out"
hello world

We’ll use this toolchain in a follow-up post to start writing some ARMv8-A assembly. Stay tuned.

Setting Up Mutt With Gmail on Ubuntu

| Comments

I was looking to set up the mutt email client on my Ubuntu box to go through my Gmail account. Since it took me a couple of hours to figure out, and I’ll probably have forgotten by the time I need to do it again, I figured I’d post my steps here.

I’m on Ubuntu 16.04 LTS (lsb_release -a)

Install mutt:

$ sudo apt-get install mutt

In Gmail, allow other apps to access Gmail:

Allow less secure apps to access your account (the “Less Secure Apps” setting).

Create the mail spool file

$ sudo touch $MAIL
$ sudo chmod 660 $MAIL
$ sudo chown `whoami`:mail $MAIL

where $MAIL for me was /var/mail/nick.
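If $MAIL isn’t already set by your shell, you can derive it yourself; on Linux the spool conventionally lives at /var/mail/<username> (a sketch, assuming that convention):

```shell
# Conventional Linux mail spool path: /var/mail/<username>.
MAIL="/var/mail/$(whoami)"
echo "$MAIL"
```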

Create the ~/.muttrc file

set realname = "<first and last name>"
set from = "<gmail username>@gmail.com"
set use_from = yes
set envelope_from = yes

set smtp_url = "smtps://<gmail username>@gmail.com@smtp.gmail.com:465/"
set smtp_pass = "<gmail password>"
set imap_user = "<gmail username>@gmail.com"
set imap_pass = "<gmail password>"
set folder = "imaps://imap.gmail.com:993"
set spoolfile = "+INBOX"
set ssl_force_tls = yes

# G to get mail
bind index G imap-fetch-mail
set editor = "vim"
set charset = "utf-8"
set record = ''

I’m sure there are better/more config options. Feel free to go wild; this is by no means a comprehensive setup.

Run mutt:

$ mutt

We should see it connect and download our messages. Press m to start composing a new message, and G to fetch new messages.