Nick Desaulniers

The enemy's gate is down

GCC vs LLVM Q3 2017: Active Developer Counts

A blog post from a few years ago that really stuck with me was Martin Olsson’s Browser Engines 2015: Commit Rates and Active Developer Counts, where he shows the number of authors and commits for popular web browsers. The graphs and analysis had interesting takeaways, like the obvious split between Blink and WebKit and the relative number of contributors to each project. Martin had data comparing GCC to LLVM from Q4 2015, but I wanted to see what the data looks like now, in Q3 2017, and share my findings by simply rerunning the numbers. Luckily, Martin open sourced the scripts he used for his measurements, so they could be rerun.

Commit count and active authors over the previous 60 days are a rough estimate of project health. The scripts don’t (and can’t) account for unique authors (the same author using different git commit info), and commit frequency is meaningless for comparing developers who commit early and commit often, but let’s take a look. Counting active contributors over 60 days also cuts out folks who do commit to either code base, just not as often. Lies, damn lies, and statistics, right? Or torture the data enough, and it will confess anything…
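To make the metric concrete, here is a minimal sketch of what "commits and active authors in the previous 60 days" means. This is not Martin's script; the COMMITS list and the activity function are made up for illustration. Like the real scripts, it treats every distinct author string as a separate person.

```python
from datetime import date, timedelta

# Hypothetical (author, commit date) pairs; a real run would read these
# from `git log`.
COMMITS = [
    ("alice@example.com", date(2017, 9, 20)),
    ("alice@example.com", date(2017, 9, 25)),
    ("bob@example.com", date(2017, 8, 15)),
    ("carol@example.com", date(2017, 5, 1)),  # outside the 60 day window
]


def activity(commits, end, days=60):
    """Return (commit count, distinct author count) for the window ending at `end`."""
    start = end - timedelta(days=days)
    recent = [(author, day) for author, day in commits if start < day <= end]
    return len(recent), len({author for author, _ in recent})


print(activity(COMMITS, end=date(2017, 9, 30)))
```

Note how carol drops out entirely even though she does contribute, just not within the window.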

Note that LLVM is split into a few repositories: llvm (the common base), clang (the C/C++ frontend), libc++ (the C++ standard library), compiler-rt (the sanitizers/built-ins/profiler lib), lld (the linker), clang-tools-extra (the utility belt), and lldb (the debugger). There are more, but these are the most active LLVM projects. Later, I refer to LLVM as the grouping of these repos.

There are a lot of caveats with this data. I suspect that the separate LLVM repos share a lot of contributors, so there are fewer active contributors in aggregate than the per-repo counts suggest. That is to say, you can’t simply add them up, or else you’d be double counting a bunch. Also, the comparison is not quite fair, since these two massive projects differ in a lot of places in which front-end languages and back-end targets they support.

LLVM’s 60 day active contributor count is ~3x-5x GCC’s and growing, while GCC’s count of roughly 100 hasn’t changed much since ’04. It’s safe to say GCC is not dying; it’s going steady and chugging away as it has been, but it seems LLVM has been very strong in attracting active contributors. Either way, I’m thankful to have not one, but two high quality open source C/C++ compilers.

Running Clang-Tidy on the Linux Kernel

Clang-Tidy is a linter from the LLVM ecosystem. I wanted to try to run it on the Linux kernel to see what kind of bugs it would find. The false positive rate seems pretty high (a persistent bane of static analysis), but some patching in both the tooling and the source can likely help bring this rate down.

The most straightforward way to invoke Clang-Tidy is with a compilation database, which is a JSON-based file that records, for each translation unit:

  1. The source file of the translation unit.
  2. The top level directory of the source.
  3. The exact arguments passed to the compiler.

The exact arguments are required because the -D and -I flags are necessary to reproduce the exact Abstract Syntax Tree (AST) used to compile your code. Given a compilation database, it’s trivial to parse and recreate a build. For the kernel’s Kbuild, it’s a lot like encoding the output of make V=1.
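As a rough illustration of how little parsing is involved, here is a minimal Python sketch that loads a compilation database and pulls out the -D and -I flags for each translation unit. The inline COMPILE_COMMANDS string is a made-up, one-entry stand-in for a real compile_commands.json, and preprocessor_flags is a hypothetical helper, not part of any tool. It assumes each flag and its value are joined in a single argument (as in -I./tools/include).

```python
import json

# A tiny, hypothetical compilation database with one entry, shaped like
# the "arguments"/"directory"/"file" records described above.
COMPILE_COMMANDS = """
[
  {
    "arguments": ["cc", "-c", "-O2", "-I./tools/include",
                  "-D__EXPORTED_HEADERS__", "-o", "build.o", "build.c"],
    "directory": "/home/nick/linux",
    "file": "build.c"
  }
]
"""


def preprocessor_flags(entry):
    """Return the -D and -I arguments recorded for one translation unit."""
    # Only handles the joined form (-Ipath, -DNAME), not "-I path".
    return [arg for arg in entry["arguments"] if arg.startswith(("-D", "-I"))]


database = json.loads(COMPILE_COMMANDS)
for entry in database:
    print(entry["file"], preprocessor_flags(entry))
```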

In order to generate a compilation database, we can use an awesome tool called BEAR. BEAR will hook calls to exec and family, then write out the compilation database (compile_commands.json).

With BEAR installed, we can invoke the kernel’s build with bear make -j. When we’re done:

➜  linux git:(nick) ✗ du -h compile_commands.json
11M compile_commands.json
➜  linux git:(nick) ✗ wc -l compile_commands.json
330296 compile_commands.json
➜  linux git:(nick) ✗ head -n 26 compile_commands.json
[
    {
        "arguments": [
            "cc",
            "-c",
            "-Wp,-MD,arch/x86/boot/tools/.build.d",
            "-Wall",
            "-Wmissing-prototypes",
            "-Wstrict-prototypes",
            "-O2",
            "-fomit-frame-pointer",
            "-std=gnu89",
            "-Wno-unused-value",
            "-Wno-unused-parameter",
            "-Wno-missing-field-initializers",
            "-I./tools/include",
            "-include",
            "include/generated/autoconf.h",
            "-D__EXPORTED_HEADERS__",
            "-o",
            "arch/x86/boot/tools/build",
            "arch/x86/boot/tools/build.c"
        ],
        "directory": "/home/nick/linux",
        "file": "arch/x86/boot/tools/build.c"
    },

Now with Clang-Tidy installed (it’s probably worthwhile to build from source, but it’s also available via apt), we want to grab a helper script, run-clang-tidy.py, to help analyze all this code.

curl -O https://raw.githubusercontent.com/llvm-mirror/clang-tools-extra/master/clang-tidy/tool/run-clang-tidy.py

Then we can run it from the same directory as compile_commands.json:

python run-clang-tidy.py \
  -clang-tidy-binary /usr/bin/clang-tidy-4.0 \
  > clang_tidy_output.txt

This took about 1hr12min on my box. Let’s see what the damage is:

➜  linux git:(nick) ✗ cat clang_tidy_output.txt \
  | grep warning: | grep -oE '[^ ]+$' | sort | uniq -c

     76 [clang-analyzer-core.CallAndMessage]
     15 [clang-analyzer-core.DivideZero]
      1 [clang-analyzer-core.NonNullParamChecker]
    316 [clang-analyzer-core.NullDereference]
     90 [clang-analyzer-core.UndefinedBinaryOperatorResult]
      1 [clang-analyzer-core.uninitialized.ArraySubscript]
   1410 [clang-analyzer-core.uninitialized.Assign]
     10 [clang-analyzer-core.uninitialized.Branch]
      5 [clang-analyzer-core.uninitialized.UndefReturn]
     11 [clang-analyzer-cplusplus.NewDeleteLeaks]
    694 [clang-analyzer-deadcode.DeadStores]
    342 [clang-analyzer-security.insecureAPI.strcpy]
      2 [clang-analyzer-unix.API]
     11 [clang-analyzer-unix.Malloc]
      4 [clang-diagnostic-address-of-packed-member]
      2 [clang-diagnostic-duplicate-decl-specifier]
     98 [clang-diagnostic-implicit-int]
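The same tally can be done without the shell pipeline. Here is a rough Python sketch of it; the SAMPLE_OUTPUT lines are invented stand-ins for clang_tidy_output.txt, and count_checks is a hypothetical helper. Unlike the grep above, it only counts warning lines that end in a bracketed check name.

```python
import re
from collections import Counter

# Hypothetical sample lines in the general shape clang-tidy prints; the
# paths and messages are made up for illustration.
SAMPLE_OUTPUT = """\
a.c:10:3: warning: Value stored to 'x' is never read [clang-analyzer-deadcode.DeadStores]
a.c:10:3: note: Value stored to 'x' is never read
b.c:22:9: warning: Dereference of null pointer [clang-analyzer-core.NullDereference]
c.c:7:1: warning: Value stored to 'y' is never read [clang-analyzer-deadcode.DeadStores]
"""


def count_checks(text):
    """Count warnings per check name, i.e. the trailing [check] token."""
    counts = Counter()
    for line in text.splitlines():
        if "warning:" not in line:
            continue  # skip notes and source snippets
        match = re.search(r"\[([^\] ]+)\]$", line)
        if match:
            counts[match.group(1)] += 1
    return counts


for check, count in sorted(count_checks(SAMPLE_OUTPUT).items()):
    print(f"{count:7d} [{check}]")
```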

Looking through the output, there seems to be almost nothing but false positives, but who knows, maybe there’s an actual bug or two in there. Patches to LLVM, its checkers, or the Linux kernel could likely lower the false positive ratio.

If you’re interested in seeing the kinds of warnings/outputs, I’ve uploaded my results run on a 4.12-rc3 based kernel that may or may not have been compiled with Clang to my clang_tidy branch of the kernel on GitHub. As in my sorted output, I find it handy to grep for warning:. Maybe you can find yourself a good first bug to contribute a fix to the kernel?

There are likely also some checks that make sense to disable or enable. Clang-Tidy also allows you to write and use your own checkers. Who knows, someone may just end up writing static analyses tailored to the Linux kernel.

Submitting Your First Patch to the Linux Kernel and Responding to Feedback

After working on the Linux kernel for Nexus and Pixel phones for nearly a year, and messing around with the excellent Eudyptula challenge, I finally wanted to take a crack at submitting patches upstream to the Linux kernel.

This post is woefully inadequate compared to the existing documentation, which should be preferred.

I figure I’d document my workflow, now that I’ve gotten a few patches accepted (and so I can refer to this post rather than my shell history…). Feedback welcome (open an issue or email me).

Step 1: Setting up an email client

I mostly use git send-email for sending patch files. In my ~/.gitconfig I have added:

[sendemail]
  ; setup for using git send-email; prompts for password
  smtpuser = myemailaddr@gmail.com
  smtpserver = smtp.googlemail.com
  smtpencryption = tls
  smtpserverport = 587

This sends patches through my Gmail account. I don’t add my password, so that I don’t have to worry about it when I publish my dotfiles; I simply get prompted every time I want to send an email.

I use mutt to respond to threads when I don’t have a patch to send.

Step 2: Make fixes

How do you find a bug to fix? My general approach to finding bugs in open source C/C++ code bases has been using static analysis, a different compiler, and/or more compiler warnings turned on. The kernel also has an instance of bugzilla running as an issue tracker. Work out of a new branch, in case you choose to abandon it later. Rebase your branch before submitting (pull early, pull often).

Step 3: Thoughtful commit messages

I always run git log <file I modified> to see some of the previous commit messages on the file I modified.

$ git log arch/x86/Makefile

commit a5859c6d7b6114fc0e52be40f7b0f5451c4aba93
...
    x86/build: convert function graph '-Os' error to warning
commit 3f135e57a4f76d24ae8d8a490314331f0ced40c5
...
    x86/build: Mostly disable '-maccumulate-outgoing-args'

The first words of commit messages in Linux are usually <subsystem>/<sub-subsystem>: <descriptive comment>.
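If you want a quick sanity check on your own subject line, here is a loose sketch of that convention as a regular expression. The pattern and the looks_conventional helper are my own approximation, not anything checkpatch enforces, and plenty of valid kernel subjects won't fit it exactly.

```python
import re

# Loose pattern for "<subsystem>[/<sub-subsystem>...]: <description>".
SUBJECT = re.compile(r"^[\w.+-]+(/[\w.+-]+)*: \S")


def looks_conventional(subject):
    """Return True when a subject line starts with a subsystem prefix."""
    return SUBJECT.match(subject) is not None


print(looks_conventional("x86/build: Mostly disable '-maccumulate-outgoing-args'"))
print(looks_conventional("fix build"))
```

The first call matches the pattern and the second does not, since "fix build" has no subsystem prefix.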

Let’s commit with git commit <files> -s. The -s flag adds our sign-off. Signing off on your patches is standard and notes your agreement to the Developer’s Certificate of Origin.

Step 4: Generate Patch file

Run git format-patch HEAD~ to turn your latest commit into a patch file. You can use git format-patch HEAD~<number of commits to convert to patches> to turn multiple commits into patch files. These patch files will be emailed to the Linux Kernel Mailing List (LKML). They can be applied with git am <patchfile>. I like to back these files up in another directory for future reference, and because I still make a lot of mistakes with git.

Step 5: checkpatch

You’re going to want to run the kernel’s linter before submitting. It will catch style issues and other potential issues.

$ ./scripts/checkpatch.pl 0001-x86-build-don-t-add-maccumulate-outgoing-args-w-o-co.patch
total: 0 errors, 0 warnings, 9 lines checked

0001-x86-build-don-t-add-maccumulate-outgoing-args-w-o-co.patch has no obvious style problems and is ready for submission.

If you hit issues here, fix up your changes, update your commit with git commit --amend <files updated>, rerun format-patch, then rerun checkpatch until you’re good to go.

Step 6: email the patch to yourself

This is good to do when you’re starting off. While I use mutt for responding to email, I use git send-email for sending patches. Once you’ve gotten the hang of the workflow, this step is optional, more of a sanity check.

$ git send-email \
0001-x86-build-require-only-gcc-use-maccumulate-outgoing-.patch

You don’t need to use command line arguments to cc yourself; assuming you set up git correctly, git send-email should add you to the cc line as the author of the patch. Send the patch just to yourself and make sure everything looks ok.

Step 7: fire off the patch

Linux is huge, and has a trusted set of maintainers for various subsystems. The MAINTAINERS file keeps track of these, but Linux has a tool to help you figure out where to send your patch:

$ ./scripts/get_maintainer.pl 0001-x86-build-don-t-add-maccumulate-outgoing-args-w-o-co.patch
Person A <person@a.com> (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
Person B <person@b.com> (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
Person C <person@c.com> (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
linux-kernel@vger.kernel.org (open list:X86 ARCHITECTURE (32-BIT AND 64-BIT))

With some additional flags, we can feed this output directly into git send-email.

$ git send-email \
--cc-cmd='./scripts/get_maintainer.pl --norolestats 0001-my.patch' \
--cc person@a.com \
0001-my.patch

Make sure to cc yourself when prompted. Otherwise if you don’t subscribe to LKML, then it will be difficult to reply to feedback. It’s also a good idea to cc any other author that has touched this functionality recently.

Step 8: monitor feedback

Patchwork for the LKML is a great tool for tracking the progress of patches. You should register an account there. I highly recommend bookmarking your submitter link. In Patchwork, click any submitter, then Filters (hidden in the top left), change submitter to your name, click apply, then bookmark it. Here’s what mine looks like. Not much today, and mostly trivial patches, but hopefully this post won’t age well in that regard.

Feedback may or may not be swift. I think I had to ping my first patch a couple of times, but I eventually got a response.

Step 9: responding to feedback

Update your file, then run git commit <changed files> --amend to update your latest commit and git format-patch -v2 HEAD~ to regenerate the patch. Edit the patch file to describe what changed since the previous version below the --- line that follows the Signed-off-by lines (example). Then rerun checkpatch, and rerun get_maintainer if the set of files you modified changed since V1. Next, you need to find the Message-ID to respond to the thread properly.

In Gmail, when viewing the message you want to respond to, you can click “Show Original” from the dropdown near the reply button. From there, copy the Message-ID from the top (everything in the angle brackets, but not the brackets themselves). Finally, we send the patch:

$ git send-email \
--cc-cmd='./scripts/get_maintainer.pl --norolestats 0001-my.patch' \
--cc person@a.com \
--in-reply-to 2017datesandletters@somehostname \
0001-my.patch

We make sure to add anyone who may have commented on the patch from the mailing list to keep them in the loop. Rinse and repeat 2 through 9 as desired until patch is signed off/acked or rejected.
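If you'd rather not copy the Message-ID by hand, here is a small sketch that pulls it out of a saved raw message with Python's standard email module. RAW_MESSAGE and reply_target are hypothetical names for illustration, and the header value is just the placeholder from the command above.

```python
from email import message_from_string

# A made-up message in the general shape of what "Show Original" displays.
RAW_MESSAGE = """\
Message-ID: <2017datesandletters@somehostname>
Subject: Re: [PATCH] x86/build: example
From: Person A <person@a.com>

Looks good, but please fix the commit message.
"""


def reply_target(raw):
    """Return the Message-ID header without its angle brackets."""
    message = message_from_string(raw)
    return message["Message-ID"].strip().strip("<>")


print(reply_target(RAW_MESSAGE))
```

The printed value is what goes after --in-reply-to.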

I’ve added this handy shell function to my ~/.zshrc:

function kpatch () {
  # First argument is the patch file; the rest are passed to git send-email.
  local patch=$1
  shift
  git send-email \
    --cc-cmd="./scripts/get_maintainer.pl --norolestats $patch" \
    "$@" "$patch"
}

That I can then invoke like:

kpatch 0001-Input-mousedev-fix-implicit-conversion-warning.patch --cc anyone@else.com

where anyone@else.com is anyone I want to add in addition to what get_maintainer suggests.

Finding out when your patch gets merged is a little tricky; each subsystem maintainer seems to do things differently. My first patch, I didn’t know it went in until a bot at Google notified me. The maintainers for the second and third patches had bots notify me that it got merged into their trees, but when they send Linus a PR and when that gets merged isn’t immediately obvious.

It’s not like GitHub, where everyone involved gets an email that a PR got merged and the UI changes. While there are pros and cons to having this fairly decentralized process, and while it kind of is git’s original designed-for use case, I’d be remiss not to mention that I really miss GitHub. Getting your first patch acknowledged and even merged is intoxicating and makes you want to contribute more; radio silence has the opposite effect.

Happy hacking!

(Thanks to Reddit user /u/EliteTK for pointing out that -v2 was more concise than --subject-prefix="Patch vX").

Static and Dynamic Libraries

This is the second post in a series on memory segmentation. It covers working with static and dynamic libraries in Linux and OSX. Make sure to check out the first on object files and symbols.

Let’s say we wanted to reuse some of the code from our previous project in our next one. We could continue to copy around object files, but let’s say we have a bunch and it’s hard to keep track of all of them. Let’s combine multiple object files into an archive or static library. Similar to a more conventional zip file or “compressed archive,” our static library will be an uncompressed archive.

We can use the ar command to create and manipulate a static archive.

$ clang -c x.c y.c
$ ar -rv libhello.a x.o y.o

The -r flag will create the archive named libhello.a and add the files x.o and y.o to its index. I like to add the -v flag for verbose output. Then we can use the familiar nm tool I introduced in the previous post to examine the content of the archives and their symbols.

$ file libhello.a
libhello.a: current ar archive random library
$ nm libhello.a
libhello.a(x.o):
                 U _puts
0000000000000000 T _x


libhello.a(y.o):
                 U _puts
0000000000000000 T _y

Some other useful flags for ar are -d to delete an object file, ex. ar -d libhello.a y.o and -u to update existing members of the archive when their source and object files are updated. Not only can we run nm on our archive, otool and objdump both work.
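To make the "uncompressed archive" idea concrete, here is a minimal Python sketch that writes and then walks the common ar layout: an 8 byte magic string followed by a 60 byte text header per member, with the member's bytes stored as-is. This is an illustration under assumptions, not a replacement for ar: build_archive and list_members are made-up helpers, the member contents are placeholder bytes rather than real object files, names use the GNU-style trailing slash, and there is no symbol index member like the one a real static library carries.

```python
import io

AR_MAGIC = b"!<arch>\n"


def build_archive(members):
    """Build a minimal ar archive from {name: bytes}; metadata fields are zeroed."""
    out = io.BytesIO()
    out.write(AR_MAGIC)
    for name, data in members.items():
        header = (
            f"{name + '/':<16}"     # member name, '/'-terminated (GNU style)
            f"{0:<12}{0:<6}{0:<6}"  # mtime, uid, gid
            f"{'644':<8}"           # file mode in octal
            f"{len(data):<10}"      # size in bytes, decimal ASCII
        ).encode("ascii") + b"`\n"
        out.write(header + data)
        if len(data) % 2:           # members start on even offsets
            out.write(b"\n")
    return out.getvalue()


def list_members(archive):
    """Yield (name, size) for each member by walking the 60-byte headers."""
    assert archive.startswith(AR_MAGIC)
    offset = len(AR_MAGIC)
    while offset < len(archive):
        header = archive[offset:offset + 60]
        name = header[:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii"))
        yield name, size
        offset += 60 + size + (size % 2)


archive = build_archive({"x.o": b"abc", "y.o": b"defg"})
print(list(list_members(archive)))
```

Nothing is compressed: the archive is just the headers plus the raw member bytes (and a padding byte after odd-sized members).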

Now that we have our static library, we can statically link it to our program and see the resulting symbols. The .a suffix is typical on both OSX and Linux for archive files.

$ clang main.o libhello.a
$ nm a.out
0000000100000f30 T _main
                 U _puts
0000000100000f50 T _x
0000000100000f70 T _y

The linker, which our compiler invokes for us, understands how to index into archive files and pull out the object files it needs to combine into the final executable. If we use a static library to statically link all functions required, we can have one binary with no dependencies. This can make deployment of binaries simple, but also greatly increase their size. Upgrading large binaries incrementally becomes more costly in terms of space.

While static libraries allowed us to reuse source code, static linkage does not allow us to reuse memory for executable code between different processes. I really want to put off talking about memory benefits until the next post, but know that the solution to this problem lies in “dynamic libraries.”

While having a single binary file keeps things simple, it can really hamper memory sharing and incremental relinking. For example, if you have multiple executables that are all built with the same static library, unless your OS is really smart about copy-on-write page sharing, then you’re likely loading multiple copies of the same exact code into memory! What a waste! Also, when you want to rebuild or update your binary, you spend time performing relocation again and again with static libraries. What if we could set aside object files that we could share amongst multiple instances of the same or even different processes, and perform relocation at runtime?

The solution is known as dynamic libraries. If static libraries and static linkage were Atari controllers, dynamic libraries and dynamic linkage are Steel Battalion controllers. We’ll show how to work with them in the rest of this post, but I’ll prove how memory is saved in a later post.

Let’s say we want to create a shared version of libhello. Dynamic libraries typically have different suffixes per OS since each OS has its preferred object file format. On Linux the .so suffix is common, .dylib on OSX, and .dll on Windows.

$ clang -shared -fpic x.c y.c -o libhello.dylib
$ file libhello.dylib
libhello.dylib: Mach-O 64-bit dynamically linked shared library x86_64
$ nm libhello.dylib
                 U _puts
0000000000000f50 T _x
0000000000000f70 T _y

The -shared flag tells the linker to create a special file called a shared library. The -fpic option converts absolute addresses to relative addresses, which allows for different processes to load the library at different virtual addresses and share memory.

Now that we have our shared library, let’s dynamically link it into our executable.

$ clang main.c libhello.dylib
$ ./a.out
x
y

Dynamic linking essentially produces an incomplete binary. You can verify with nm. At runtime, we delay start up to perform some memory mapping early in process start (performed by the dynamic linker) and pay slight costs for trampolining into position independent code.

Let’s say we want to know what dynamic libraries a binary is using. You can either query the executable (most executable object file formats contain a header the dynamic linker will parse and pull in libs) or observe the executable while running it. Because each major OS has its own object file format, they each have their own tools for these two checks. Note that statically linked libraries won’t show up here, since their object code has already been linked in and thus we’re not able to differentiate between object code that came from our first party code vs third party static libraries.

On OSX, we can use otool -L <bin> to check which .dylibs will get pulled in.

$ otool -L a.out
a.out:
           libhello.dylib (compatibility version 0.0.0, current version 0.0.0)
           /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1226.10.1)

So we can see that a.out depends on libhello.dylib (and expects to find it in the same directory as a.out). It also depends on a shared library called libSystem.B.dylib. If you run otool -L on libSystem itself, you’ll see it depends on a bunch of other libraries including a C runtime, malloc implementation, pthreads implementation, and more. If you want to find the final resting place of where a symbol is defined without digging with nm and otool, you can fire up your trusty debugger and ask it.

$ lldb a.out
...
(lldb) image lookup -r -s puts
...
        Summary: libsystem_c.dylib`puts        Address: libsystem_c.dylib[0x0000000000085c30] (libsystem_c.dylib.__TEXT.__stubs + 3216)

You’ll see a lot of output since puts is treated as a regex. You’re looking for the Summary line that has an address and is not a symbol stub. You can then check your work with otool and nm.

If we want to observe the dynamic linker in action on OSX, we can use dtruss:

$ sudo dtruss ./a.out
...
stat64("libhello.dylib\0", 0x7FFF50CEAC68, 0x1)         = 0 0
open("libhello.dylib\0", 0x0, 0x0)              = 3 0
...
mmap(0x10EF27000, 0x1000, 0x5, 0x12, 0x3, 0x0)          = 0x10EF27000 0
mmap(0x10EF28000, 0x1000, 0x3, 0x12, 0x3, 0x1000)               = 0x10EF28000 0
mmap(0x10EF29000, 0xC0, 0x1, 0x12, 0x3, 0x2000)         = 0x10EF29000 0
...
close(0x3)              = 0 0
...

On Linux, we can simply use ldd or readelf -d to query an executable for a list of its dynamic libraries.

$ clang -shared -fpic x.c y.c -o libhello.so
$ clang main.c libhello.so
$ ldd a.out
           linux-vdso.so.1 =>  (0x00007fff95d43000)
           libhello.so => not found
           libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fcc98c5f000)
           /lib64/ld-linux-x86-64.so.2 (0x0000555993852000)
$ readelf -d a.out
Dynamic section at offset 0xe18 contains 25 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libhello.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
...

We can then use strace to observe the dynamic linker in action on Linux:

$ LD_LIBRARY_PATH=. strace ./a.out
...
open("./libhello.so", O_RDONLY|O_CLOEXEC) = 3
...
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\5\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=8216, ...}) = 0
close(3)                                = 0
...

What’s this LD_LIBRARY_PATH thing? That’s shell syntax for setting an environment variable just for the duration of that command (as opposed to exporting it so it stays set for multiple commands). Unlike OSX’s dynamic linker, which was happy to look in the cwd for libhello.dylib, on Linux we must supply the cwd if the dynamic library we want to link in is not in the standard search path.
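The "just for the duration of that command" part is worth a quick illustration. Here is a minimal sketch of the same idea from Python: the variable is placed in the child's environment only, and the parent's environment is left alone. REPORT and run_with_library_path are hypothetical names, and the child here is just another Python interpreter that prints what it sees rather than a.out.

```python
import os
import subprocess
import sys

# Child command that just reports what it sees in its environment.
REPORT = "import os; print(os.environ.get('LD_LIBRARY_PATH'))"


def run_with_library_path(path):
    """Run a child process with LD_LIBRARY_PATH set only in the child's environment."""
    child_env = dict(os.environ, LD_LIBRARY_PATH=path)
    result = subprocess.run(
        [sys.executable, "-c", REPORT],
        env=child_env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


print(run_with_library_path("."))
```

This mirrors what the shell does for `VAR=value command`: the assignment applies to that one process and whatever it spawns.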

But what is the standard search path? Well, there’s another environment variable we can set to see this, LD_DEBUG. For example, on Linux:

$ LD_DEBUG=libs LD_LIBRARY_PATH=. ./a.out
     15828:        find library=libhello.so [0]; searching
     15828:         search path=./tls/x86_64:./tls:./x86_64:.             (LD_LIBRARY_PATH)
     15828:          trying file=./tls/x86_64/libhello.so
     15828:          trying file=./tls/libhello.so
     15828:          trying file=./x86_64/libhello.so
     15828:          trying file=./libhello.so
     15828:
     15828:        find library=libc.so.6 [0]; searching
     15828:         search path=./tls/x86_64:./tls:./x86_64:.             (LD_LIBRARY_PATH)
     15828:          trying file=./tls/x86_64/libc.so.6
     15828:          trying file=./tls/libc.so.6
     15828:          trying file=./x86_64/libc.so.6
     15828:          trying file=./libc.so.6
     15828:         search cache=/etc/ld.so.cache
     15828:          trying file=/lib/x86_64-linux-gnu/libc.so.6
     15828:        calling init: /lib/x86_64-linux-gnu/libc.so.6
     15828:        calling init: ./libhello.so
     15828:        initialize program: ./a.out
     15828:        transferring control: ./a.out
x
y
     15828:        calling fini: ./a.out [0]
     15828:        calling fini: ./libhello.so [0]
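The "trying file=" lines boil down to walking a list of directories in order and stopping at the first hit. Here is a deliberately simplified Python model of that loop. find_library is a made-up helper, and the real dynamic linker does more than this (the hardware-specific subdirectories and the /etc/ld.so.cache lookup seen in the output, for instance), so treat it as a sketch of the idea only.

```python
import os
import tempfile


def find_library(name, search_dirs):
    """Return (path found or None, list of paths tried), trying dirs in order."""
    tried = []
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        tried.append(candidate)
        if os.path.isfile(candidate):
            return candidate, tried
    return None, tried


# Demonstrate with throwaway directories: only the second one holds the file.
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    open(os.path.join(second, "libhello.so"), "wb").close()
    found, tried = find_library("libhello.so", [first, second])
    print("tried:", len(tried), "found in second dir:", found.startswith(second))
```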

LD_DEBUG is pretty useful. Try:

$ LD_DEBUG=help ./a.out
Valid options for the LD_DEBUG environment variable are:


  libs        display library search paths
  reloc       display relocation processing
  files       display progress for input file
  symbols     display symbol table processing
  bindings    display information about symbol binding
  versions    display version dependencies
  scopes      display scope information
  all         all previous options combined
  statistics  display relocation statistics
  unused      determined unused DSOs
  help        display this help message and exit


To direct the debugging output into a file instead of standard output
a filename can be specified using the LD_DEBUG_OUTPUT environment variable.

For some cool stuff, I recommend checking out LD_DEBUG=symbols and LD_DEBUG=statistics.

Going back to LD_LIBRARY_PATH, usually libraries you create and want to reuse between projects go into /usr/local/lib and the headers into /usr/local/include. I think of the convention as:

$ tree -L 2 /usr/
/usr
├── bin # system installed binaries like nm, gcc
├── include # system installed headers like stdio.h
├── lib # system installed libraries, both static and dynamic
└── local
    ├── bin # user installed binaries like rustc
    ├── include # user installed headers
    └── lib # user installed libraries

Unfortunately, it’s a loose convention that’s broken down over the years, and things are scattered all over the place. You can also run into dependency and versioning issues that I don’t want to get into here by placing libraries here instead of keeping them in-tree or out-of-tree of the source code of a project. Just know that when you see a library like libc.so.6, the numeric suffix is a major version number that follows semantic versioning. For more information, you should read Michael Kerrisk’s excellent book The Linux Programming Interface. This post is based on his chapters 41 & 42 (but with more info on tooling and OSX).

If we were to place our libhello.so into /usr/local/lib (on Linux you need to then run sudo ldconfig) and move x.h and y.h to /usr/local/include, then we could compile with:

$ clang main.c -lhello

Note that rather than give a full path to our library, we can use the -l flag followed by the name of our library with the lib prefix and .so suffix removed.

When working with shared libraries and external code, three flags I use pretty often:

* -l<libname to link, no lib prefix or file extension; ex: -lnanomsg to link libnanomsg.so>
* -L <path to search for lib if in non standard directory>
* -I <path to headers for that library, if in non standard directory>
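The -l naming rule is mechanical enough to write down. Here is a tiny sketch of the mapping from flag to shared library filename; library_filename is a hypothetical helper for illustration, and it only covers the shared library name (the suffix parameter is there because OSX uses .dylib).

```python
def library_filename(flag, suffix=".so"):
    """Map a -l<name> flag to the shared library filename the linker looks for."""
    if not flag.startswith("-l") or len(flag) == 2:
        raise ValueError(f"not a -l<name> flag: {flag!r}")
    return f"lib{flag[2:]}{suffix}"


print(library_filename("-lhello"))
print(library_filename("-lnanomsg"))
print(library_filename("-lhello", suffix=".dylib"))
```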

To find the specific flags needed to compile and link against a dynamic library, you can use a tool called pkg-config. I’ve had less than stellar experiences with the tool, as it puts the onus on the library author to maintain the .pc files and on the user to have them installed where pkg-config looks. When they do exist and are installed properly, the tool works well:

$ sudo apt-get install libpng12-dev
$ pkg-config --libs --cflags libpng12
-I/usr/include/libpng12  -lpng12
$ clang program.c `!!`
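If you drive the compiler from a script instead of the shell, the pkg-config output needs to be split into individual arguments first. Here is a minimal sketch using the flags printed above; split_flags is a made-up helper, and its rule (anything starting with -l or -L is a link flag, everything else a compile flag) is a simplification I'm assuming for illustration.

```python
import shlex

# The flags printed by pkg-config in the example above.
PKG_CONFIG_OUTPUT = "-I/usr/include/libpng12  -lpng12"


def split_flags(output):
    """Split pkg-config output into (compile flags, link flags)."""
    cflags, libs = [], []
    for flag in shlex.split(output):
        (libs if flag.startswith(("-l", "-L")) else cflags).append(flag)
    return cflags, libs


cflags, libs = split_flags(PKG_CONFIG_OUTPUT)
command = ["clang", *cflags, "program.c", *libs]
print(" ".join(command))
```

Putting the link flags after the source file is a habit worth keeping, since some linkers resolve libraries in command line order.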

Using another neat environment variable, we can hook into the dynamic linkage process and inject our own shared libraries to be linked instead of the expected libraries. Let’s say libgood.so and libmalicious.so both define a symbol for a function (the same symbol name and function signature). We can get a binary that links in libgood.so’s function to instead call libmalicious.so’s version:

$ ./a.out
hello from libgood
$ LD_PRELOAD=./libmalicious.so ./a.out
hello from libmalicious
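One way to picture why this works: symbol lookup walks the loaded libraries in order and takes the first definition it finds, and preloaded libraries go to the front of that order. Here is a toy Python model of just that rule. The resolve function and the library tuples are invented for illustration; a real dynamic linker also deals with symbol versions, scopes, and more.

```python
def resolve(symbol, load_order):
    """Return the name of the first library in load order that defines `symbol`."""
    for library, symbols in load_order:
        if symbol in symbols:
            return library
    raise LookupError(f"undefined symbol: {symbol}")


# Hypothetical libraries that both export a function named `hello`.
LIBGOOD = ("libgood.so", {"hello"})
LIBMALICIOUS = ("libmalicious.so", {"hello"})

print(resolve("hello", [LIBGOOD]))                # normal run
print(resolve("hello", [LIBMALICIOUS, LIBGOOD]))  # with the preload in front
```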

LD_PRELOAD is not available on OSX; instead you can use DYLD_INSERT_LIBRARIES and DYLD_LIBRARY_PATH, and recompile the original executable with -flat_namespace. Having to recompile the original executable is less than ideal for hooking an existing binary, but I could not hook libc as in the previous libmalicious example. I would be interested to know if you can though!

Manually invoking the dynamic linker from our code, we can even man in the middle library calls (call our hooked function first, then invoke the original target). We’ll see more of this in the next post on using the dynamic linker.

As you can guess, readjusting the search paths for dynamic libraries is a security concern as it lets good and bad folks change the expected execution paths. Guarding against the use of these env vars becomes a rabbit hole that gets pretty tricky to solve without the heavy handed use of statically linking dependencies.

In the previous post, I alluded to undefined symbols like puts. puts is part of libc, which is probably the most shared dynamic library on most computing devices since most every program makes use of the C runtime. (I think of a “runtime” as implicit code that runs in your program that you didn’t necessarily write yourself. Usually a runtime is provided as a library that gets implicitly linked into your executable when you compile.) You can statically link against libc with the -static flag, on Linux at least (OSX makes this difficult, “Apple does not support statically linked binaries on Mac OS X”).

I’m not sure what the benefit would be to mixing static and dynamic linking, but after searching the search paths from LD_DEBUG=libs for shared versions of a library, if any static ones are found, they will get linked in.

There’s also an interesting form of library called a “virtual dynamic shared object” on Linux. I haven’t covered memory mapping yet, but know it exists, is usually hidden for libc, and that you can read more about it via man 7 vdso.

One thing I find interesting and don’t quite understand how to recreate is that somehow glibc on Linux is also executable:

$ /lib/x86_64-linux-gnu/libc.so.6
GNU C Library (Ubuntu GLIBC 2.24-3ubuntu1) stable release version 2.24, by Roland McGrath et al.
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 6.2.0 20161005.
Available extensions:
    crypt add-on version 2.1 by Michael Glad and others
    GNU Libidn by Simon Josefsson
    Native POSIX Threads Library by Ulrich Drepper et al
    BIND-8.2.3-T5B
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<https://bugs.launchpad.net/ubuntu/+source/glibc/+bugs>.

Also, note that linking against third party code has licensing implications (of course) of particular interest when it’s GPL or LGPL. Here is a good overview which I’d summarize as: code that statically links against LGPL code must also be LGPL, while any form of linkage against GPL code must be GPL’d.

Ok, that was a lot. In the previous post, we covered Object Files and Symbols. In this post we covered hacking around with static and dynamic linkage. In the next post, I hope to talk about manually invoking the dynamic linker at runtime.

Object Files and Symbols


What was supposed to be one blog post about memory segmentation turned into what will be a series of posts. As the first in the series, we cover the extreme basics of object files and symbols. In follow-up posts, I plan to talk about static libraries, dynamic libraries, dynamic linkage, memory segments, and finally memory usage accounting. I also cover command line tools for working with these notions, both in Linux and OSX.

A quick review of the compilation+execution pipeline (for terminology):

  1. Lexing produces tokens
  2. Parsing produces an abstract syntax tree
  3. Analysis produces a control flow graph
  4. Optimization produces a reduced control flow graph
  5. Code gen produces object code
  6. Linkage produces a complete executable
  7. Loader instructs the OS how to start running the executable

This series will focus on part #6.

Let’s say you have some amazing C/C++ code, but for separation of concerns, you want to start moving it out into separate source files. Whereas previously in one file you had:

// main.c
#include <stdio.h>
void helper () {
  puts("helper");
}
int main () {
  helper();
}

You now have two source files and maybe a header:

// main.c
#include "helper.h"
int main () {
  helper();
}

// helper.h
void helper();

// helper.c
#include <stdio.h>
#include "helper.h"
void helper () {
  puts("helper");
}

In the single source version, we would have compiled and linked that with clang main.c and had an executable file. In the multiple source version, we first compile our source files to object files, then link them all together. That can be done separately:

$ clang -c helper.c     # produces helper.o
$ clang -c main.c       # produces main.o
$ clang main.o helper.o # produces a.out

We can also do the compilation and linkage in one step:

$ clang helper.c main.c # produces a.out

Nothing special thus far; C/C++ 101. In the first case of separate compilation and linkage steps, we were left with intermediate object files (.o). What exactly are these?

Object files are almost full executables. They contain machine code, but that code still requires a relocation step. They also contain metadata about the addresses of their variables and functions (called symbols) in an associative data structure called a symbol table. These addresses may not be the final addresses of the symbols in the final executable. Object files also contain some information for the loader and probably some other stuff.

Remember that if we fail to specify the helper object file, we’ll get an undefined symbol error.

$ clang main.c
Undefined symbols for architecture x86_64:
  "_helper", referenced from:
      _main in main-459dde.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

The problem is main.o refers to some symbol called helper, but on its own doesn’t contain any more information about it. Let’s say we want to know what symbols an object file contains, or expects to find elsewhere. Let’s introduce our first tool, nm. nm will print the name list or symbol table for a given object or executable file. On OSX, symbol names are prefixed with an underscore.

$ nm helper.o
0000000000000000 T _helper
                 U _puts

$ nm main.o
                 U _helper
0000000000000000 T _main

$ nm a.out
...
0000000100000f50 T _helper
0000000100000f70 T _main
                 U _puts
...

Let’s dissect what’s going on here. The output (as understood by man 1 nm) is a space separated list of address, type, and symbol name. We can see that the addresses are placeholders in object files, and final in executables. The name should make sense; it’s the name of the function or variable. While I’d love to get in depth on the various symbol types and talk about sections, I don’t think I could do as great a job as Peter Van Der Linden in his book “Expert C Programming: Deep C Secrets.”

For our case, we just care about whether the symbol in a given object file is defined or not. The type U (undefined) means that this symbol is referenced or used in this object code/executable, but its value wasn’t defined here. It should now make sense why compiling main.c alone gave us the undefined symbol error for helper. main.o contains a symbol for main, and references helper. helper.o contains a symbol for helper, and references puts. The final executable contains symbols for main and helper, and references puts.

You might be wondering where puts comes from then, and why didn’t we get an undefined symbol error for puts like we did earlier for helper. The answer is the C runtime. libc is implicitly dynamically linked to all executables created by the C compiler. We’ll cover dynamic linkage in a later post in this series.

When the linker performs relocation on the object files, combining them into a final executable, it goes through placeholders of addresses and fills them in. We did this manually in our post on JIT compilers.

While nm gave us a look into our symbol table, two other tools I use frequently are objdump on Linux and otool on OSX. Both of these provide disassembled assembly instructions and their addresses. Note how the symbols for functions get translated into labels of the disassembled functions, and that their address points to the first instruction in that label. Since I’ve shown objdump numerous times in previous posts, here’s otool.

$ otool -tV helper.o
helper.o:
(__TEXT,__text) section
_helper:
0000000000000000    pushq    %rbp
0000000000000001    movq    %rsp, %rbp
0000000000000004    subq    $0x10, %rsp
0000000000000008    leaq    0xe(%rip), %rdi         ## literal pool for: "helper"
000000000000000f    callq    _puts
0000000000000014    movl    %eax, -0x4(%rbp)
0000000000000017    addq    $0x10, %rsp
000000000000001b    popq    %rbp
000000000000001c    retq
$ otool -tV main.o
main.o:
(__TEXT,__text) section
_main:
0000000000000000    pushq    %rbp
0000000000000001    movq    %rsp, %rbp
0000000000000004    movb    $0x0, %al
0000000000000006    callq    _helper
000000000000000b    xorl    %eax, %eax
000000000000000d    popq    %rbp
000000000000000e    retq
$ otool -tV a.out
a.out:
(__TEXT,__text) section
_helper:
0000000100000f50    pushq    %rbp
0000000100000f51    movq    %rsp, %rbp
0000000100000f54    subq    $0x10, %rsp
0000000100000f58    leaq    0x43(%rip), %rdi        ## literal pool for: "helper"
0000000100000f5f    callq    0x100000f80             ## symbol stub for: _puts
0000000100000f64    movl    %eax, -0x4(%rbp)
0000000100000f67    addq    $0x10, %rsp
0000000100000f6b    popq    %rbp
0000000100000f6c    retq
0000000100000f6d    nop
0000000100000f6e    nop
0000000100000f6f    nop
_main:
0000000100000f70    pushq    %rbp
0000000100000f71    movq    %rsp, %rbp
0000000100000f74    movb    $0x0, %al
0000000100000f76    callq    _helper
0000000100000f7b    xorl    %eax, %eax
0000000100000f7d    popq    %rbp
0000000100000f7e    retq

readelf -s <object file> will give us a list of symbols on Linux. ELF is the file format used by the loader on Linux, while OSX uses Mach-O. Thus readelf and otool, respectively.

Also note that for static linkage, symbols need to be unique*, as they refer to memory locations to either read/write to in the case of variables or locations to jump to in the case of functions.

$ cat double_define.c
void a () {}
void a () {}
int main () {}
$ clang double_define.c
double_define.c:2:6: error: redefinition of 'a'
void a () {}
     ^
double_define.c:1:6: note: previous definition is here
void a () {}
     ^
1 error generated.

*: there’s a notion of weak symbols, and some special things for dynamic libraries we’ll see in a follow up post.

Languages like C++ that support function overloading (functions with the same name but different argument types, namespaces, or classes) must mangle their function names to make them unique.

Code like:

namespace util {
  class Widget {
    public:
      void doSomething (bool save);
      void doSomething (int n);
  };
}

Will produce symbols like:

$ clang class.cpp -std=c++11
$ nm a.out
0000000100000f70 T __ZN4util6Widget11doSomethingEb
0000000100000f60 T __ZN4util6Widget11doSomethingEi
...

Note: GNU nm on Linux distros will have a --demangle option:

$ nm --demangle a.out
...
00000000004006d0 T util::Widget::doSomething(bool)
00000000004006a0 T util::Widget::doSomething(int)
...

On OSX, we can pipe nm into c++filt:

$ nm a.out | c++filt
0000000100000f70 T util::Widget::doSomething(bool)
0000000100000f60 T util::Widget::doSomething(int)
...

Finally, if you don’t have an object file, but instead a backtrace that needs demangling, you can either invoke c++filt manually or use demangler.com.

Rust also mangles its function names. For FFI or interface with C functions, other languages usually have to look for or expose symbols in a manner suited to C, the lowest common denominator. C++ has extern "C" blocks and Rust has extern blocks.

We can use strip to remove symbols from a binary. This can slim down a binary at the cost of making stack traces unreadable. If you’re following along at home, try comparing the output from your disassembler and nm before and after running strip on the executable. Luckily, you can’t strip the symbols out of object files, otherwise they’d be useless as you’d no longer be able to link them.

If we compile with the -g flag, we can create a different kind of symbol; debug symbols. Depending on your compiler+host OS, you’ll get another file you can run through nm to see an entry per symbol. You’ll get more info by using dwarfdump on this file. Debug symbols will retain source information such as filename and line number for all symbols.

This post should have been a simple refresher of some of the basics of working with C code. Finding symbols to be placed into a final executable and relocating addresses are the main job of the linker, and will be the main theme of the posts in this series. Keep your eyes out for more in this series on memory segmentation.

Cross Compiling C/C++ for Android


Let’s say you want to build a hello world command line application in C or C++ and run it on your Android phone. How would you go about it? It’s not super practical; apps visible and distributable to end users must use the framework (AFAIK), but for folks looking to get into developing on ARM it’s likely they have an ARM device in their pocket.

This post is for folks who typically invoke their compiler from the command line, either explicitly, from build scripts, or other forms of automation.

At work, when working on Android, we typically check out the entire Android source code (which is huge), use lunch to configure a ton of environment variables, then use Makefiles with lots of includes and special vars. We don’t want to spend the time and disk space checking out the Android source code just to have a working cross compiler. Luckily, the Android tools team has an excellent utility to grab a prebuilt cross compiler.

This assumes you’re building from a Linux host. Android is a distribution of Linux, which is much easier to target from a Linux host. At home, I’ll usually develop on my OSX machine, ssh’d into my headless Linux box. (iTerm2 and tmux both have exclusive features, but I currently prefer iTerm2.)

The first thing we want to do is fetch the Android NDK. Not the SDK, the NDK.

➜  ~ curl -O \
  http://dl.google.com/android/repository/android-ndk-r12b-linux-x86_64.zip
➜  ~ unzip android-ndk-r12b-linux-x86_64.zip

It would be helpful to install adb and fastboot, too. This might be different for your distro’s package manager. Better yet may be to just build from source.

➜  ~ sudo apt-get install android-tools-adb android-tools-fastboot

Now, for the Android device that you want to target, you’ll want to know the ISA. Let’s say I want to target my Nexus 6P, which has an ARMv8-A ISA (the first 64b ARM ISA).

➜  ~ ./android-ndk-r12b/build/tools/make_standalone_toolchain.py --arch arm64 \
  --install-dir ~/arm

This will create a nice standalone bundle in ~/arm. It will contain our cross compiler, linker, headers, libs, and sysroot (crt.o and friends). Most Android devices are ARMv7-A, so you’d use --arch arm. See the other supported architectures for cross compiling under table 4.

You might also want to change your install-dir and possibly add it to your $PATH, or set $CC and $CXX.

Now we can compile hello_world.c.

➜  ~ cat hello_world.c
#include <stdio.h>
int main () {
  puts("hello world");
}

➜  ~ ~/arm/bin/clang -pie hello_world.c
➜  ~ file a.out
a.out: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically
linked, interpreter /system/bin/linker64,
BuildID[sha1]=ecc46648cf2c873253b3b522c0d14b91cf17c70f, not stripped

Since Android Lollipop, Android has required that executables be linked as position independent (-pie) to help provide ASLR.

<install-dir>/bin/ also has shell scripts with more full featured names like aarch64-linux-android-clang if you prefer to have clearer named executables in your $PATH.

Connect your phone, enable remote debugging, and accept the prompt for remote debugging.

➜  ~ adb push a.out /data/local/tmp/.
➜  ~ adb shell "./data/local/tmp/a.out"
hello world

We’ll use this toolchain in a follow-up post to start writing some ARMv8-A assembly. Stay tuned.

Setting Up Mutt With Gmail on Ubuntu


I was looking to set up the mutt email client on my Ubuntu box to go through my gmail account. Since it took me a couple of hours to figure out, and I’ll probably forget by the time I need to know again, I figured I’d post my steps here.

I’m on Ubuntu 16.04 LTS (lsb_release -a)

Install mutt:

$ sudo apt-get install mutt

In gmail, allow other apps to access gmail:

See Google’s help page “Allowing less secure apps to access your account” and the “Less Secure Apps” setting.

Create the mail spool file:

$ sudo touch $MAIL
$ sudo chmod 660 $MAIL
$ sudo chown `whoami`:mail $MAIL

where $MAIL for me was /var/mail/nick.

Create the ~/.muttrc file

set realname = "<first and last name>"
set from = "<gmail username>@gmail.com"
set use_from = yes
set envelope_from = yes

set smtp_url = "smtps://<gmail username>@gmail.com@smtp.gmail.com:465/"
set smtp_pass = "<gmail password>"
set imap_user = "<gmail username>@gmail.com"
set imap_pass = "<gmail password>"
set folder = "imaps://imap.gmail.com:993"
set spoolfile = "+INBOX"
set ssl_force_tls = yes

# G to get mail
bind index G imap-fetch-mail
set editor = "vim"
set charset = "utf-8"
set record = ''

I’m sure there’s better/more config options. Feel free to go wild, this is by no means a comprehensive setup.

Run mutt:

$ mutt

We should see it connect and download our messages. m to start sending new messages. G to fetch new messages.

Data Models and Word Size


This post is a follow up to my previous blog post about word size.

Three C/C++ programmers walk into a bar. One argues that sizeof(void*) is equivalent to sizeof(long), one argues that sizeof(void*) is equivalent to sizeof(int), and the third argues it’s sizeof(long long). Simultaneously, they’re all right, but they’re also all wrong (and need a lesson about portable C code). What the hell is going on?

One of the first few programs a programmer might write after hello world is something like this:

#include <stdio.h>
int main () {
  printf("sizeof(int): %zu\n", sizeof(int));
  printf("sizeof(long): %zu\n", sizeof(long));
  printf("sizeof(long long): %zu\n", sizeof(long long));
  printf("sizeof(void*): %zu\n", sizeof(void*));
}

Note the use of the %zu format specifier, a C99 addition that isn’t portable to older compilers! (This post is more about considerations when porting older code to newer machines, not about porting newer code to run on older machines. Not having a standards compliant C compiler makes writing more portable C code even trickier, and is a subject for another blog post).

When I run that code on my x86-64 OSX machine, I get the following output:

sizeof(int): 4
sizeof(long): 8
sizeof(long long): 8
sizeof(void*): 8

So it looks like I would be the first programmer in the story in the first paragraph, since on my machine, it looks like sizeof(long) == sizeof(void*). Also note how sizeof(long long) is equivalent as well.

But what would happen if I compiled my code on a 32 bit machine? Luckily, my processor has backwards compatibility with 32b binaries, so I can cross compile it locally and still run it. Ex:

➜  clang sizeof.c -Wall -Wextra -Wpedantic
➜  file a.out
a.out: Mach-O 64-bit executable x86_64
➜  clang sizeof.c -Wall -Wextra -Wpedantic -m32
➜  file a.out
a.out: Mach-O executable i386
➜  ./a.out
sizeof(int): 4
sizeof(long): 4
sizeof(long long): 8
sizeof(void*): 4

Huh, suddenly sizeof(void*) == sizeof(int) == sizeof(long)! This seems to be the case of the second programmer from the story.

Both programmer 1 and programmer 2 might agree that the size of a pointer is equivalent to their machine’s respective word size, but that too would be an incorrect assumption for portable C/C++ code!

Programmer 3 goes through the hellscape that is installing a working compiler for Windows and building a 64b command line application (to be fair, installing command line tools for OSX is worse; installing a compiler for most OS’ leaves much to be desired). When they run that program, they see:

sizeof(int): 4
sizeof(long): 4
sizeof(long long): 8
sizeof(void*): 8

This is yet a third case (the third programmer from the story). In this case, only sizeof(long long) is equivalent to sizeof(void*).

Data Models

What these programmers are seeing is known as data models. Programmer 1, on a 64b x86-64 OSX machine, had an LP64 data model where longs (L), (larger long longs,) and pointers (P) are 64b, but ints are 32b. Programmer 2, on a 32b x86 OSX machine, had an ILP32 data model where ints (I), longs (L), and pointers (P) are 32b, but long longs are 64b. Programmer 3, on a 64b x86-64 Windows machine, had an LLP64 data model, where only long longs (LL) and pointers (P) are 64b, ints and longs being 32b.

Data model   int   long   long long   void*   Example
ILP32        32b   32b    64b         32b     Win32, i386 OSX & Linux
LP64         32b   64b    64b         64b     x86-64 OSX & Linux
LLP64        32b   32b    64b         64b     Win64

There are older data models such as LP32 (Windows 3.1, Macintosh, where ints are 16b), and more exotic ones like ILP64 and SILP64. Knowing the data model is thus important for portable C/C++ code.

Historical Perspective

Running out of address space is and will continue to be a tradition in computing. Applications become bigger as computing power and memory get cheaper. Companies want to sell chips that have larger word sizes to address more memory, but early adopters don’t want to buy a computer that their favorite application hasn’t been compiled for and thus doesn’t exist on yet. Someone from the back shouts “virtual machines” then ducks as a chair is thrown.

This document highlights some reasons why LP64 is preferred to ILP64: ILP64 made portable C code that only needed 32b of precision harder to maintain (on ILP64 an int was 64b, but a short was 16b!). It mentions how for data structures that did not contain pointers, their size would be the same on LLP64 as ILP32, which is the direction Microsoft went. LLP64 was essentially the ILP32 model with 64b pointers.

Linux also supports an ABI called x32 which can use x86-64 ISA improvements but uses 32b pointers to reduce the size of data structures that would otherwise have 64b pointers.

For a great historical perspective on the evolution of word size and data models, as well as the “toil and trouble” caused, this paper was an excellent reference. It describes Microsoft finally abandoning support for 16b data models in Windows XP 64. It mentions that the industry was pretty split between LP64, LLP64, and ILP64 as porting code from the good old days of ILP32 would break in different ways. That the use of long was more prevalent in Windows applications vs the use of int in unix applications. It also makes the key point that a lot of programmers from the ILP32 era made assumptions that sizeof(int) == sizeof(long) == sizeof(void*) which would not hold true for the LP64/LLP64 era.

One important point the paper makes that’s easily missed is that typedef wasn’t added to C until 1977, when hardware manufacturers still couldn’t agree on how many bits were in a char (CHAR_BIT) and some machines were using 24b addressing schemes. stdint.h and inttypes.h did not exist yet.

This article talks about two main categories of effects of switching from ILP32 to LP64 and has excellent examples of problematic code. That section near the end is worth the read alone and makes excellent points to look for during code review.

Conclusion

Word size or ISA doesn’t tell you anything about sizeof(int), sizeof(long), or sizeof(long long). We also saw that one machine can support multiple different data models (when I compiled and ran the same code with the -m32 flag).

The C standard tells you minimum guaranteed sizes for these types, but the data model (part of the ABI, external to but abiding by the C standard) is what tells you about the specific sizes of standard integers and pointers.

Further Reading

What’s in a Word?


Recently, there was some confusion between myself and a coworker over the definition of a “word.” I’m currently working on a blog post about data alignment and figured it would be good to clarify some things now that we can refer to later.

Having studied computer engineering and being quite fond of processor design, when I think of a “word,” I think of the number of bits wide a processor’s general purpose registers are (aka word size). This places hard requirements on the largest representable number and address space. A 64 bit processor can represent 2^64-1 (1.8x10^19) as the largest unsigned long integer, and address up to 2^64 bytes (16 EiB) of memory.

Further, word size limits the possible combinations of operations the processor can perform, length of immediate values used, inflates the size of binary files and memory needed to store pointers, and puts pressure on instruction caches.

Word size also has implications on loads and stores based on alignment, as we’ll see in a follow up post.

When I think of 8 bit computers, I think of my first microcontroller: an Arduino with an Atmel AVR processor. When I think of 16 bit computers, I think of my first game console, a Super Nintendo with a Ricoh 5A22. When I think of 32 bit computers, I think of my first desktop with Intel’s Pentium III. And when I think of 64 bit computers, I think of modern smartphones with ARMv8 instruction sets. When someone mentions a particular word size, what are the machines that come to mind for you?

So to me, when someone’s talking about a 64b processor, to that machine (and me) a word is 64b. When we’re referring to an 8b processor, a word is 8b.

Now, some confusion.

Back in my previous blog posts about x86-64 assembly, JITs, or debugging, you might have seen me use instructions that have suffixes of b for byte (8b), w for word (16b), dw for double word (32b), and qw for quad word (64b) (since SSE2 there’s also double quadwords of 128b).

Wait a minute! How suddenly does a “word” refer to 16b on a 64b processor, as opposed to a 64b “word?”

In short, historical baggage. Intel’s first hit processor was the 4004, a 4b processor released in 1971. It wasn’t until 1978 that Intel released the 16b 8086 processor.

The 8086 was created to compete with other processors that beat it to the market, like the Zilog Z80 (any Gameboy emulator fans out there? Yes, I know about the Sharp LR35902). The 8086 was the first design in the x86 family, and it allowed for the same assembly syntax from the earlier 8008, 8080, and 8085 to be reassembled for it. The 8086’s little brother (8088) would be used in IBM’s PC, and the rest is history. x86 would become one of the most successful ISAs in history.

For backwards compatibility, it seems that both Microsoft’s (whose success has tracked that of x86 since MS-DOS and IBM’s PC) and Intel’s documentation still refer to words as being 16b. This allowed 16b executables to keep running on newer 32b versions of Windows, without requiring recompilation or source code modification.

It isn’t necessarily wrong to refer to a word based on backwards compatibility; it’s just important to understand the context in which the term “word” is being used, and that there might be some confusion if you have a background with x86 assembly, Windows API programming, or processor design.

So the next time someone asks: why does Intel’s documentation commonly refer to a “word” as 16b, you can tell them that the x86 and x86-64 ISAs have maintained the notion of a word being 16b since the first x86 processor, the 8086, which was a 16b processor.

Side Note: for an excellent historical perspective on programming early x86 chips, I recommend Michael Abrash’s Graphics Programming Black Book. For instance, he talks about 8086’s little brother, the 8088, being a 16b chip but only having an 8b bus with which to access memory. This caused a mysterious “cycle eater” to prevent fast access to 16b variables, though they were the processor’s natural size. Michael also alludes to alignment issues we’ll see in a follow-up post.

Intro to Debugging X86-64 Assembly


I’m hacking on an assembly project, and wanted to document some of the tricks I was using for figuring out what was going on. This post might seem a little basic for folks who spend all day heads down in gdb or who do this stuff professionally, but I just wanted to share a quick intro to some tools that others may find useful. (oh god, I’m doing it)

If you’re coming from gdb to lldb, there are a few differences in commands. LLDB has great documentation on some of the differences. Pretty much everything in this post about LLDB is covered there.

The bread and butter commands when working with gdb or lldb are:

  • r (run the program)
  • s (step in)
  • n (step over)
  • finish (step out)
  • c (continue)
  • q (quit the program)

You can hit enter if you want to run the last command again, which is really useful if you want to keep stepping over statements repeatedly.

I’ve been using LLDB on OSX. Let’s say I want to debug a program I can build, but is crashing or something:

$ sudo lldb ./asmttpd web_root

Setting a breakpoint on jump to label:

(lldb) b sys_write
Breakpoint 3: where = asmttpd`sys_write, address = 0x00000000000029ae

Running the program until breakpoint hit:

(lldb) r
Process 32236 launched: './asmttpd' (x86_64)
Process 32236 stopped
* thread #1: tid = 0xe69b9, 0x00000000000029ae asmttpd`sys_write, queue = 'com.apple.main-thread', stop reason = breakpoint 3.1
    frame #0: 0x00000000000029ae asmttpd`sys_write
asmttpd`sys_write:
->  0x29ae <+0>: pushq  %rdi
    0x29af <+1>: pushq  %rsi
    0x29b0 <+2>: pushq  %rdx
    0x29b1 <+3>: pushq  %r10

Seeing more of the current stack frame:

(lldb) d
asmttpd`sys_write:
->  0x29ae <+0>:  pushq  %rdi
    0x29af <+1>:  pushq  %rsi
    0x29b0 <+2>:  pushq  %rdx
    0x29b1 <+3>:  pushq  %r10
    0x29b3 <+5>:  pushq  %r8
    0x29b5 <+7>:  pushq  %r9
    0x29b7 <+9>:  pushq  %rbx
    0x29b8 <+10>: pushq  %rcx
    0x29b9 <+11>: movq   %rsi, %rdx
    0x29bc <+14>: movq   %rdi, %rsi
    0x29bf <+17>: movq   $0x1, %rdi
    0x29c6 <+24>: movq   $0x2000004, %rax
    0x29cd <+31>: syscall
    0x29cf <+33>: popq   %rcx
    0x29d0 <+34>: popq   %rbx
    0x29d1 <+35>: popq   %r9
    0x29d3 <+37>: popq   %r8
    0x29d5 <+39>: popq   %r10
    0x29d7 <+41>: popq   %rdx
    0x29d8 <+42>: popq   %rsi
    0x29d9 <+43>: popq   %rdi
    0x29da <+44>: retq

Getting a back trace (call stack):

(lldb) bt
* thread #1: tid = 0xe69b9, 0x00000000000029ae asmttpd`sys_write, queue = 'com.apple.main-thread', stop reason = breakpoint 3.1
  * frame #0: 0x00000000000029ae asmttpd`sys_write
    frame #1: 0x00000000000021b6 asmttpd`print_line + 16
    frame #2: 0x0000000000002ab3 asmttpd`start + 35
    frame #3: 0x00007fff9900c5ad libdyld.dylib`start + 1
    frame #4: 0x00007fff9900c5ad libdyld.dylib`start + 1

Peeking at the upper stack frame:

(lldb) up
frame #1: 0x00000000000021b6 asmttpd`print_line + 16
asmttpd`print_line:
    0x21b6 <+16>: movabsq $0x30cb, %rdi
    0x21c0 <+26>: movq   $0x1, %rsi
    0x21c7 <+33>: callq  0x29ae                    ; sys_write
    0x21cc <+38>: popq   %rcx

Back down to the breakpoint-halted stack frame:

(lldb) down
frame #0: 0x00000000000029ae asmttpd`sys_write
asmttpd`sys_write:
->  0x29ae <+0>: pushq  %rdi
    0x29af <+1>: pushq  %rsi
    0x29b0 <+2>: pushq  %rdx
    0x29b1 <+3>: pushq  %r10

Dumping the values of registers:

(lldb) register read
General Purpose Registers:
       rax = 0x0000000000002a90  asmttpd`start
       rbx = 0x0000000000000000
       rcx = 0x00007fff5fbffaf8
       rdx = 0x00007fff5fbffa40
       rdi = 0x00000000000030cc  start_text
       rsi = 0x000000000000000f
       rbp = 0x00007fff5fbffa18
       rsp = 0x00007fff5fbff9b8
        r8 = 0x0000000000000000
        r9 = 0x00007fff7b1670c8  atexit_mutex + 24
       r10 = 0x00000000ffffffff
       r11 = 0xffffffff00000000
       r12 = 0x0000000000000000
       r13 = 0x0000000000000000
       r14 = 0x0000000000000000
       r15 = 0x0000000000000000
       rip = 0x00000000000029ae  asmttpd`sys_write
    rflags = 0x0000000000000246
        cs = 0x000000000000002b
        fs = 0x0000000000000000
        gs = 0x0000000000000000

Reading just one register:

(lldb) register read rdi
     rdi = 0x00000000000030cc  start_text

When you’re trying to figure out what system calls are made by some C code, using dtruss is very helpful. dtruss is available on OSX and seems to be some kind of wrapper around DTrace.

$ cat sleep.c
#include <time.h>
int main () {
  struct timespec rqtp = {
    2,
    0
  };

  nanosleep(&rqtp, NULL);
}

$ clang sleep.c

$ sudo dtruss ./a.out
...all kinds of fun stuff
__semwait_signal(0xB03, 0x0, 0x1)    = -1 Err#60

If you compile with -g to emit debug symbols, you can use lldb’s disassemble command to get the equivalent assembly:

$ clang sleep.c -g
$ lldb a.out
(lldb) target create "a.out"
Current executable set to 'a.out' (x86_64).
(lldb) b main
Breakpoint 1: where = a.out`main + 16 at sleep.c:3, address = 0x0000000100000f40
(lldb) r
Process 33213 launched: '/Users/Nicholas/code/assembly/asmttpd/a.out' (x86_64)
Process 33213 stopped
* thread #1: tid = 0xeca04, 0x0000000100000f40 a.out`main + 16 at sleep.c:3, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100000f40 a.out`main + 16 at sleep.c:3
   1    #include <time.h>
   2    int main () {
-> 3      struct timespec rqtp = {
   4        2,
   5        0
   6      };
   7
(lldb) disassemble
a.out`main:
    0x100000f30 <+0>:  pushq  %rbp
    0x100000f31 <+1>:  movq   %rsp, %rbp
    0x100000f34 <+4>:  subq   $0x20, %rsp
    0x100000f38 <+8>:  leaq   -0x10(%rbp), %rdi
    0x100000f3c <+12>: xorl   %eax, %eax
    0x100000f3e <+14>: movl   %eax, %esi
->  0x100000f40 <+16>: movq   0x49(%rip), %rcx
    0x100000f47 <+23>: movq   %rcx, -0x10(%rbp)
    0x100000f4b <+27>: movq   0x46(%rip), %rcx
    0x100000f52 <+34>: movq   %rcx, -0x8(%rbp)
    0x100000f56 <+38>: callq  0x100000f68               ; symbol stub for: nanosleep
    0x100000f5b <+43>: xorl   %edx, %edx
    0x100000f5d <+45>: movl   %eax, -0x14(%rbp)
    0x100000f60 <+48>: movl   %edx, %eax
    0x100000f62 <+50>: addq   $0x20, %rsp
    0x100000f66 <+54>: popq   %rbp
    0x100000f67 <+55>: retq

Anyways, I’ve been learning some interesting things about OSX that I’ll be sharing soon. If you’d like to learn more about x86-64 assembly programming, you should read my other posts about writing x86-64 and a toy JIT for Brainfuck (the creator of Brainfuck liked it).

I should also do a post on Mozilla’s rr, because it can do amazing things like step backwards. Another day…