This is the second post in a series on memory segmentation. It covers working with static and dynamic libraries in Linux and OSX. Make sure to check out the first on object files and symbols.
Let’s say we wanted to reuse some of the code from our previous project in our next one. We could continue to copy around object files, but once we have a bunch of them it gets hard to keep track of them all. Instead, let’s combine multiple object files into an archive, or static library. Unlike a more conventional zip file or “compressed archive,” our static library will be an uncompressed archive.
We can use the ar command to create and manipulate a static archive. The -r flag will create the archive named libhello.a and add the files x.o and y.o to its index. I like to add the -v flag for verbose output. Then we can use the familiar nm tool I introduced in the previous post to examine the content of the archives and their symbols.
Some other useful flags for ar are -d to delete an object file, ex. ar -d libhello.a y.o, and -u to update existing members of the archive when their source and object files are updated. Not only can we run nm on our archive, objdump works on it as well.
Now that we have our static library, we can statically link it to our program
and see the resulting symbols. The
.a suffix is typical on both OSX and
Linux for archive files.
Our linker understands how to index into archive files and pull out the object files it needs to combine into the final executable. If we use static libraries to statically link all functions required, we can have one binary with no dependencies. This can make deployment of binaries simple, but it can also greatly increase their size. Upgrading large binaries incrementally becomes more costly in terms of space.
While static libraries allowed us to reuse source code, static linkage does not allow us to reuse memory for executable code between different processes. I really want to put off talking about memory benefits until the next post, but know that the solution to this problem lies in “dynamic libraries.”
While having a single binary file keeps things simple, it can really hamper memory sharing and incremental relinking. For example, if you have multiple executables that are all built with the same static library, unless your OS is really smart about copy-on-write page sharing, then you’re likely loading multiple copies of the same exact code into memory! What a waste! Also, when you want to rebuild or update your binary, you spend time performing relocation again and again with static libraries. What if we could set aside object files that we could share amongst multiple instances of the same or even different processes, and perform relocation at runtime?
The solution is known as dynamic libraries. If static libraries and static linkage were Atari controllers, dynamic libraries and dynamic linkage are Steel Battalion controllers. We’ll show how to work with them in the rest of this post, but I’ll prove how memory is saved in a later post.
Let’s say we want to create a shared version of libhello. Dynamic libraries typically have different suffixes per OS since each OS has its preferred object file format. On Linux the .so suffix is common, .dylib on OSX, and .dll on Windows.
The -shared flag tells the linker to create a special file called a shared
library. The -fpic option tells the compiler to generate position independent
code, using relative rather than absolute addresses, which allows for different
processes to load the library at different virtual addresses and share memory.
Now that we have our shared library, let’s dynamically link it into our executable.
Dynamic linking essentially produces an incomplete binary. You can verify
this with nm. At runtime, we’ll delay start up to perform some memory mapping
early on in the process start (performed by the dynamic linker) and pay slight
costs for trampolining into position independent code.
Let’s say we want to know what dynamic libraries a binary is using. You can either query the executable (most executable object file formats contain a header the dynamic linker will parse and pull in libs) or observe the executable while running it. Because each major OS has its own object file format, they each have their own tools for these two checks. Note that statically linked libraries won’t show up here, since their object code has already been linked in and thus we’re not able to differentiate between object code that came from our first party code vs third party static libraries.
On OSX, we can use
otool -L <bin> to check which .dylibs will get pulled in.
So we can see that
a.out depends on
libhello.dylib (and expects to find it
in the same directory as
a.out). It also depends on a shared library called
libSystem.B.dylib. If you run
otool -L on libSystem itself, you’ll see it
depends on a bunch of other libraries including a C runtime, malloc
implementation, pthreads implementation, and more. If you want to find
the final resting place of where a symbol is defined without digging with
otool, you can fire up your trusty debugger and ask it.
You’ll see a lot of output since
puts is treated as a regex. You’re looking
for the Summary line that has an address and is not a symbol stub. You can
then check your work with
If we want to observe the dynamic linker in action on OSX, we can use
On Linux, we can simply use
readelf -d to query an executable for a
list of its dynamic libraries.
We can then use
strace to observe the dynamic linker in action on Linux:
What’s this LD_LIBRARY_PATH thing? Prefixing a command with an assignment like
LD_LIBRARY_PATH=. is shell syntax for setting an environment variable just for
the duration of that command (as opposed to exporting it so it stays set for
multiple commands). Unlike OSX’s dynamic linker, which was happy to look in the
cwd for libhello.dylib, on Linux we must supply the cwd if the dynamic library
we want to link in is not in the standard search path.
But what is the standard search path? Well, there’s another environment
variable we can set to see this,
LD_DEBUG. For example, on Linux:
LD_DEBUG is pretty useful. Try:
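One thing worth trying, again assuming a glibc-based system, is the help category. The choice of cat as the command is arbitrary:

```shell
# "help" makes glibc's dynamic linker print the supported LD_DEBUG categories
# instead of running the program, so any dynamically linked command will do.
LD_DEBUG=help cat /dev/null
```

The listing includes the libs category along with others such as symbols and bindings.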
For some cool stuff, I recommend checking out
Going back to
LD_LIBRARY_PATH, usually libraries you create and want to reuse
between projects go into /usr/local/lib and the headers into
/usr/local/include. I think of the convention as:
Unfortunately, it’s a loose convention that’s broken down over the years and
things are scattered all over the place. You can also run into dependency and
versioning issues, that I don’t want to get into here, by placing libraries
here instead of keeping them in-tree or out-of-tree of the source code of a
project. Just know when you see a library like
libc.so.6 that the numeric
suffix is a major version number that follows semantic versioning. For more
information, you should read Michael Kerrisk’s excellent book The Linux
Programming Interface. This post is based on his chapters 41 & 42 (but with
more info on tooling and OSX).
If we were to place our libhello.so into /usr/local/lib (on Linux you need to
sudo ldconfig) and move x.h and y.h to /usr/local/include, then we
could then compile with:
Note that rather than give a full path to our library, we can use the -l flag
followed by the name of our library with the lib prefix and .so suffix removed.
When working with shared libraries and external code, three flags I use pretty often:
To find the specific flags needed to compile and link against a dynamic
library, a tool called pkg-config can be used.
I’ve had less than stellar experiences with the tool as it puts the onus on the
library author to maintain the .pc files, and the user to have them installed
in the right place that
pkg-config looks. When they do exist and are
installed properly, the tool works well:
Using another neat environment variable, we can hook into the dynamic linkage process and inject our own shared libraries to be linked instead of the expected libraries. Let’s say libgood.so and libmalicous.so both define a symbol for a function (the same symbol name and function signature). We can get a binary that links in libgood.so’s function to instead call libmalicous.so’s version:
By manually invoking the dynamic linker from our code, we can even man-in-the-middle library calls (call our hooked function first, then invoke the original target). We’ll see more of this in the next post on using the dynamic linker.
As you can guess, readjusting the search paths for dynamic libraries is a security concern as it lets good and bad folks change the expected execution paths. Guarding against the use of these env vars becomes a rabbit hole that gets pretty tricky to solve without the heavy-handed use of statically linking dependencies.
In the previous post, I alluded to undefined symbols that turn out to be
part of libc, which is probably the most shared dynamic library on most
computing devices since most every program makes use of the C runtime. (I
think of a “runtime” as implicit code that runs in your program that you didn’t
necessarily write yourself. Usually a runtime is provided as a library that
gets implicitly linked into your executable when you compile.) You can
statically link against libc with the
-static flag, on Linux at least (OSX
makes this difficult,
“Apple does not support statically linked binaries on Mac OS X”).
I’m not sure what the benefit would be to mixing static and dynamic linking, but when resolving a library, the linker looks in each of its search directories for a shared version first, and if only a static one is found there, that one will get linked in.
There’s also an interesting form of library called a “virtual dynamic shared
object” on Linux. I haven’t covered memory mapping yet, but know it exists, is
usually hidden for libc, and that you can read more about it via
man 7 vdso.
One thing I find interesting and don’t quite understand how to recreate is that somehow glibc on Linux is also executable:
Also, note that linking against third party code has licensing implications (of course) of particular interest when it’s GPL or LGPL. Here is a good overview which I’d summarize as: code that statically links against LGPL code must also be LGPL, while any form of linkage against GPL code must be GPL’d.
Ok, that was a lot. In the previous post, we covered Object Files and Symbols. In this post we covered hacking around with static and dynamic linkage. In the next post, I hope to talk about manually invoking the dynamic linker at runtime.