Date: 2019-05-12
TL;DR
Prefer f(void) in C to potentially save a 2B instruction per function call when targeting x86_64, as a micro-optimization.
-Wstrict-prototypes can help.
Doesn’t matter for C++.
The Problem
While messing around with some C code in godbolt Compiler Explorer, I kept noticing a particular funny case. It seemed with my small test cases that sometimes function calls would zero out the return register before calling a function that took no arguments, but other times not. Upon closer inspection, it seemed like a difference between how the functions were declared, particularly f() vs f(void). For example, the following C code:
[Listing: C source]
would generate the following assembly:
[Listing: x86_64 assembly output]
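As a rough sketch of the kind of source being described (not the original listing): the name baz and the function bodies are assumptions, and foo and bar are defined in the same file only so the snippet links on its own. In the scenario discussed here they would be declared in a header and defined in another translation unit.

```c
/* Declarations as a caller would see them via a header. */
int foo();      /* empty parentheses: parameters left unspecified (C17 and earlier) */
int bar(void);  /* prototype: takes no arguments */

/* Hypothetical caller; the name baz is an assumption. */
int baz(void) {
    foo();  /* the call the post reports as preceded by xorl %eax, %eax */
    bar();  /* the call the post reports as not preceded by it */
    return 0;
}

/* Stand-in definitions so the example is self-contained. With these visible,
   a compiler may treat the calls differently than with declarations only. */
int foo() { return 1; }
int bar(void) { return 2; }
```

Compiling a file like this with optimizations and -S, or pasting it into Compiler Explorer, is one way to inspect the generated call sequences.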
In particular, focus on the call to foo vs the call to bar. foo is preceded with xorl %eax, %eax (X ^ X == 0, and this is the shortest encoding for an instruction that zeroes a register on the variable-length-encoded x86_64, which is why it’s used a lot, such as in setting the return value). (If you’re curious about the pushq/popq, see point #1.)
Now I’ve seen zeroing before (see point #3 and remember that %al is the lowest byte of %eax and %rax), but if it was done for the call to foo, then why was it not additionally done for the call to bar? %eax being x86_64’s return register for the C ABI should be treated as call clobbered. So if you set it, then made a function call that may have clobbered it (and you can’t deduce otherwise), then wouldn’t you have to reset it to make an additional function call?
Let’s look at a few more cases and see if we can find the pattern. Let’s take a look at 2 sequential calls to foo vs 2 sequential calls to bar:
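[Listing: C source, two sequential calls to foo]
[Listing: x86_64 assembly output for the calls to foo]
[Listing: C source, two sequential calls to bar]
[Listing: x86_64 assembly output for the calls to bar]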
So it should be pretty clear now that the pattern is: f(void) does not generate the xorl %eax, %eax, while f() does. What gives? Aren’t they declaring f the same way, as a function that takes no parameters? Unfortunately, in C the answer is no, and C and C++ differ here.
An explanation
f() is not necessarily “f takes no arguments” but more of “I’m not telling you what arguments f takes (but it’s not variadic).” Consider this perfectly legal C and C++ code:
[Listing: C/C++ source]
It seems that C++ inherited this from C, but only in C++ does f() seem to have the semantics of “f takes no arguments,” as the previous examples all no longer have the xorl %eax, %eax when compiled as C++. Same for f(void) in C or C++.
[Listing: C/C++ source]
This is an error in C, but surprisingly C++ is less strict here, not only allowing it but also taking the semantics of the definition. (Spooky)
Needless to say, if you write code like that, where your function declarations and definitions do not match, you will be put in prison (do not pass go, do not collect $200). Control flow integrity analysis is particularly sensitive to these cases, which manifest as runtime crashes.
What could a sufficiently smart compiler do to help?
-Wall and -Wextra will just flag the unused parameter (-Wunused-parameter). We need the help of -Wmissing-prototypes to flag the mismatch between declaration and definition. (An aside: I had a hard time remembering which was the declaration and which was the definition when learning C++. The mnemonic I came up with and still use today is: think of definition as in muscle definition; it’s where the meat of the function is. Declarations are just hot air.) It’s not until we get to -Wstrict-prototypes that we get a warning that we should use f(void).
-Wstrict-prototypes is kind of a stylistic warning, so that’s why it’s not part of -Wall or -Wextra. Stylistic warnings are in bikeshed territory (*cough* -Wparentheses *cough*).
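A minimal sketch of what the warning is aimed at. The file name and function names are hypothetical; the flag is the one discussed above, and the exact diagnostic text depends on the compiler and the language standard in use.

```c
/* strict_protos.c (hypothetical file name)
 * Try: cc -Wstrict-prototypes -c strict_protos.c
 */

/* Not a prototype in C17 and earlier: the kind of declaration the flag
   is meant to catch. */
int old_style();

/* A proper prototype: explicitly takes no arguments. */
int new_style(void);

int old_style() { return 1; }
int new_style(void) { return 2; }

/* Hypothetical caller that uses both declarations. */
int sum_both(void) {
    return old_style() + new_style();
}
```

Changing the old_style declaration and definition to use (void) should silence the warning without changing behavior.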
One issue with C and C++’s style of code sharing and encapsulation via headers is that declarations often aren’t enough for the powerful analysis techniques of production optimizing compilers (whether or not a pointer “escapes” is a big one that comes to mind). Let’s see if a “sufficiently smart compiler” could notice, when we’ve declared f(), via observation of the definition of f(), that we really only needed the semantics of f(void).
[Listing: C source]
[Listing: x86_64 assembly output]
A ha! So by having the full definition of foo2 in this case in the same translation unit, Clang was able to deduce that foo2 didn’t actually need the semantics of f(), so it could skip the xorl %eax, %eax we’d seen for f() style definitions earlier. If we change foo2 to a declaration (such as would be the case if it was defined in an external translation unit, and its declaration included via header), then Clang can no longer observe whether foo2’s definition differs from the declaration or not.
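The situation described above might look roughly like the following. This is a sketch, not the original listing: the return value and the caller name call_foo2_twice are assumptions.

```c
/* Written in the f() style, but the full definition is visible in this
   translation unit, so the compiler can see it takes no arguments. */
int foo2() { return 7; }

/* Hypothetical caller. With the body of foo2 visible above, the post
   observes that Clang can omit the xorl %eax, %eax before these calls. */
int call_foo2_twice(void) {
    foo2();
    foo2();
    return 0;
}
```

Replacing the definition of foo2 with a bare declaration and moving the body to another file would recreate the header-only case, where the compiler has to assume the worst.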
So Clang can potentially save you a single instruction (xorl %eax, %eax) whose encoding is only 2B, per call to functions declared in the style f(), but only IF the definition is in the same translation unit and doesn’t differ from the declaration, and you happen to be targeting x86_64. *deflated whew* But usually it can’t because it’s only been provided the declaration via header.
Conclusion
I certainly think f() is prettier than f(void) (so C++ got this right), but pretty code may not always be the fastest and it’s not always straightforward when to prefer one over the other.
So it seems that f() is ok for strictly C++ code. For C or mixed C and C++, f(void) might be better.
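For headers shared between C and C++, that recommendation can be sketched as follows. The counter API here is entirely hypothetical, and the definitions are included only so the snippet is self-contained.

```c
/* Declarations as they might appear in a header included from both
   C and C++ (hypothetical API). */
#ifdef __cplusplus
extern "C" {
#endif

int counter_next(void);    /* (void) means "no arguments" in both languages */
void counter_reset(void);

#ifdef __cplusplus
}
#endif

/* Stand-in definitions; in practice these would live in a .c file. */
static int counter_value = 0;

int counter_next(void) { return ++counter_value; }
void counter_reset(void) { counter_value = 0; }
```

Spelling out (void) costs nothing on the C++ side and gives C callers a real prototype.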
