f() vs f(void) in C vs C++

TL;DR

Prefer f(void) in C to potentially save a 2B instruction per function call when targeting x86_64 as a micro-optimization. -Wstrict-prototypes can help. Doesn’t matter for C++.

The Problem

While messing around with some C code in godbolt Compiler Explorer, I kept noticing a particular funny case. It seemed with my small test cases that sometimes function calls would zero out the return register before calling a function that took no arguments, but other times not. Upon closer inspection, it seemed like a difference between function definitions, particularly f() vs f(void). For example, the following C code:

int foo();
int bar(void);

int baz() {
  foo();
  bar();
  return 0;
}

would generate the following assembly:

baz:
  pushq %rax # realign stack for callq
  xorl %eax, %eax # zero %al, non variadic
  callq foo
  callq bar # why you no zero %al?
  xorl %eax, %eax
  popq %rcx
  retq

In particular, focus on the call the foo vs the call to bar. foo is preceded with xorl %eax, %eax (X ^ X == 0, and is the shortest encoding for an instruction that zeroes a register on the variable length encoded x86_64, which is why its used a lot such as in setting the return value). (If you’re curious about the pushq/popq, see point #1.) Now I’ve seen zeroing before (see point #3 and remember that %al is the lowest byte of %eax and %rax), but if it was done for the call to foo, then why was it not additionally done for the call to bar? %eax being x86_64’s return register for the C ABI should be treated as call clobbered. So if you set it, then made a function call that may have clobbered it (and you can’t deduce otherwise), then wouldn’t you have to reset it to make an additional function call?

Let’s look at a few more cases and see if we can find the pattern. Let’s take a look at 2 sequential calls to foo vs 2 sequential calls to bar:

int foo();
int quux() {
  foo(); // notice %eax is always zeroed
  foo(); // notice %eax is always zeroed
  return 0;
}

quux:
  pushq %rax
  xorl %eax, %eax
  callq foo
  xorl %eax, %eax
  callq foo
  xorl %eax, %eax
  popq %rcx
  retq

int bar(void);
int quuz() {
  bar(); // notice %eax is not zeroed
  bar(); // notice %eax is not zeroed
  return 0;
}

quuz:
  pushq %rax
  callq bar
  callq bar
  xorl %eax, %eax
  popq %rcx
  retq

So it should be pretty clear now that the pattern is f(void) does not generate the xorl %eax, %eax, while f() does. What gives, aren’t they declaring f the same; a function that takes no parameters? Unfortunately, in C the answer is no, and C and C++ differ here.

An explanation

f() is not necessarily “f takes no arguments” but more of “I’m not telling you what arguments f takes (but it’s not variadic).” Consider this perfectly legal C and C++ code:

int foo();
int foo(int x) { return 42; }

It seems that C++ inherited this from C, but only in C++ does f() seems to have the semantics of “f takes no arguments,” as the previous examples all no longer have the xorl %eax, %eax. Same for f(void) in C or C++. That’s because foo() and foo(int) are two different function in C++ thanks to function overloading (thanks reddit user /u/OldWolf2). Also, it seems that C supported this difference for backwards compatibility w/ K & R C.

int bar(void);
int bar(int x) { return x + 42; }

Is an error in C, but in C++ thanks to function overloading these are two separate functions! (_Z3barv vs _Z3bari). (Thanks HN user pdpi, for helping me understand this. Cunningham’s Law ftw.)

Needless to say, If you write code like that where your function declarations and definitions do not match, you will be put in prison. Do not pass go, do not collect $200). Control flow integrity analysis is particularly sensitive to these cases, manifesting in runtime crashes.

What could a sufficiently smart compiler do to help?

-Wall and -Wextra will just flag the -Wunused-parameter. We need the help of -Wmissing-prototypes to flag the mismatch between declaration and definition. (An aside; I had a hard time remembering which was the declaration and which was the definition when learning C++. The mnemonic I came up with and still use today is: think of definition as in muscle definition; where the meat of the function is. Declarations are just hot air.) It’s not until we get to -Wstrict-prototypes do we get a warning that we should use f(void). -Wstrict-prototypes is kind of a stylistic warning, so that’s why it’s not part of -Wall or -Wextra. Stylistic warnings are in bikeshed territory (*cough* -Wparentheses *cough*).

One issue with C and C++’s style of code sharing and encapsulation via headers is that declarations often aren’t enough for the powerful analysis techniques of production optimizing compilers (whether or not a pointer “escapes” is a big one that comes to mind). Let’s see if a “sufficiently smart compiler” could notice when we’ve declared f(), but via observation of the definition of f() noticed that we really only needed the semantics of f(void).

int puts(const char*);
int __attribute__((noinline)) foo2() {
  puts("hello");
  return 0;
}

int quacks() {
  foo2();
  foo2();
  return 0;
}

quacks:
  pushq %rax
  callq foo2
  callq foo2
  xorl %eax, %eax
  popq %rcx
  retq

A ha! So by having the full definition of foo2 in this case in the same translation unit, Clang was able to deduce that foo2 didn’t actually need the semantics of f(), so it could skip the xorl %eax, %eax we’d seen for f() style definitions earlier. If we change foo2 to a declaration (such as would be the case if it was defined in an external translation unit, and its declaration included via header), then Clang can no longer observe whether foo2 definition differs or not from the declaration.

So Clang can potentially save you a single instruction (xorl %eax, %eax) whose encoding is only 2B, per function call to functions declared in the style f(), but only IF the definition is in the same translation unit and doesn’t differ from the declaration, and you happen to be targeting x86_64. *deflated whew* But usually it can’t because it’s only been provided the declaration via header.

Conclusion

I certainly think f() is prettier than f(void) (so C++ got this right), but pretty code may not always be the fastest and it’s not always straightforward when to prefer one over the other.

So it seems that f() is ok for strictly C++ code. For C or mixed C and C++, f(void) might be better.

References

https://stackoverflow.com/q/6212665/1027966
https://stackoverflow.com/q/693788/1027966
https://stackoverflow.com/q/12643202/1027966
https://stackoverflow.com/q/51032/1027966
https://stackoverflow.com/q/416345/1027966
https://stackoverflow.com/q/41803937/1027966
http://david.tribble.com/text/cdiffs.htm#C99-empty-parm
https://godbolt.org/z/nsgGpt
https://godbolt.org/z/IORbBb
https://godbolt.org/z/51s7K0
https://bugs.llvm.org/show_bug.cgi?id=41851 (the maintainer of Clang cites the relevant part of the C11 spec for this)

Nick Desaulniers

The enemy's gate is down