Fixing Function Pointers with decltype

December 2015

This post is about a particularly tricky bug I recently fixed. The issue was an application crash which appeared to violate C’s syntax rules. In the end, it turned out to be an issue with function pointer declarations in our app. In this post we’ll discuss the problem, what caused it, and how to use the C++11 decltype keyword to prevent this bug in your own codebase.

The Problem

The original issue was found in a long-running batch operation implemented in C++. The operation was crashing with an invalid pointer read near the very end. Typically, when you see an invalid pointer read, these are good places to start looking:

  • Was a pointer used before it was initialized?
  • Was a pointer used after it was freed?
  • Was the pointer freed twice?

We got the crash under a debugger, and it turned out to be none of the above. What we saw instead was much more strange.

Let’s say this was the crashing function:

 1 bool main()
 2 {
 3     mystruct *data = new mystruct;
 4     if (!data) return false;
 5 
 6     process(data);
 7     
 8     if (data->myvalue > 0) {
 9         // some additional processing
10     }
11 
12     delete data;
13     return true;
14 }

In the debugger, we were seeing the app crash at line 8, while dereferencing data->myvalue. It turns out data was nullptr at that point.

Okay, now, at this point you’re probably saying, “Yeah Dave, this is a super obvious bug, something inside process() clearly freed the pointer. Why are you blogging this?” No. Read again! Before the call to process(), the data variable stores a non-null pointer … but after process() returns, data points to null! Look at the code and tell me how that’s possible! :-)

We confirmed nobody’s using C++’s references feature to mess around with the pointer being passed in. process() just accepts a pointer to a mystruct using the language’s usual, default, pass-by-value semantics. The function’s signature looks like this:

void process(mystruct *data);

And yet, somehow, that call is nulling our pointer. That’s weird, right?

According to the language rules, there’s no way for process() to have changed what data points to. On line 4, we confirm that data is non-null; line 6 has no way to change what data points to, and yet by line 8, data has changed to null.

How can that be?

Stack Balancing

After some other angles didn’t pan out, we ended up dropping the debugger into assembly mode to trace the buggy pointer. We saw the pointer get initialized in register esi. During the call to process() from main(), we saw the compiler correctly generate push esi and pop esi instructions to store the value on the stack. Everything was looking fine so far.

Then we noticed something fishy: the stack pointer address was different for the initial push esi and eventual pop esi instructions. So we were trying to restore data from the call stack, but we popped from the wrong address; instead of restoring its actual value, we read from an adjacent region on the stack, which so happened to be 0. That’s why data magically changed into a null pointer when process() returned!

Based on the way the stack pointers weren’t lining up, we could conclude that someone had pushed more bytes to the stack than they had popped. This class of issue is called an unbalanced stack. That term was a new one for me; I suppose this type of problem isn’t very common nowadays :-)

So came the question: how did the stack get unbalanced?

Calling Conventions

To answer that question, we need to review the assembly-level convention for passing function parameters in C and C++. Specifically, when you call a function in C and C++ on an x86 processor, …

  • The caller pushes all arguments to the stack
  • The callee pops all arguments from the stack

Even though the caller pushes and the callee pops, as long as the number of arguments is the same on both ends, the stack ends up balanced. Your compiler uses these rules in most cases, with only a few exceptions:

  • When you use varargs, the caller has to clean up, because only the caller knows how many arguments it pushed to begin with.
  • You can explicitly tell the compiler to use other calling convetions, and it will generally comply.

Gone Awry

Once we had figured out we were dealing with an unbalanced stack, we still had some investigation left. Eventually, we found the problematic function call nested a half dozen function calls below process().

The function we were calling had been loaded using Windows’s LoadLibrary and GetProcAddress APIs, which allow you to manually load a DLL and extract a bare pointer to a specific function wherever the DLL’s code was loaded into memory. The code to load the DLL and extract the function address looked something like this:

typedef void (*MyFunction)(int arg1, int arg2);

// ...

HMODULE module = LoadLibrary("thedll.dll");
if (module) {
    MyFunction *theFunc = reinterpret_cast<MyFunction*>(
            GetProcAddress(module, "MyFunction")
        );

    if (theFunc != nullptr) {
        theFunc(1, 2);
    }
}

Someone had recently changed the function being called, and someone else had handled a merge conflict from that change. The merge conflict was done incorrectly, causing the typedef to fall out of sync with the real implementation. Compare the typedef above with the definition below:

void MyFunction(int arg)
{
    // ...
}

Remember how we said in the standard C calling convention, the caller pushes the arguments and the callee pops them? In this case, the caller and the callee had different opinions about the number of arguments to manage:

  • The caller thought the function took two arguments, and pushed 8 bytes
  • The callee thought it only took one argument, and popped 4 bytes
  • In the end, the stack pointer was 4 bytes from balanced

Unfortunately for us, the program had just happened to work for a while after we had unbalanced the stack. The program returned all the way to main() where we eventually failed, nowhere near the root cause.

Avoiding the Problem

This is not the kind of bug you want to find in a production system, or under any kind of time pressure. We figured out the stack was unbalanced pretty quickly, but from there we didn’t have a good way to figure out who had unbalanced the stack: we had to manually guess and test until we found the culprit.

Luckily, you can avoid these problems by doing a little engineering up front.

The trick is to get the compiler to catch a mismatch between the typedef and the function definition at compile time. This didn’t happen in the example above because the compiler never saw both the typedef and the definition at the same time. However, we can link the two together using a shared header and the decltype keyword, like so:

  • Declare the function in a common header shared by both the implementing DLL and the module calling LoadLibrary.
  • In the callee module, #include the header and implement the function
  • In the caller module, #include the header and declare the function pointer using decltype.

For example, we would first define MyFunction.h:

void MyFunction(int arg1, int arg2);

Then we could implement it in MyFunction.cpp:

#include "MyFunction.h"

void MyFunction(int arg1, int arg2)
{
    // ...
}

Finally, we could load that API in ExternalCaller.cpp like this:

#include "MyFunction.h"

typedef decltype(&MyFunction) MyFunctionPtr;

bool main()
{
    HMODULE module = LoadLibrary("thedll.dll");
    if (module) {
        auto func = reinterpret_cast<MyFunctionPtr>(
                GetProcAddress(module, "MyFunction")
            );

        if (func != nullptr) {
            func(1, 2);
            return true;
        }
    }

    return false;
}

If you haven’t seen it before, decltype is a new C++ keyword defined in C++11. It is a language construct which evaluates to the type of the expression in parentheses. In this case, &MyFunction evaluates to the function pointer signature for MyFunction, which means:

typedef decltype(&MyFunction) MyFunctionPtr;

evaluates to just what we’d expect:

typedef void (*MyFunctionPtr)(int arg1, int arg2);

This way we’ve linked the caller’s typedef with the callee’s implementation: both must match the declaration in the shared header. If we decide to change the implementation, all the necessary changes become compiler breaks, which are much easier to find.

Knowing this trick, hopefully you’ll never need to debug an issue like this yourself :-)