Tuesday, February 12, 2013

Picking up the Trash and Caller vs Callee Saved Registers

So there are a few posts around the internet about caller vs callee saved registers. Just check my references at the end of this post. Basically, I want to define in a concise way what the difference is and why I'm blabbering about stuff that most developers do not have to consider.

To me the difference between caller and callee saved registers matters. Why is that? Well, for one, I am writing a garbage collector and registers along with stack variables and global variables make up the root-set. The root-set is a set of variables that the garbage collector knows about and it is where the collector begins tracing. In short, for each variable in the root set, the collector follows any pointers that variable might have. For any variable that can be reached, starting from the root-set, that variable is said to be alive, and its memory cannot be reclaimed. Any variables not found through tracing can have their memory reclaimed. There it is, garbage collector 101, see you next semester.

As I mentioned, registers make up the root-set. Since the registers can contain pointers to data that was allocated, it is necessary to trace them at garbage collection time. If that allocated data can't be reached, the collector can recycle it and conserve memory resources. But an interesting thing occurs, and this is based on calling convention, how the compiler (or assembly-level programmer) constructs calls to functions. Some registers must be traced and some should be ignored. This, my friends, is the difference between caller and callee saved registers. The callee-saved registers must be traced. You will see why in just a jiffy!

These caller vs. callee registers are defined by the calling convention such as the Sys-V ABI for Intel Processors and/or another calling convention standard. For instance, the former is similar to the cdecl calling convention, and is the convention we will investigate here, as my work space is gcc and it constructs programs in a somewhat similar manner (similar enough to convey the message in this post). These conventions specify how certain registers are to be used. Since registers are global, all functions can see and access/modify them. So, it goes to say that specifying which registers a callee (subroutine) function can manipulate and which registers a caller must save is probably a good thing. As there are times where a caller wants to reclaim data it never stored on the stack, but left lying around in a register.

A caller-saved register is a register that a calling function must save before it calls any other function. As the other function, the callee, can use this fiddle with the register. These are "volatile" and if the data is not stored elsewhere before a function call, they could be lost. That is why these registers are typically stored on the stack.

A callee-saved register is a register that a callee must preserve if it wants to further use the register to store other information. A non-volatile register.

Given my environment (linux on a x86-64bit CPU and compile using gcc), and according to my references the caller and callee saved registers for such a configuration are as follows:
Caller-saved registers: rax, rcx, rdx
Callee-saved registers: rbp, rsp, rdi, rsi, rbx, r12, r13, r14, r15, rip

This is a pretty clear distinction, but a simple example goes a long way... here goes, in pseudo-assembly code. The function foo adds 123 to 456 and then calls bar. The result of calling bar is then added to 123 + 456:

                   # Comments:
foo:               # Function foo entry
    mov 123, rax   # rax = 123
    add rax, 456   # rax = 123 + 456
    push rax       # Caller saved
    call bar       # bar()
    pop rbx        # Restore the saved value (123 + 456 or 579)
    add rax, rbx   # rax = rax + rbx (whatever bar returned + 579)
The common calling convention here cdecl, which is similar to the SysV ABI for Intel and what gcc uses, mentions that return values are stored in the rax register. rax is a caller-saved register, it's volatile. If foo doesn't save it, bar might fiddle with it and change its value. While I don't show a body for bar, suppose it uses register r12 to do some of its math. r12 is a callee-saved register, meaning that the data in r12 is preserved by the callee. Suppose bar looks like:
    push r12      # Saving whatever is in r12
    mov 789, r12  # r12 = 789
    add r12, 42   # r12 = r12 + 42
    mov r12, rax  # rax = r12
    pop r12       # Restore r12
    return        # return rax
A keen eye will notice that r12 is being used in bar as a callee-saved register, as what the calling convention requires. bar is free to use r12 but it must restore it before it returns. This means that foo does not have to push anything to the stack, in fact foo could be written as follows:
foo:               # Function foo entry
    mov 123, rax   # rax = 123
    add rax, 456   # rax = 123 + 456
    mov rax, r12   # rax is caller saved
    call bar       # bar()
    mov r12, rbx   # Restore the saved value (123 + 456 or 579)
    add rax, rbx   # rax = rax + rbx (whatever bar returned + 579)
Instead of pushing to the stack, foo in this case knows that r12 is a callee-saved, non-volatile, register and makes use of that fact. Now, a keener eye might be saying, well... foo might be a callee of another function. Maybe main called foo well, yes, foo should follow convention and preserve r12.

This is the message for garbage collectors. The garbage collection function is called (probably from the memory allocator, when it notices that memory is low). When the collector function begins to execute, it knows of the callee-saved registers as the caller should have already stashed the data it cared about on its stack or in a callee-saved register.

A few ways to programmatically access the registers is via the setjmp, sigsetjmp, and/or ucontext glibc function calls. For instance, the sigsetjmp is one method that the Bohem collector uses to access registers for his garbage collector (libgc). If you were to look at the setjmp family you would learn that it returns only callee-saved registers, and not the full register set. Why is that? setjmp stores the environment into a structure. This includes just the callee-saved registers, since the caller should have preserved its caller-saved registers onto the onto the stack or in a callee-saved register. Consider the flow: main() --> functionA() --> functionB() --> setjmp(). setjmp would return the callee-saved registers and functionA can make use of these. As the registers values relevant for main, and functionA would be stored elsewhere (stack or callee-saved).

I am pretty sure this post is not free from flaws, but I hope it made sense, or that any flaws do not crush the integrity of the data in this post. Thanks for reading.


No comments:

Post a Comment