So I figured it would be a reasonably good idea to write out about my research into region based memory management (RBMM). Mainly, this post is targeted towards explaining how my gcc-plugin finds objects that are candidates for region'd memory. But before I start getting to far ahead of myself now, let me quickly introduce RBMM.
RBMM is a method of memory management that allows for the automatic maintenance of a programs dynamic memory usage at runtime. Through the use of static analysis at compile time, calls can be emitted into the code of the program being compiled. At runtime, these region calls (e.g. create region, allocate memory from region, delete region, etc) are executed. Some might term this as a form of compile-time garbage collection, and that is not entirely incorrect, however; RBMM can be faster than garbage collection. Most garbage collection methods require the execution of the program to be suspended (extending the total runtime of the application) so that pointers to dynamically allocated data are no longer in use can be free'd. In contrast, regions place dynamically allocated data into chunks of memory which have an identical lifetime. Therefore, freeing the memory associated to a region is really fast. Just kill the region and boom, all memory in that region can be reclaimed. This is much faster than visiting each object individually, and does not require the need to halt the program's execution. Consider a linked list. We could put the whole list into memory, allowing us to free all the nodes at once, when it is no longer needed, as well as possibly enhancing cache locality.
The approach I am taking is to study RBMM from the prospective of the gcc compiler. My RBMM implementation uses a gcc-plugin which analyzes the program being compiled and then emits the code.
The plugin operates on two primary passes, the first pass of the program analyzes the source code and generates data from which the second pass, the transformation pass, later uses to emit the code into the program which executes the region calls (create/alloc/delete). The first piece of the analysis pass looks for variables that are pointers, which could possibly be used for accessing dynamic/heap memory. The gcc compiler provides us a generic intermediate representation which is languages c, c++, ada, etc) agnostic. This intermediate representation is called GIMPLE. Further, my passes operate (analyze and transform) the GIMPLE code which is presented in a static single assignment format (SSA).
The first pass visits each statement in the program. In the case of an assignment statement, we look at what value is being assigned to. If that value is a pointer, our analysis takes note of it. Any variables used on the right-handed side of the assignment statement, which are used for calculating the left-handed side value have some association. This association we call a unification. So the left-hand side is unified with the variables in the right-hand side. But, as mentioned, we only care about pointers, so only the right-hand side pointers are associated with the left-hand side. Similar goes for the case if the left-hand side is being set from a memory allocation call (malloc/new). For example, assume we are analyizing the following statement: foo = somepointer; In this case we would add both 'foo' and 'somepointer' as a single unification piece. In the case of complex data structures, such as 'foo.bar = somepointer;' Our analysis would unify 'foo' and 'bar' followed by unifying that piece (foo and bar) with 'somepointer'.
(pssst... hey... if you read this far, send me an email or leave a really lame comment)
In the case that our analysis comes across a function call (a callee), except for memory allocation calls as mentioned above, we have to defer the analysis of that statement until our analysis (first-pass) has completed. Once that analysis is complete, we have unification data which is specific to each function, which we use as we visit each function call statement that we previously deferred. This portion of the analysis now becomes interprocedural, as it requires knowledge of other functions in the program which are being called. In other words, after we have intraprocedural data for each function, we can use that information interprocedurally to analyze function calls. If our analysis comes across a function call, we can query our stored analysis data and get the unification information about the callee (the thing being called). From this data we can tell if a function takes a pointer and if that function will need to have regions passed to it. Therefore, we can unify the arguments we
pass to the function being called, as well as any left-hand side values which are being set from the return value from the callee. This unification is based on the callee's data which we are guaranteed to have after the first pass has completed.
Once the unification process has completed, the transformation pass can rock. But ultimately that is a discussion for another post.
tl;dr Put things together