
Monday, December 31, 2012

3... 2... 1... Collect the Trash!

Ok, so y'all probably remember wayyyyy back in October when I was discussing writing a garbage collector. Well, I really don't expect you to remember that. "You" being the pronoun referring to my two subscribers (one being a stalker). Ok, yeah, I flatter myself; in reality, it's probably not a stalker of physical attraction, but rather a psychotic, bloodthirsty individual. *waves*

Well, why do I want to dance with a garbage collector? Isn't that the evil-bastard-child/antithesis of region based memory management (RBMM), what I have been working on for the past two years or so? Nope! Garbage collection can be a complement. See, RBMM works by analyzing a program at compile time (my system uses a gcc-plugin). From this analysis the compiler can insert memory allocation and reclaim calls (think malloc/free) at compile time. My system automatically frees memory based on which objects point to which other objects. Objects grouped in this way can be allocated from the same region of memory. A region cannot be reclaimed until all objects in it are no longer needed. Then, in one quick swoop, the memory of the region (the space that all of its objects take up) can be reclaimed. This fast memory reclaim is a key benefit of RBMM, as there is no need to visit each object and reclaim its memory individually.
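
To make that concrete, here is a little sketch of what a region interface could look like. The names (region_create, region_alloc, region_destroy) and the bump-pointer layout are purely my illustration here, not the actual runtime from my system:

#include <stdlib.h>

/* A toy bump-pointer region: one malloc'd chunk, a cursor, and no
 * per-object frees. Error handling is elided to keep the idea visible. */
struct region {
    char  *base;  /* start of the chunk */
    size_t used;  /* bytes handed out so far */
    size_t cap;   /* total chunk size */
};

struct region *region_create(size_t cap)
{
    struct region *r = malloc(sizeof *r);
    r->base = malloc(cap);
    r->used = 0;
    r->cap  = cap;
    return r;
}

void *region_alloc(struct region *r, size_t n)
{
    if (r->used + n > r->cap)
        return NULL;  /* a real runtime would grow the region here */
    void *p = r->base + r->used;
    r->used += n;
    return p;
}

/* The RBMM payoff: reclaiming every object in the region is two frees,
 * no matter how many objects were allocated from it. */
void region_destroy(struct region *r)
{
    free(r->base);
    free(r);
}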

Unfortunately, not all object lifetimes can be decided at compile time; think of global variables, or values pointed to/aliased by globals. This means that some objects cannot be reclaimed based on a compile-time static analysis. If these undecidable-lifetime objects are in a region, then that region lives for the entire lifetime of the program. Static analysis cannot force a region to die unless it can be 100% sure that the region's contents (objects) are never needed again. Global nastiness creates region bloat, since some objects in a region might be reclaimable and others not. As I previously mentioned, a region cannot be removed until ALL of its objects are no longer alive. If a global variable pops into the region, then that region now lives forever. Like a virus.

The above issue means that a garbage collector can actually benefit an RBMM system. That is what I have been working on: a combination of RBMM and garbage collection. This is not unique, and has been done before; however, the way we implement our collector is unique, but that is not discussed in this post.

Disclaimer: Grammar nuts, please note that this is a blog post, so I am immediately relieved of any misuse of narrative modes. This is a design for my PhD and something that I have worked on with my advisers quite a bit (hence the "we"). The code is mine.

What I want to discuss in this post is how we locate stack variables. Stack variables form a subset of the root-set, the root-set being the collection of variables that are live at the time of garbage collection. If any data is reachable through the root-set, then that data is also alive and must survive garbage collection. Otherwise, the memory for data/objects that is not reachable can be collected. I will not go into details, or even the collector type I have chosen (e.g. mark-sweep, copying, etc). Instead, I just want to talk about finding the stack variables; as I said, a subset of the root-set. Globals and registers can also be part of the root-set, but this post is only about stack variables </redundancy>.

When garbage collection kicks in, the collector needs to start somewhere. It starts by scanning the root-set. Well, how does the collector get this root-set? Specifically, what this post is concerned with is obtaining a subset of the roots, the stack variables. If you are familiar with stack-frames or activation records, the values in those frames are what must be scanned. If any of those variables point to dynamically allocated objects (e.g. heap variables or variables from a region), then such objects must survive collection, and their memory cannot be recycled.

I have implemented a compiler modification (gcc-plugin) which provides hints to the garbage collector as to how to trace the stack of the program. This is how the collector gets that subset of the root-set, the stack variables. Actually, it is so cool and nifty that I devoted this entire blog post to the notion.

My aim is to provide, at compile time, a table that can be read by the garbage collector at runtime. During compilation, as the source code is being processed, my plugin locates all function calls in the program. At the return point of each function call a label is inserted (think assembly goto/label). This label is unique to each callsite, and makes up the key into a table that I am building. The key points to information about the function that the label is inside. For the example just below, the three labels are unique, but all point to the same function information: information about my_function(), which specifies what local variables it has (x, y, z), whether the variables are pointers, and how big the variables are.

void foo(void), bar(void), baz(void);  /* callees defined elsewhere */

void my_function(void)
{
    int x;
    float y;
    double z;

    foo();
__label0:

    bar();
__label1:

    baz();
__label2:
    return;  /* a label must be attached to a statement, so the last one anchors to the return */
}
The function above has three callees (foo, bar, baz). At each call site, my plugin inserts a label. Now, why do I insert labels? Well, at runtime we need a way of associating the return point of each function call (foo, bar, baz) with the function it sits in. At compile time I do not have any addresses to use; hell, addresses are assigned when the executable is linked and loaded; we haven't even finished compiling the source code yet! The solution is to insert a label which can be used at compile time to build a table of stack-variable information. The linker/loader will fix up the addresses, and at runtime the labels will carry the correct address that each function call returns to. Recall that each time a function is called, the return instruction pointer (RIP), or return address, is stored on the stack. This value will be the same as the respective label. For instance, consider that my_function() is being executed, and the line which calls bar() is reached. When bar() gets called, the return point, which now happens to also match __label1, is placed onto the stack. And when bar() returns, execution resumes at __label1. The only thing we have done so far is insert labels. No big deal.

Note: I doubt anyone except for maybe one person has read this far. If you have, email me; you kick so much ass that next time we meet, I will buy you coffee produced from the most lavish and rare of coffee beans: unicorn-digested coffee. I will then serenade you with an ear-piercing song, proclaim you Captain Badassery, dance around bare-chested wearing a Burger King crown, and throw pennies at your feet.
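
For a feel of what the table could hold, here is a sketch; the struct names and fields are my own invention for this post, not the plugin's actual output, and table_add() is a hypothetical helper:

#include <stddef.h>

/* Hypothetical shape of the plugin-generated metadata. */
struct var_info {
    size_t offset;      /* where the local lives within the frame */
    size_t size;        /* sizeof the variable */
    int    is_pointer;  /* does the collector need to trace it? */
};

struct func_info {
    const char            *name;
    size_t                 num_vars;
    const struct var_info *vars;
};

struct callsite_entry {
    void                   *return_addr;  /* filled in from &&__labelN */
    const struct func_info *fn;           /* info for the enclosing function */
};

/* Inside my_function(), the emitted registration could look like:
 *
 *     table_add(&&__label0, &my_function_info);
 *     table_add(&&__label1, &my_function_info);
 *     table_add(&&__label2, &my_function_info);
 *
 * GCC's address-of-label extension (&&label) is only valid inside the
 * containing function, which is exactly where the plugin inserts code.
 */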

As mentioned, the labels are entries into a table that contains function information (variables, stack size, etc). When the garbage collector kicks in, it looks at the stack and locates the return point. That value is a key into the function table my plugin generated; remember, the RIP coincides with the added labels. The collector uses the RIP/label as a key to get the function information, and then processes the variables (figuring out which heap values can be accessed through the local variables in that function). Once that function is scanned, its caller's frame is scanned, and so on... it's turtles all the way down... until the collector hits main(). Then this portion of garbage collection is complete.
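
Continuing the sketch above, the walk itself might look something like the following. Big caveat: this assumes x86-64 frames linked through saved frame pointers (compile with -fno-omit-frame-pointer), so the saved RIP sits one slot above the saved RBP, and lookup_callsite()/mark_object() are hypothetical helpers over the table:

extern const struct func_info *lookup_callsite(void *return_addr);
extern void mark_object(void *obj);

void scan_stack(void **fp)  /* start from e.g. __builtin_frame_address(0) */
{
    while (fp) {
        void *ret_addr = *(fp + 1);        /* saved RIP: matches one of the labels */
        const struct func_info *fn = lookup_callsite(ret_addr);
        if (!fn)
            break;                         /* no entry: we have walked past main() */
        void **caller_fp = (void **)*fp;   /* the label lives in the caller's body */
        for (size_t i = 0; i < fn->num_vars; i++) {
            const struct var_info *v = &fn->vars[i];
            if (v->is_pointer)             /* scan only the locals that are pointers */
                mark_object(*(void **)((char *)caller_fp - v->offset));
        }
        fp = caller_fp;                    /* step up one frame and repeat */
    }
}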

TL;DR: Anyways, the above diatribe is an exposé on the use of labels for providing a means of accessing stack variables, and a compile-time generated way of performing a runtime stack trace.

Have an awesome New Year and may all of your garbage be collected properly!

-Matt

Monday, July 2, 2012

Copy of my Research Work

Region based what? So the paper, "Towards Region-Based Memory Management for Go," that my advisers and I worked on, and which was accepted at the ACM-sponsored Memory Systems Performance and Correctness (MSPC) workshop, is available. Much of this was written by my advisers; heck, when you have three incredibly smart advisers working on a paper, it's hard to fight anyone. The result, I am quite proud of. Mainly, I care about having fun, writing code, and learning, and I think the work towards this paper achieved just that. In fact, the numbers are from my code. Of course, my advisers helped me design the system that is reported on in the paper. So, I am somewhat pleased. Plus, this got me a trip to Beijing. The PDF copy of said paper is here. And the ACM Digital Library data is here. Pull out your e-reader and curl up on the toilet; it is a pretty intense read.

-Matt

Wednesday, November 23, 2011

ANSI and RBMM

For some reason I really enjoy running the testsuite I have been putting together for my research into region based memory management (RBMM).  Maybe it's the awesome color, maybe it's the satisfaction of it reporting a 100% success rate.  Either way, I must not mislead myself.  The pass/fail check the test rig performs is comparing the number of region creations against the number of region deletions.  Those values should balance at the end of each test-program's execution.  This does not test all of the features of region merging and region variable passing; creation-deletion is a rough metric, but good for a base test.  This isn't at full resolution, but you get the idea; let your brain do the interpolation.
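
The check itself is simple. As a rough sketch (my actual test rig differs, and these counter names are made up for illustration), the region runtime can just keep create/delete counters and compare them when the test program exits:

#include <stdio.h>
#include <stdlib.h>

/* Counters the region runtime would bump on every create/delete. */
static unsigned long regions_created;
static unsigned long regions_deleted;

static void check_balance(void)
{
    if (regions_created == regions_deleted)
        fprintf(stderr, "PASS: %lu regions created, %lu deleted\n",
                regions_created, regions_deleted);
    else
        fprintf(stderr, "FAIL: %lu regions created, %lu deleted\n",
                regions_created, regions_deleted);
}

/* Runs before main(); registers the check to run after main() returns. */
__attribute__((constructor))
static void install_check(void)
{
    atexit(check_balance);
}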


-Matt

Wednesday, August 24, 2011

Preliminary Results

Been working on teasing some results from the RBMM (region-based memory management) stuff I have been working on. I am really excited, not so much about the data, but about the mad LaTeX table I just cranked out.

-Matt


Sunday, August 7, 2011

Unification for Region Inference

So I figured it would be a reasonably good idea to write about my research into region based memory management (RBMM). Mainly, this post is targeted towards explaining how my gcc-plugin finds objects that are candidates for region'd memory. But before I get too far ahead of myself, let me quickly introduce RBMM.

RBMM is a method of memory management that allows for the automatic maintenance of a program's dynamic memory at runtime. Through static analysis at compile time, region calls can be emitted into the code of the program being compiled. At runtime, these region calls (e.g. create region, allocate memory from region, delete region, etc) are executed. Some might term this a form of compile-time garbage collection, and that is not entirely incorrect; however, RBMM can be faster than garbage collection. Most garbage collection methods require the execution of the program to be suspended (extending the total runtime of the application) so that dynamically allocated data that is no longer in use can be freed. In contrast, regions place dynamically allocated data with identical lifetimes into the same chunk of memory. Therefore, freeing the memory associated with a region is really fast: just kill the region and boom, all memory in that region can be reclaimed. This is much faster than visiting each object individually, and does not require halting the program's execution. Consider a linked list. We could put the whole list into one region, allowing us to free all the nodes at once when the list is no longer needed, as well as possibly enhancing cache locality.
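
Here is a sketch of that linked-list case, assuming a minimal region API (region_create/region_alloc/region_destroy); the names are purely illustrative, not my actual runtime:

#include <stddef.h>

/* Assumed region API, declared here for illustration only. */
struct region;
extern struct region *region_create(size_t cap);
extern void *region_alloc(struct region *r, size_t n);
extern void region_destroy(struct region *r);

struct node {
    int          value;
    struct node *next;
};

/* Every node of the list is carved out of the same region... */
struct node *build_list(struct region *r, int n)
{
    struct node *head = NULL;
    for (int i = 0; i < n; i++) {
        struct node *nd = region_alloc(r, sizeof *nd);
        nd->value = i;
        nd->next  = head;
        head = nd;
    }
    return head;
}

/* ...so tearing the whole list down is one region kill, not n frees. */
void drop_list(struct region *r)
{
    region_destroy(r);
}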

The approach I am taking is to study RBMM from the perspective of the gcc compiler. My RBMM implementation uses a gcc-plugin which analyzes the program being compiled and then emits the region code.

The plugin operates in two primary passes: the first pass analyzes the source code and generates data which the second pass, the transformation pass, later uses to emit into the program the code which executes the region calls (create/alloc/delete). The first piece of the analysis pass looks for variables that are pointers, which could possibly be used for accessing dynamic/heap memory. The gcc compiler provides us a generic intermediate representation which is language (C, C++, Ada, etc) agnostic. This intermediate representation is called GIMPLE. Further, my passes operate on (analyze and transform) the GIMPLE code, which is presented in static single assignment (SSA) form.

The first pass visits each statement in the program. In the case of an assignment statement, we look at what is being assigned to. If that value is a pointer, our analysis takes note of it. Any variables used on the right-hand side of the assignment statement, which are used for calculating the left-hand side value, have some association. This association we call a unification: the left-hand side is unified with the variables on the right-hand side. But, as mentioned, we only care about pointers, so only the right-hand side pointers are associated with the left-hand side. The same goes for the case where the left-hand side is being set from a memory allocation call (malloc/new). For example, assume we are analyzing the following statement: foo = somepointer; In this case we would add both 'foo' and 'somepointer' to a single unification piece. In the case of complex data structures, such as 'foo.bar = somepointer;', our analysis would unify 'foo' and 'bar', followed by unifying that piece (foo and bar) with 'somepointer'.
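
Unification like this is classically implemented with a union-find (disjoint-set) structure. The sketch below is my own illustration of the idea, not the plugin's actual data structure:

/* Minimal union-find: each pointer variable gets an integer id, and
 * unified variables end up sharing one representative. Everything in a
 * set will later be served by the same region. */
#define MAX_VARS 1024

static int parent[MAX_VARS];

static void uf_init(int n)
{
    for (int i = 0; i < n; i++)
        parent[i] = i;  /* every variable starts alone in its own set */
}

static int uf_find(int v)
{
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];  /* path halving keeps lookups fast */
        v = parent[v];
    }
    return v;
}

/* Called when the analysis sees something like "foo = somepointer;". */
static void uf_unify(int a, int b)
{
    parent[uf_find(a)] = uf_find(b);
}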

(pssst... hey... if you read this far, send me an email or leave a really lame comment)

In the case that our analysis comes across a function call (a callee), except for the memory allocation calls mentioned above, we have to defer the analysis of that statement until our first pass has completed. Once that analysis is complete, we have unification data which is specific to each function, which we use as we visit each function call statement that we previously deferred. This portion of the analysis now becomes interprocedural, as it requires knowledge of the other functions in the program which are being called. In other words, after we have intraprocedural data for each function, we can use that information interprocedurally to analyze function calls. If our analysis comes across a function call, we can query our stored analysis data and get the unification information about the callee (the thing being called). From this data we can tell if a function takes a pointer and if that function will need to have regions passed to it. Therefore, we can unify the arguments we pass to the function being called, as well as any left-hand side values which are being set from the callee's return value. This unification is based on the callee's data, which we are guaranteed to have after the first pass has completed.
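
A tiny example of what the deferred, interprocedural step buys us (the function names here are made up for illustration): once pass one has summarized a callee, each deferred call site can be resolved against that summary.

#include <stdlib.h>

/* Pass one summarizes wrap(): its return value is unified with its
 * parameter p. */
static int *wrap(int *p)
{
    return p;
}

void caller(void)
{
    int *head = malloc(sizeof *head);  /* allocation site: head needs a region */
    int *q = wrap(head);               /* deferred call site: the callee's summary
                                          unifies q, head, and wrap's p, so all
                                          three draw from the same region */
    *q = 42;
}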

Once the unification process has completed, the transformation pass can rock. But ultimately that is a discussion for another post.

tl;dr: Put things together.

-Matt

Sunday, March 13, 2011

Region Based Memory Management

Aside from listening and witnessing some of the most righteous of music (I'm listening to Gogol Bordello right now), and caught some Iron Maiden, Primus, and Slayer at Soundwave last week, what have I been researching, is a question one might be asking themselves. Ok yeah that is one nasty grammatically incorrect piece of English. Well what have I been researching?

Region Based Memory Management! That's right, that is what my PhD is focused on. Sure, RBMM is not fresh, but my goal is to implement it for Google's Go language. What I think will be the really unique pieces of research are how we allocate our regions, how we deal with parallelization (à la goroutines), and how we carry information across multiple modules (cross-module analysis).

Back up just one second, cowboy, neee hawwww! What in the name of doo dah is a region? Region based memory management is a way of managing memory at compile time; one might think of it as compile-time garbage collection. However, that is just not correct at all. At compile time, via static analysis, dynamic memory allocation calls are located. The compiler can then emit code that will allocate objects that have the same lifetime into regions of contiguous memory. So it's a neat little tango between compile-time and run-time communication. Anyways, this grouping of objects with similar lifetimes can benefit cache locality, as well as provide a fast way of deallocating objects, all at once, by just killing the region. That is fast. No need to visit each object individually and free it.

Since Go utilizes a garbage collector, we would like to leave it in place for dealing with cases where regions would live for a long, if not infinite, time. Such a case would be when a global object aliases something that is dynamically allocated (in Go, via a call to 'new' or 'make'). Garbage collection is traditionally a "slow" process. In addition, collectors can be a burden to real-time systems, as they can slow the program's execution (or even stop it) to give the collector time to clean up objects that are no longer being used. Of course, there are parallel garbage collectors, which should not require the stop-the-world approach that traditional collectors implement. But I am unaware of their actual processing capabilities/limitations.

TL;DR: my thesis at this time proposes an elusive, seductive dance between garbage collection and region based memory management. OhhhhHhh yeaaa mmmmmm mm that sounds like the sexxy.

-Matt