Pages

Friday, November 9, 2012

Devil Hands: The Wonders of a Loosely Typed Compiler

GCC has been something that has captured my attention for years. It seems there is a set of chosen developers whom have bowed to the darklord himself, to create a tool which is just mind-blowingly fascinating. gcc takes text and makes binaries! Sure, that's what most compilers do, but gcc just towered over me with as being, almost, a pinnacle of hackery. It's just shear black magic! And I like the dark arts.

For years I have enjoyed being a software developer; however, I wanted to get my hands dirty on a lower-level than just application development. For some reason, I wanted to work on tools that others use to construct their tools (and applications). Enter the realm of compilers. Since gcc is open source, and many people and organizations rely on it, I felt I would sell my soul. After all, idle hands do the devil's work. So, I decided to put my devil hands to work.

Well, part of my wanting to get into compilers eventually led me towards a PhD... still working on it. Well, I should be working on it, but I am writing this post instead. Anyways, during a PhD meeting the other day, one of my advisers mentioned that gcc has something like over 100 different representations for a certain data structure used inside the compiler. This structure is used to represent various things about the program that gcc is compiling. Things like variables, statements, declarations, etc. This adviser is right, in fact, he is never wrong. No, really, he knows all.

Well, I have been dipping my hands in the secret sauce that is gcc for a while now. In fact, one of the reasons I wanted to get a PhD was to hack on gcc. So my compiler driven region based memory management uses GCC-plugins to accomplish the research. Anyways, I have mentioned to my advisors many times that, yes, in fact the "tree" node data structure in gcc can be anything!

So what is the big deal if a data structure, used inside the compiler, can represent anything? Isn't that polymorphism at its finest? Well... no. It means that the internals of gcc are essentially untyped. Heck, a "tree" is defined as a union consisting of 41 structures. To top it off, there is another file (tree.def) that defines about 195 (in gcc 4.7.1) different "things" a tree can be! This is all fine and dandy, if you know what you are doing. But it means one must be an 31337 ninja when writing gcc code. The danger comes in referencing data inside the tree node. If accessed improperly, the data will not have the correct meaning. While there are a handful of data structure inside a "tree" node, there are over a hundred of possibilities which use certain fields or not. And if you do not ask the type, at runtime, what the type is, and perform your reading of the data properly then *boom* you get bad data. This is what I call a type bomb. The result of a lack of typing on a data structure. In other words, you have a grey box, and you basically have two things you can do, before pulling data out of it. It has structure, but certain parts are not always populated. Either you can ask the black box what is inside it, or you can assume what is inside it. A lack of type safety makes reading data can be dangerous. Since it permits the possibility that you can read the data incorrectly. But, I like this, it makes playing with gcc kinda fun, in a masochistic way.

To reiterate, how you read data in a "tree" instance can vastly change. Suppose that you want to check that whatever the "tree" represents is a global (e.g. global variable). Well, you can use the "is_global_var()" predicate. Just pass a tree instance to it, and it will return true or false. But, one must be very cautious. See, "is_global_var()" only works on declaration nodes, and not the handful of other "things" a "tree" can represent. I can pass a variable declaration "tree" node and the predicate will produce a proper truth value. However, if I pass it a particular instance of a variable, such as given by a SSA representation, then the result is not sound. Since the predicate will always return true-or-false, a value will be returned, it just might be wrong.

Consider the following block of C code:

void foo(void)
{
    int x;
    x = 1;
}

During a walk over the statements in this code, well actually, gcc will be walking a more verbose representation, its intermediate (GIMPLE) representation. Anyways, suppose your compiler pass wants to check if the 'x' variable in the declaration statement 'int x;' is global. The "is_global_var()" will return a proper truth value. However, if you pass the 'x' in the 'x = 1;' assignment statement, the value returned by the predicate cannot be trusted. Since a tree node can be thought of as an opaque type kinda (like a "void *") you can read a "variable" type as if it were a "declaration." And there is nothing wrong with that, no compiler warning. The "is_global_var()" does not check first what type of "tree" node it was passed, and will just report the result if the node were a declaration. This is dangerous, and is what causes the lack of sanity. The type enforcement comes in the comment above the "is_global_var()":
From gcc-4.7.1:

/* Return true if T (assumed to be a DECL) is a global variable.
      A variable is considered global if its storage is not automatic.  */

TLDR; Untyped data structures are convenient, but require that the programmer be an omniscient monkey-saurus and know what is up with the data they are flinging. Tread lightly young warriors. Oh, and to get an idea of what "things" a tree can be, check here. From gcc 4.7.1.

Resources: gcc-4.7.1

-Matt

Sunday, November 4, 2012

Pickin' up the Trash

I assume a few of y'all are wondering what I have been up to. Well, I am working less hours on the timing stuff and more hours on my PhD. In fact, I just wrote a pretty basic garbage collector. I want to weave it into my region memory allocator. In fact, after some more thought, this collector is not entirely complete. It does collect unused data, but it makes a few assumptions, mainly that a pointer does not point into the middle of an array/allocated chunk of memory. This is a horribly bad assumption, especially for what I want, a precise (as opposed to a conservative) collector. My collector works by collecting type and function call data at compile time (gcc plugin that I wrote). Then the collector runtime logic is linked into the resulting binary. The logic behind the allocator is a variation of Cheney's Copying Collector Algorithm. Anyways, I'm proud of the collector. The assumption I have made is part of the problem, and corresponding solution, that my advisers and myself want to write-up for a paper. I hope that goes well. And me? Well, I have a few more months (July is when my scholarships die). So I hope I can start thesis writing soon.

-Matt

Thursday, November 1, 2012

Meanwhile in Australia...

I found this in the bathroom at work earlier today. Really?! Are the upper-decker and pre-bowl really problems in this country? I surly hope most adults know how to handle themselves, and have been properly potty trained. I don't know if I should let this government poster be an insult to my intelligence or not.
-Matt

Monday, October 22, 2012

Ruxcon 2012

Wow! This years Ruxcon, hacker/tech conference, was great! There was a great line-up of talks, and plenty of things to do (CTF, hanging out, parties, lock picking). I was on the staff, and did just a minor bit of writing for the handbook, Saturday party organizing, and some MCing... Yes, they let me near the mic. Anyways, I had a blast! One of the greatest things of Melbourne... this con!

-Matt

Wednesday, October 17, 2012

San Francisco Pics

Well, I am now back in Melbourne. Man, my trip to San Francisco, and then to home in VA, was great. It was nice to catch-up with friends and family. Anyways, after tons of flying, and getting two pat-downs (yes, I denied being scanned by some of the new scanners at the airport), I am back. Sure, the TSA peeps told me that the scanners were "radio wave" based. Well, radio waves are safe and do not produce ionizing radiation. But, in principle, I just denied the scan. So I got the love-touch from the TSA dudes. And yes, the pat-down is a slow process, I wonder if this is intentional, as to convince people to go through the scanners. Anyways, pics, not of the pat-downs, are here.

-Matt

Tuesday, October 2, 2012

Satanic Canaries and Binary Runtime Hardening

A while back I started writing some stack canaries to play with GCC RTL. Anyways, this exercise resulted in Satanic Canary, a series of stack canaries implemented as a GCC-plugin. This software can be obtained here. With that said, I also wrote an article talking more in depth on this topic. Linux Journal just published it in their October 2012 edition.

-Matt

Monday, October 1, 2012

San Frangoatpenis

Wow. It has been a helluva week. My manager and myself rocked the Plugfest at ISPCS and then we presented two papers, one a piece. The papers went well, but the test-lab/demo/plugfest was pretty stressful. Lots of coffee consumed. Ultimately, I had a blast.

Anyways, I went to Haight Ashbury today, and let me just say. I know I might look a bit stoner like, but I could not tell you how many people were so generous to offer me some of their medical plant material. I turned them down, even though I am vegan and appreciate plants. I also went to the Amoeba store, per recommendation of my manager. Amazing record store... best one ever! FRIKKIN' HUGE METAL SECTION. It as if the pit of Satan's anus ruptured open, and the contents spilled out upon the back wall of their lovely store. I love Haight Ashbury. Oh, and I met a dude there who recommended a Brazilian band to me Goat Penis. And this dude also rocks this website: http://www.metalifestyle.com. Afterwards, I walked to the park at the end of Haight and chilled there for a bit. Laid on the grass, hungout.

-Matt