Friday, November 30, 2012
New Version of PDFResurrect... Now with enhanced EOF locating.
Monday, November 26, 2012
Answering a begging question... what does the heap look like?
-Matt
Sunday, November 18, 2012
Syntax Party 2012
-Matt
Monday, November 12, 2012
Go Go Gadget Now().Nanosecond()
Before I even get started: the "issue" I discuss below was resolved just three days ago. Of course, it will still be some time before the fix makes it into the next release of Go, and before GCC updates their libraries. Thanks, Brad, for pointing this out! I had no idea. Well, I don't want to delete this post... so for historical reasons it follows:
Time. It is such an unusual thing. I suppose it is "unusual" in that it is such an abstract quantity. I cannot just reach out into the air and pull down some time. Yet, we humans try to quantize the abstract. After all, humans like to understand how things work. And we like to measure the things that do work, or measure things to figure out why they work the way they do. In other words: we quantize time, break it up into distinct bits and pieces, so that we can measure and understand it.
The more we chop up time, the more accurately we can define something in all four dimensions (three dimensions of space, and one of time). For instance, I can tell a person to meet me at a particular location on the earth, which has three dimensions (e.g. meet me on floor 4 of the building at the corner of Foo and Bar streets). But when should my acquaintance meet me? Well, I tell them a time. Possibly a really, really precise time: at 7PM and 4 seconds and 3001 nanoseconds. I'm a robot. Anyways, I am digressing from the purpose of this post.
It is not uncommon for programming languages to provide a way (a library) to access time. This time comes from the system that is executing the program. For instance, "gettimeofday()" and "clock_gettime()" are just two such functions that ask a POSIX system (e.g. Linux, BSD) what the time is. These two functions return a value that represents time. But how precise the result is determines how finely you can measure an event. I want to look at the time routine provided by the Go programming language.
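To make the difference in granularity concrete, here is a quick C illustration (my own throwaway snippet, nothing from the Go sources) of the two POSIX calls and the fields they fill in:

#include <stdio.h>
#include <time.h>      /* clock_gettime(), struct timespec */
#include <sys/time.h>  /* gettimeofday(), struct timeval   */

int main(void)
{
    struct timeval  tv; /* seconds + MICROseconds */
    struct timespec ts; /* seconds + NANOseconds  */

    gettimeofday(&tv, NULL);
    clock_gettime(CLOCK_REALTIME, &ts);

    printf("gettimeofday():  %06ld us into the second\n", (long) tv.tv_usec);
    printf("clock_gettime(): %09ld ns into the second\n", (long) ts.tv_nsec);
    return 0;
}

Note that "gettimeofday()" tops out at microseconds, while "clock_gettime()" can report nanoseconds. Keep that in mind for later. (On older glibc you may need to link with -lrt for "clock_gettime()".)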
The following is my analysis of the Go 1 "time" package's "Now()" function, as provided by gcc 4.7.1 and 4.7.2 for a 64-bit x86 architecture. We investigate the possible lack of precision in the resulting Nanosecond member. This is not an analysis of how the Google 6g compiler and its provided Go 1 "time" package implement things. I did peer through the Go 1 "time" package provided by the 6g Google Go compiler version 1.0.3. It seems that their routine works similarly, so for all intents and purposes, we can assume what I write below for gcc is similar for the 6g compiler.
Firstly, I notice that the Go API provides a function for getting the nanoseconds from the current time of day. That's nifty. The following code should accomplish this.
package main

import "time"

func main() {
	println("The current nanosecond is:", time.Now().Nanosecond())
}
The above program obtains the current time, and then prints which nanosecond of the current second it is. See, time, in this case the second, can be divided into smaller increments. The API says that the nanosecond returned is the nanosecond of the current second of time. Now, recall that a nanosecond represents just 10^-9 (0.000000001) of a second. That's a pretty darn small increment of time. This means that the "time" package (library) of Go should use a data structure large enough to contain 10^9 or 1000000000 values.
Well, a look at the Go runtime library (gcc and 6g) shows that it uses a 32-bit signed integer (int) to represent the current nanosecond of the current second. As any good data-structure-atheist-skeptic needs to ask: "Are 32 bits large enough to hold a value of 1000000000?" Actually, is the "int" data type wide enough to contain the maximum value of a nanosecond? Remember that an "int" is signed, so one bit does not represent time; rather, that single bit is dedicated to representing the positive/negative (signedness) state of the value. I can't imagine a negative time, and zombie-Einstein might puke if you were to venture into the past.
The time library for Go provided by gcc uses the "gettimeofday()" system call (or its virtual dynamic shared object, vDSO, variant) provided by the operating system. A POSIX-compliant operating system should provide this routine, which obtains the time from the system. The Go "time" library calls this routine to obtain time. Go then converts the returned value into a Go time structure. I did not see in the Google 6g version of the library where they make an explicit call to "gettimeofday()." However, I did see where the library accesses members of the result, which are similar to what "gettimeofday()" returns.
An integer can hold a range of values and has to be large enough to at least contain the values of such a range: 31 bits of value and one bit to represent signedness. Go implements "int" as being 32 bits, and it is the data structure used for storing nanoseconds. So 2^31 - 1 gives a maximum value of 2147483647. Which... YES... is large enough to contain enough nanoseconds for a second. (I had a typo in my original post and I thank the anonymous poster who cleared that up for me).
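If you do not trust my arithmetic, a throwaway C check (again mine, not anything from the Go runtime) makes the comparison explicit:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* The largest nanosecond of a second is 10^9 - 1. */
    printf("INT_MAX        = %d\n", INT_MAX);     /* 2147483647 */
    printf("max nanosecond = %d\n", 999999999);
    printf("fits in an int? %s\n", INT_MAX >= 999999999 ? "yes" : "no");
    return 0;
}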
The POSIX standard defines that the "gettimeofday()" function returns seconds and MICROSECONDS... I'll say that again, microseconds, since the epoch of bell-bottoms and afro-sheen (January 1, 1970). In fact, the POSIX standard is a bit loose, and says it should return at least seconds and microseconds (leaving the door open for other values). But a browse of the code in gcc and 6g shows that they take a microsecond value and treat that as a base for nanoseconds.
So, what then is the Go "time" library doing? Well, when we call "Now()" we get a "Time" object. This object is initialized by an architecture-specific routine which queries the system (using a system call) and sets up the "Time" object, including the nanosecond member, from the result of that system call. What the x86 64-bit Linux implementation seems to be doing for "Now()" is effectively querying "gettimeofday()" and then taking the microsecond value and multiplying it by 1000 to represent nanoseconds. So, for those using the nanosecond value of a "Time" object created by "Now()", the last three decimal digits will always be zero. The "Time" object can be further manipulated (e.g. using "time.Add()", "time.Date()", or "time.Unix()"), which might modify the nanosecond value and present a more precise count of nanoseconds; but the initial value from "Now()" is always missing its lower three digits. It turns out that 999999000, the maximum nanosecond value a "Time" object created by "Now()" can hold, is still 0.999999 of a second, so it is just not the high precision that might be desired. (And since 1000 is a multiple of 8, the lower three bits are zero as well; that maximum value converted to binary: 0b111011100110101100011000011000.)
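To be clear about what I mean, the runtime is doing something morally equivalent to the following C sketch. This is my paraphrase of the observed behavior, not the actual runtime code:

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);

    /* Scale microseconds up to "nanoseconds," as the Go runtime
       effectively does when populating the Time object.  */
    long nsec = tv.tv_usec * 1000L;

    printf("nanosecond member: %09ld\n", nsec);
    printf("low three digits:  %ld\n", nsec % 1000); /* always 0 */
    return 0;
}

The modulo in the last line is the whole point: the value is always a multiple of 1000.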
For most people this is not a big deal. But being aware is always a good thing. If you are reaching for "Now()" specifically for its nanoseconds, you are probably expecting a higher precision of time than it actually delivers.
TLDR; When time is created using the "Now()" library function, the Go runtime "time" package returns a scaled microsecond value to represent nanoseconds, not actual nanoseconds at high granularity. The lower three digits (1000 values) are always zero.
-Matt
Additional Resources:
GCC
Go (and Google Go 6g Compiler)
gettimeofday()
C Standards Website
Sunday, November 11, 2012
Brian Cox
-Matt
Friday, November 9, 2012
Devil Hands: The Wonders of a Loosely Typed Compiler
For years I have enjoyed being a software developer; however, I wanted to get my hands dirty at a lower level than just application development. For some reason, I wanted to work on the tools that others use to construct their tools (and applications). Enter the realm of compilers. Since gcc is open source, and many people and organizations rely on it, I felt I would sell my soul to it. After all, idle hands do the devil's work. So, I decided to put my devil hands to work.
Well, part of my wanting to get into compilers eventually led me towards a PhD... still working on it. Well, I should be working on it, but I am writing this post instead. Anyways, during a PhD meeting the other day, one of my advisers mentioned that gcc has something like over 100 different representations for a certain data structure used inside the compiler. This structure is used to represent various things about the program that gcc is compiling: things like variables, statements, declarations, etc. This adviser is right; in fact, he is never wrong. No, really, he knows all.
Well, I have been dipping my hands in the secret sauce that is gcc for a while now. In fact, one of the reasons I wanted to get a PhD was to hack on gcc; my compiler-driven, region-based memory management research uses GCC plugins to accomplish its goals. Anyways, I have mentioned to my advisers many times that, yes, in fact, the "tree" node data structure in gcc can be anything!
So what is the big deal if a data structure, used inside the compiler, can represent anything? Isn't that polymorphism at its finest? Well... no. It means that the internals of gcc are essentially untyped. Heck, a "tree" is defined as a union consisting of 41 structures. To top it off, there is another file (tree.def) that defines about 195 (in gcc 4.7.1) different "things" a tree can be! This is all fine and dandy, if you know what you are doing. But it means one must be a 31337 ninja when writing gcc code.
The danger comes in referencing data inside the tree node. If accessed improperly, the data will not have the correct meaning. While there are only a handful of data structures inside a "tree" node, there are over a hundred possibilities, each of which may or may not use certain fields. And if you do not ask, at runtime, what the type is, and perform your reading of the data properly, then *boom*: you get bad data. This is what I call a type bomb: the result of a lack of typing on a data structure. In other words, you have a grey box, and you basically have two things you can do before pulling data out of it. It has structure, but certain parts are not always populated. Either you can ask the box what is inside it, or you can assume what is inside it. A lack of type safety makes reading data dangerous, since it permits the possibility that you read the data incorrectly. But I like this; it makes playing with gcc kinda fun, in a masochistic way.
To reiterate, how you read data in a "tree" instance can vastly change. Suppose that you want to check that whatever the "tree" represents is a global (e.g. a global variable). Well, you can use the "is_global_var()" predicate. Just pass a tree instance to it, and it will return true or false. But one must be very cautious. See, "is_global_var()" only works on declaration nodes, and not the handful of other "things" a "tree" can represent. I can pass a variable declaration "tree" node and the predicate will produce a proper truth value. However, if I pass it a particular instance of a variable, such as one given by an SSA representation, then the result is not sound. Since the predicate will always return true or false, a value will be returned; it just might be wrong.
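If I were writing a plugin pass that might be handed either kind of node, I would guard the predicate myself. The following is just a sketch against the gcc 4.7-era internal API; "safe_is_global()" is my own made-up helper, and it assumes the usual plugin headers (gcc-plugin.h, tree.h) are already in scope:

static bool
safe_is_global (tree t)
{
  /* is_global_var() assumes it is handed a DECL, so check the
     node's code at runtime before trusting the predicate.  */
  if (t != NULL_TREE && TREE_CODE (t) == VAR_DECL)
    return is_global_var (t);

  /* An SSA_NAME is a versioned instance of a variable; unwrap it
     and test the underlying declaration instead.  */
  if (t != NULL_TREE && TREE_CODE (t) == SSA_NAME)
    {
      tree var = SSA_NAME_VAR (t);
      if (var != NULL_TREE && TREE_CODE (var) == VAR_DECL)
        return is_global_var (var);
    }

  return false;
}

The runtime TREE_CODE check is exactly the "ask the box what is inside it" option from above; the raw call to "is_global_var()" is the "assume" option.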
Consider the following block of C code:
void foo(void)
{
    int x;
    x = 1;
}
During a walk over the statements in this code (well, actually, gcc will be walking a more verbose representation, its intermediate GIMPLE representation), suppose your compiler pass wants to check if the 'x' variable in the declaration statement 'int x;' is global. "is_global_var()" will return a proper truth value. However, if you pass the 'x' in the 'x = 1;' assignment statement, the value returned by the predicate cannot be trusted. Since a tree node can kinda be thought of as an opaque type (like a "void *"), you can read a "variable" node as if it were a "declaration," and as far as the compiler is concerned there is nothing wrong with that: no warning.
The "is_global_var()" does not check first what type of "tree" node it was
passed, and will just report the result if the node were a declaration. This is
dangerous, and is what causes the lack of sanity. The type enforcement comes in
the comment above the "is_global_var()":
From gcc-4.7.1:
/* Return true if T (assumed to be a DECL) is a global variable.
   A variable is considered global if its storage is not automatic.  */
TLDR; Untyped data structures are convenient, but require that the programmer be an omniscient monkey-saurus and know what is up with the data they are flinging. Tread lightly, young warriors. Oh, and to get an idea of what "things" a tree can be, check here. From gcc 4.7.1.
Resources: gcc-4.7.1
-Matt
Sunday, November 4, 2012
Pickin' up the Trash
-Matt