I was recently surprised to find one of my memory management test cases producing non-deterministic output, even though the program is trivial, relies on few external libraries, and has no obvious source of random or non-deterministic input. Actually, I discovered this a while back, but recently revisited it. In short, I was gathering data for my thesis work on region-based memory management (RBMM) and found that the garbage collector I was comparing my RBMM system against kept producing different collection counts (the number of times the collector executed during my test case's run). Both garbage collection and RBMM are forms of automatic memory management, and I wanted to see how well my RBMM system compares to the more common garbage collection. My research is based on implementing an RBMM system in Go and then comparing my work against the stable garbage collector that Go ships with.
Anyway, I was able to replicate the effect I had previously witnessed with a simpler test (below) using the Go runtime provided by gcc-4.8.1. Remember that Go is garbage collected; its collector was producing a different collection count (NumGC below) for each execution of the test.
package main

import "runtime"

func main() {
    for i := 0; i < 10000000; i++ {
        f := new(float64)
        *f++
    }

    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    println("NumGC: ", m.NumGC)
}
Why would Go's collector be producing non-deterministic collection counts? This program obviously does the same thing on every execution and generates the same amount of garbage each time. Well, a couple of things could be happening. Perhaps Go's collector was looking at overall system resources, not just those of the test case, when deciding when to collect. Or perhaps it uses a random value that affects collection. As it turns out, the latter was the case: Go, by default, enables some statistics/profiling collection at runtime. What tipped me off was a grep for "rand" within the runtime source, which turned up the following in the memory allocation function "runtime_mallocgc", located in gcc-4.8.1/libgo/runtime/malloc.goc. Note that I modified the whitespace some:
if(!(flag & FlagNoProfiling) && (rate = runtime_MemProfileRate) > 0) {
    if(size >= (uint32) rate)
        goto profile;
    if((uint32) m->mcache->next_sample > size)
        m->mcache->next_sample -= size;
    else {
        // pick next profile time
        // If you change this, also change allocmcache.
        if(rate > 0x3fffffff)  // make 2*rate not overflow
            rate = 0x3fffffff;
        m->mcache->next_sample = runtime_fastrand1() % (2*rate);
    profile:
        runtime_setblockspecial(v, true);
        runtime_MProf_Malloc(v, size);
    }
}
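To make the sampling logic above a bit more concrete, here is a rough, self-contained Go sketch of the same idea (profileSampler, its fields, and the use of math/rand are illustrative stand-ins, not the runtime's actual types or its fastrand): each allocation spends down a byte budget, and once the budget runs out the allocation is recorded and a fresh budget is drawn uniformly from [0, 2*rate). Because each budget is random, the set of allocations that gets profiled, and hence the profiler's own bookkeeping, differs from run to run.

package main

import (
    "fmt"
    "math/rand"
    "time"
)

// profileSampler is an illustrative stand-in for the runtime's per-cache
// sampling state (rate plays the role of runtime.MemProfileRate, nextSample
// the byte budget until the next profiled allocation).
type profileSampler struct {
    rate       int
    nextSample int
    rng        *rand.Rand
}

// malloc reports whether an allocation of the given size would be profiled,
// mirroring the branch structure of the C snippet above.
func (s *profileSampler) malloc(size int) bool {
    if s.rate <= 0 {
        return false // profiling disabled (MemProfileRate = 0)
    }
    if size >= s.rate {
        return true // large allocations are always profiled
    }
    if s.nextSample > size {
        s.nextSample -= size // still within the current byte budget
        return false
    }
    // Budget exhausted: profile this allocation and pick the next budget
    // uniformly from [0, 2*rate), like runtime_fastrand1() % (2*rate).
    s.nextSample = s.rng.Intn(2 * s.rate)
    return true
}

func main() {
    rng := rand.New(rand.NewSource(time.Now().UnixNano()))
    // 512*1024 is the documented default MemProfileRate in the standard runtime.
    s := &profileSampler{rate: 512 * 1024, nextSample: rng.Intn(2 * 512 * 1024), rng: rng}

    profiled := 0
    for i := 0; i < 10000000; i++ {
        if s.malloc(8) { // roughly the size of a new(float64)
            profiled++
        }
    }
    // The count drifts between runs because each budget is drawn at random.
    fmt.Println("profiled allocations:", profiled)
}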
So their profiler is having an effect on the amount of memory pressure seen by the garbage collector, causing the collector to kick in more often than it would in a non-profiled runtime. What does this mean? Well, if you do not need any profiling support (and you probably do not in production), then disable the profiler! I did just that and then executed my simple test multiple times. Bingo! With the profiler disabled, I was able to achieve consistent, predictable garbage collection counts from Go's collector. To disable the profiler, simply add this line at the top of your main():
runtime.MemProfileRate = 0

Setting this variable to zero disables allocation profiling, so the runtime will not collect allocation metadata. Otherwise, that metadata collection will influence the garbage collector, which is probably not what you want when running a production executable (or when performing verification/testing).
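Putting it together, here is the complete test from above with the profiler disabled; it is just the earlier program with the one extra line at the top of main():

package main

import "runtime"

func main() {
    // Disable allocation profiling before any allocations occur, so the
    // profiler's sampled metadata never adds memory pressure.
    runtime.MemProfileRate = 0

    for i := 0; i < 10000000; i++ {
        f := new(float64)
        *f++
    }

    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    println("NumGC: ", m.NumGC)
}

Run a few times, this should now report a consistent NumGC, matching the behavior described above.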
-Matt
I wonder if that's something you could use for a performance tuning tip for the JVM or LLVM based stuff as well?
That really depends on whether an LLVM-compiled language's runtime or the JVM uses profiling/debugging feedback structures that have to be allocated from the process's memory space, and whether the collector(s) in those cases are aware of these allocations and how they manage that internal data.
I much prefer deterministic garbage collection. That way I don't have to schlep a full bin back from the curb. :)
Happy hunting for the phenomena that are fuzzing your results. I assume the text segment size is quantized. Could the OS be giving a bit more or less? Does the runtime library ask for more as the code runs? Could other activity be changing the timing, and hence the point the code has reached when the collector is awakened?
The non-determinism was coming from the profiler in Go's runtime, whose next sampling point is chosen at random. The profiler collects samples and must store them somewhere, and the allocation of that sample data influences the garbage collection.