Random Thoughts About Go

By Ragnar Lönn. On 2016-06-22

Load Impact is here to enhance your load testing and help you find performance bottlenecks in your applications. But we also like to give you helpful engineering tips across a broad spectrum of topics. Check out this post on Go — the programming language originally developed by Google engineers.

So, I’ve started learning a bit of Go, which seems to be a language in my taste (small, fast and not too high-level), but some things are a little tricky to wrap my head around. Like, me being an old C programmer, happily discovering pointers but then getting confused by not being able to do pointer arithmetics, by pointers getting dereferenced automatically, etc. Or some things around the package system that I haven’t quite grasped.

Anyway, most of these things are just me not being familiar with the way things work in Go. I am quickly getting into the language and suspect I won’t be very much confused for very much longer. Unless my blood sugar is low of course, in which case pretty much everything confuses me, but that’s not really related to Go.

There is one thing, however, that I ran into recently that confused the hell out of me at first, then I realized what the problem was, and now I’m mostly curious about why things are the way they are. Is it by design or by accident? I haven’t found any info on it anywhere, so I thought I’d write a post about it and see if someone who knows more than I do can enlighten me.

What I found, at first, was that the rand.IntX functions in the Go standard library behaved weirdly!

If you’re not familiar with the Go standard libraries, there are these functions to generate pseudo-random integer numbers, they are called things like rand.Int(), rand.Intn(), rand.Int32() etc. I did this:

val := rand.Intn(100)

...which means I want a random integer between 0 and 100 assigned to the “val” variable. When I compiled and ran the program, val was assigned the value 81. So far so good. Then I wanted the random number to come from a larger range, so I changed the line to:

val := rand.Intn(200)

Recompile, run, and the variable was still 81. What the hell? Did I forget to recompile? Was I running an old binary? Which working directory am I in anyway… I started investigating all sorts of usual suspects (including, of course, “when was my last meal?”) before I realized it was actually my updated code doing this.

Then I started reading and re-reading my source code to spot the stupid typo I had obviously made somewhere. Since it was just 3 lines or so of very simple code it got boring fast, so I quickly gave up trying to stare it down and instead decided to start looking elsewhere.

My assumption was that the rand.IntX functions were based on random values from their floating point cousins - e.g. rand.Float64() and then multiplied by either 100 or 200, but apparently not, as a positive random number N multiplied by 100 can’t really be equal to the same N multiplied by 200...

I started feeding Intn() with lots of different numbers, and realized that any input numbers that were related to each other would produce very related, sometimes identical, output. Here is a list:

rand.Intn(100) = 81
rand.Intn(200) = 81
rand.Intn(300) = 281
rand.Intn(400) = 81
rand.Intn(500) = 81
rand.Intn(600) = 281
rand.Intn(700) = 181
rand.Intn(800) = 481
rand.Intn(900) = 581
rand.Intn(1000) = 81
rand.Intn(1100) = 881
rand.Intn(1200) = 881
rand.Intn(1300) = 881
rand.Intn(1400) = 881
rand.Intn(1500) = 581
rand.Intn(1600) = 481
rand.Intn(1700) = 681
rand.Intn(1800) = 1481
rand.Intn(1900) = 81
rand.Intn(2000) = 81

That gave me the confidence to actually look at the source for rand.Intn(). That function uses rand.Int31n() to generate its PRN and here is the code for rand.Int31n():

// Int31n returns, as an int32, a non-negative pseudo-random number in [0,n).
// It panics if n <= 0.
func (r *Rand) Int31n(n int32) int32 {
    if n <= 0 {
        panic("invalid argument to Int31n")
    }
    if n&(n-1) == 0 { // n is power of two, can mask
        return r.Int31() & (n - 1)
    }
    max := int32((1 << 31) - 1 - (1<<31)%uint32(n))
    v := r.Int31()
    for v > max {
        v = r.Int31()
    }
    return v % n
}

So, mystery solved. It’s the modulo operation at the end that does it. I was expecting my randomly generated number to be e.g. 0.8134203984585769487 or something, behind the scenes, and that Intn(100) would then return 81 because it multiplied that value with 100, while Intn(200) would with the same logic return 163 (or actually, I guess it would be int(rand.Float64() * max + 0.5), so e.g. the float64 random number 0.80639487298792 would then have resulted in the output integer 81)

My first guess was that doing the modulo thing is faster, but a colleague who is better at maths and algorithms than I am told me that it is actually slower. I did a small, very un-scientific test, writing my own rand.Intn() function that looked like this:

func myIntn(n int) int {
    return int(rand.Float64() * float64(n) + 0.5)
}

Then I ran it 1 million times, which took 43ms on my laptop. Running the real rand.Intn() the same number of times, with the same input values (1-1000000), took 53ms. Pretty much the same speed (although perhaps my implementation could be optimized).

My colleague said that he thought the only reason for not using multiplication would be to maybe get as much randomness as possible, because you can lose some precision when doing multiplications. In my mind, however, doing it the Go way results in less usable random numbers, and to me that seems worse than losing a little precision.

Why would this way of producing random numbers be worse than using the rand.Float64() as a base then? Simply because the range/max values that rand.Intn() gets is likely not uniformly distributed/random, but probably quite related. Doing a modulo operation means you then get output that is very related also, which is likely not what people want when they call a random number generator function.

I suspect that just like in my application, many constants are powers of 10, or multiples of a power of 10, and a lot of dynamic data (numbers) input by application users is probably also powers of 10 or multiples thereof. This is because humans are hardwired from early on to think in base 10 and we like to use “round” numbers. Powers of 10 look good, they are easy to understand and to manipulate by our brains.

I think a lot of times, the input parameter to rand.Intn() is going to be a power of 10 or a multiple of a power of 10, and when input numbers are that related you get the kind of behavior that confused me here — rand.Intn(100) may give you the same result as rand.Intn(200) and any rand.Intn() called with a number that is a multiple of 100 will get very similar output.

This is not much of a problem if you seed the PRNG with unique values each time your application starts, which is common practice. But in my particular case, I wanted to seed the PRNG with the same seed each time to get a reproducible random number sequence (I was experimenting with procedural generation of a data set and wanted repeatability). Then it will cause trouble if I change input parameters (the “max” numbers in the rand.Intn(max) call) and the input parameters are similar to the old ones (e.g. 200 instead of 100).

Is this enough of an issue to consider changing the way rand.Intn() works, or is my use case a very special one that shouldn’t be given a lot of weight? Well, as my colleague also pointed out, unit testing is a pretty common use case where this kind of behavior might be bad. If you let rand.Intn(100) be the input to a unit test and then change it to rand.Intn(200) and expect your test to do something else, you might be disappointed — or at least, as in my case, confused.

As for me, when I know about the issue I can work around it (implementing my own rand.Intn(), most likely) but I am curious to know if there is any particular reason it was implemented this way in Go? It could have been:

func rand.Intn(max int) {
    return int((rand.Float64() * float64(max)) + 0.5)
}

...which in my mind is both simpler and produces a more useful random integer value. My colleague said Erlang does it this way, and told me the implementation there looks like this:

trunc(uniform() * N) + 1

(where trunc() returns a random float between 0 and 1)

I also checked the Python standard library, which seems to do it the same way.

rem = maxsize % n
limit = (maxsize - rem) / maxsize   # int(limit * maxsize) % n == 0
r = random()
while r >= limit:
    r = random()
return int(r*maxsize) % n

The above is code from the Python _randbelow() function, which is used by randrange() and randint() to generate ranged random integer numbers. The function uses output from random() as the random component used to calculate the random integer, and random() outputs a random float in the range 0..1.

It’s easy to see that the Python library uses multiplication just by observing the output from a simple program:

$ cat randtest.py
import random, sys
random.seed("hello")
print random.randint(1,int(sys.argv[1]))
$ python randtest.py 100
82
$ python randtest.py 200
164
$ python randtest.py 300
246

The multiplication is obvious, because I get output that is multiples of 82 when I feed the function multiples of 100.

Note: There is a call to random.seed() at the beginning. Turns out the Python random generator seeds itself unless you do it manually. This means two consecutive runs will produce different output unless you specifically tell the PRNG to begin in the same state every time.

So why doesn’t Go do it this way? Is it by accident or design?

Also, if there is a reason to do it this way, might it be a good idea to auto-seed the PRNG in Go, just like Python does?

If anyone wants to try this out at home you can clone this repo: https://github.com/ragnarlonn/randtest

It contains both the Python program and the Go program. You can of course just do:

go get github.com/ragnarlonn/randtest
Loading...