# A frequentist approach to probability

One thing that always confused me in my intro stats classes was the concept of a random variable. A random variable is not a variable like I’m used to thinking about, like a thing that has one value at a time. A random variable is instead an object that you can sample values from, and the values you get will be distributed according to some underlying probability distribution.

In that way it sort of acts like a container, where the only operation is to sample a value from the container. In Scala it might look something like:

The idea is that `get`

returns a different value (of type `A`

) from the distribution every time you call it.

I’m going to add a `sample`

method that lets me draw a sample of any size I want from the distribution.

Now to create a simple distribution. Here’s one whose samples are uniformly distributed between 0 and 1.

And sampling it gives

```
scala> uniform.sample(10).foreach(println)
0.15738645964157327
0.7827120503875181
0.8787176537434814
0.38506604599728245
0.9469681837641953
0.20822217752687067
0.8229649049912187
0.7767540566158817
0.4133782959276152
0.8152378840945975
```

### Transforming distributions

Every good container should have a `map`

method. `map`

will transform values produced by the distribution
according to some function you pass it.

(Quick technical note: I added a self-type annotation that makes `self`

an alias for `this`

so that it’s easier to refer to in anonymous inner classes.)

Now I can map `* 2`

over the uniform distribution, giving a uniform distribution between 0 and 2:

```
scala> uniform.map(_ * 2).sample(10).foreach(println)
1.608298200368093
0.14423181179528677
0.31844160650777886
1.6299535560273648
1.0188592816936894
1.9150473071752487
0.9324757358322544
0.5287503566916676
1.35497977515358
0.5874386820078819
```

`map`

also lets you create distributions of different types:

```
scala> val tf = uniform.map(_ < 0.5)
tf: Distribution[Boolean] = <distribution>
scala> tf.sample(10)
res2: List[Boolean] = List(true, true, true, true, false, false, false, false, true, false)
```

`tf`

is a `Distribution[Boolean]`

that should give `true`

and `false`

with equal probability.
Actually, it would be a bit more useful to be able to create distributions giving `true`

and `false`

with arbitrary
probabilities.

Trying it out:

```
scala> tf(0.8).sample(10)
res0: List[Boolean] = List(true, false, true, true, true, true, true, true, true, true)
```

A very closely related distribution is the Bernoulli distribution, which gives `0`

or `1`

with some
probability instead of `true`

and `false`

. This can be achieved with a simple `map`

:

Cool. Now I want to measure the probability that a random variable will take on certain values. This is easy to do empirically by pulling 10,000 sample values and counting how many of the values satisfy the given predicate.

```
scala> uniform.pr(_ < 0.4)
res2: Double = 0.4015
```

It works! It’s not exact, but it’s close enough.

Now I need two ways to transform a distribution.

`given`

creates a new distribution by sampling from the original distribution and discarding values
that don’t match the given predicate. `repeat`

creates a `Distribution[List[A]]`

from a `Distribution[A]`

by producing samples that are lists of samples from the original distributions.

OK, now one more distribution:

Let’s see how all this works.

```
scala> val die = discreteUniform(1 to 6)
die: Distribution[Int] = <distribution>
scala> die.sample(10)
res0: List[Int] = List(1, 5, 6, 5, 4, 3, 5, 4, 1, 1)
scala> die.pr(_ == 4)
res1: Double = 0.1668
scala> die.given(_ % 2 == 0).pr(_ == 4)
res2: Double = 0.3398
scala> val dice = die.repeat(2).map(_.sum)
dice: Distribution[Int] = <distribution>
scala> dice.pr(_ == 7)
res3: Double = 0.1653
scala> dice.pr(_ == 11)
res4: Double = 0.0542
scala> dice.pr(_ < 4)
res5: Double = 0.0811
```

Neat! This is getting useful.

OK I’m tired of looking at individual probabilities. What I really want is a way to visualize the entire distribution.

```
scala> dice.hist
2 2.67% ##
3 5.21% #####
4 8.48% ########
5 11.52% ###########
6 13.78% #############
7 16.61% ################
8 13.47% #############
9 11.17% ###########
10 8.66% ########
11 5.64% #####
12 2.79% ##
```

That’s better. `hist`

pulls 10,000 samples from the distribution, buckets them, counts the size of the buckets,
and finds a good way to display it. (The code is tedious so I’m not going to reproduce it here.)

### Don’t tell anyone it’s a monad

Another way to represent two die rolls is to sample from `die`

twice and add the samples.

```
scala> val dice = die.map(d1 => die.map(d2 => d1 + d2))
dice: Distribution[Distribution[Int]] = <distribution>
```

But wait, that gives me a `Distribution[Distribution[Int]]`

, which is nonsense. Fortunately there’s an easy fix.

Let’s try it now.

```
scala> val dice = die.flatMap(d1 => die.map(d2 => d1 + d2))
dice: Distribution[Int] = <distribution>
scala> dice.hist
2 2.71% ##
3 5.17% #####
4 8.23% ########
5 11.54% ###########
6 14.04% ##############
7 16.67% ################
8 13.53% #############
9 10.97% ##########
10 8.81% ########
11 5.62% #####
12 2.71% ##
```

It worked!

The definition of `dice`

can be re-written using Scala’s `for`

-comprehension syntax:

This is really nice. The `<-`

notation can be read as sampling a value from a distribution.
`d1`

and `d2`

are samples from `die`

and both have type `Int`

.
`d1 + d2`

is a sample from `dice`

, the distribution I’m creating.

In other words, I’m creating a new distribution by writing code that constructs a single sample of the distribution from individual samples of other distributions. This is pretty handy! Lots of common distributions can be constructed this way. (More on that soon!)

### Monty Hall

I think it would be fun to model the Monty Hall problem.

This code constructs a distribution of pairs representing the door the prize is behind and the door you switched to. Let’s see how often those are the same door:

```
scala> montyHall.pr{ case (prize, switch) => prize == switch }
res0: Double = 0.6671
```

Just as expected. Lots of people have a hard time believing the explanation behind why this is correct, but there’s no arguing with just trying it 10,000 times!

### HTH vs HTT

Another fun problem: if you flip a coin repeatedly, which pattern do you expect to see earlier, heads-tails-heads or heads-tails-tails?

First I need the following method:

`until`

samples from the distribution, adding the samples to the *front* of the list until the list satisfies some predicate.
A single sample from the resulting distribution is a list that satisfies the predicate.

Now I can do:

Looking at the distributions:

```
scala> hth.hist
3 11.63% ###########
4 12.43% ############
5 9.50% #########
6 7.82% #######
7 7.31% #######
8 6.51% ######
9 5.41% #####
10 4.57% ####
11 4.56% ####
12 3.78% ###
13 3.44% ###
14 3.04% ###
15 2.52% ##
16 2.08% ##
17 1.76% #
18 1.70% #
19 1.34% #
20 1.34% #
scala> htt.hist
3 12.94% ############
4 12.18% ############
5 12.48% ############
6 11.29% ###########
7 9.88% #########
8 7.67% #######
9 6.07% ######
10 5.32% #####
11 4.18% ####
12 3.51% ###
13 2.78% ##
14 2.23% ##
15 1.75% #
16 1.40% #
17 1.21% #
18 0.92%
19 0.78%
20 0.60%
```

Eyeballing it, it appears that HTT is likely to occur earlier than HTH. (How can this be? Excercise for the reader!) But I’d like to get a more concrete answer than that. What I want to know is how many flips you expect to see before seeing either pattern. So let me add a method to compute the expected value of a distribution:

Hm, that `.sum`

is not going to work for all `A`

s.
I mean, `A`

could certainly be `Boolean`

, as in the case of the `tf`

distribution (what is the expected value of a coin flip?).
So I need to constrain `A`

to `Double`

for the purposes of this method.

```
scala> hth.ev
<console>:15: error: Cannot prove that Int <:< Double.
hth.ev
```

Perfect. You know, it really bothered me when I first learned that the expected value of a die roll is 3.5.
Requiring an explicit conversion to `Double`

before computing the expected value of any distribution
makes that fact a lot more palatable.

```
scala> hth.map(_.toDouble).ev
res0: Double = 9.9204
scala> htt.map(_.toDouble).ev
res1: Double = 7.9854
```

There we go, empirical confirmation that HTT is expected to appear after 8 flips and HTH after 10 flips.

I’m curious. Suppose you and I played a game where we each flipped a coin until I got HTH and you got HTT. Then whoever took more flips pays the other person the difference. What is the expected value of this game? Is it 2? It doesn’t have to be 2, does it? Maybe the distributions are funky in some way that makes the difference in expected value 2 but the expected difference something else.

Well, easy enough to try it.

```
scala> diff.map(_.toDouble).ev
res3: Double = 1.9976
```

Actually, it does have to be 2. Expectation is linear!

### Unbiased rounding

At Foursquare we have some code that computes how much our customers owe us, and charges them for it. Our payments provider, Stripe, only allows us to charge in whole cents, but for complicated business reasons sometimes a customer owes us fractional cents. (No, this is not an Office Space or Superman III reference.) So we just round to the nearest whole cent (actually we use unbiased rounding, or banker’s rounding, which rounds 0.5 cents up half the time and down half the time).

Because we’re paranoid and also curious, we want to know how much money we are losing or gaining due to rounding. Let’s say that during some period of time we saw that we rounded 125 times, and the sum of all the roundings totaled +8.5 cents. That kinda seems like a lot, but it could happen by chance. If fractional cents are uniformly distributed, what is the probability that you would see a difference that big after 125 roundings?

Let’s find out.

```
scala> val d = uniform.map(x => if (x < 0.5) -x else 1.0-x).repeat(125).map(_.sum)
d: Distribution[Double] = <distribution>
scala> d.hist
-10.0 0.02%
-9.0 0.20%
-8.0 0.57%
-7.0 1.32% #
-6.0 2.15% ##
-5.0 3.75% ###
-4.0 5.12% #####
-3.0 7.83% #######
-2.0 10.58% ##########
-1.0 11.44% ###########
0.0 12.98% ############
1.0 11.57% ###########
2.0 10.68% ##########
3.0 7.73% #######
4.0 5.70% #####
5.0 3.88% ###
6.0 2.32% ##
7.0 1.21% #
8.0 0.65%
9.0 0.25%
10.0 0.06%
```

There’s the distribution. Each instance is either a loss of `x`

if `x < 0.5`

or a gain of `1.0-x`

.
Repeat 125 times and sum it all up to get the total gain or loss from rounding.

Now what’s the probability that we’d see a total greater than 8.5 cents? (Or less than -8.5 cents — a loss of 8.5 cents would be equally surprising.)

```
scala> d.pr(x => math.abs(x) > 8.5)
res0: Double = 0.0098
```

Pretty unlikely, about 1%! So the distribution of fractional cents is probably not uniform. We should maybe look into that.

### The normal distribution

One last example. It turns out the normal distribution can be approximated pretty well by summing 12 uniformly distributed random variables and subtracting 6. In code:

Here’s what it looks like:

```
scala> normal.hist
-3.50 0.04%
-3.00 0.18%
-2.50 0.80%
-2.00 2.54% ##
-1.50 6.62% ######
-1.00 12.09% ############
-0.50 17.02% #################
0.00 20.12% ####################
0.50 17.47% #################
1.00 12.63% ############
1.50 6.85% ######
2.00 2.61% ##
2.50 0.82%
3.00 0.29%
3.50 0.01%
scala> normal.pr(x => math.abs(x) < 1)
res0: Double = 0.6745
scala> normal.pr(x => math.abs(x) < 2)
res1: Double = 0.9566
scala> normal.pr(x => math.abs(x) < 3)
res2: Double = 0.9972
```

I believe it! One more check though.

The variance of a random variable with mean is , and the standard deviation is just the square root of the variance.

And now:

```
scala> normal.stdev
res0: Double = 0.9990012220368588
```

Perfect.

This is a great approximation and all, but `java.util.Random`

actually provides a `nextGaussian`

method,
so for the sake of performance I’m just going to use that.

### Conclusion

The frequentist approach lines up really well with my intuitions about probability. And Scala’s `for`

-comprehensions
provide a suggestive syntax for constructing new random variables from existing ones. So I’m going to continue
to explore various concepts in probability and statistics using these tools.

In later posts I’ll try to model Bayesian inference, Markov chains, the Central Limit Theorem, probablistic graphical models, and a bunch of related distributions.

All of the code for this is on github.

blog comments powered by Disqus