Monday, January 5, 2009

What mathematicians do

Imagining that mathematicians spend all their times doing calculation is very like imagining that writers spend all their time spelling words.

Some minor computer geekery

Today I managed to get R running under the PortableApps menu, with only a little fiddling. Which if you're a big computer nerd is really not much of an achievement at all, but for an old dinosaur like me, it's pretty good.

Adding R makes that particular USB stick a lot nearer to having "all the software you need on a USB" if you're a statistician.

I'm quite impressed with some of the portable applications, too.

Getting R onto the USB is trivial - you can just copy an existing R (including the subdirectories) from a hard drive, and you have all your installed packages and your saved variables and everything. But making it appear in the menu was kind of nice.

Just how small is the smallest illegal number?

I've seen a number of posts recently relating to "illegal numbers", such as these ones:

Copyrighting a number, and
Converting pi to binary, which points to
Converting Pi to binary: Don't do it!

The first of these relates to the fact that files are just essentially numbers, and replicating certain numbers can be illegal (even to DMCA issues). The third (and the second which points to it) is essentially remarking on the fact that π is thought to be a normal number, so it is believed that all possible finite digit sequences occur in its expansion (and so contains all such illegal numbers).

Let's consider the first issue: some integers are definitely illegal - it appears quite possible to get yourself into trouble posting particular integers. The post at Copyrighting a number suggests that the smallest such number is incomrehensibly large.

But I suspect that this isn't necessarily the case.

Certainly it's true that the smallest integer representing a song would be huge. The smallest integer representing a book would be far, far larger very large**. But it's not only entire songs or books that are subject to copyright. Indeed, it only takes a few bars before you get into copyright trouble. And it doesn't need to be a few bars of a perfomance. In fact, I could represent in a compact ASCII form just enough of a simple tune to get myself into trouble in only a small number of characters (I think with careful choice of tune, the extracted bars and the representation, I could do it in less than one line of text here). I could represent it in some compressed form in much less space (posting a zip file of a copyright work is just as illegal as posting an uncompressed version, and so it would be with any other compression scheme). Maybe I could get into trouble posting slightly less of a book - I can quote a tiny bit of a book in certain circumstances, but if those circumstances don't hold, how little of a book can I quote before I am in copyright violation?

** [That "far, far larger" (now elided) looks like a dumb error if you compare the size of a high fidelity musical recording with a book recorded as, say an ascii file. But that's not comparing like with like. A book is like a score (and lyrics), not like the performance of it. If we compare a performance of a book with a performance of a song, then again, the book is larger (perfomed books typically take several CDs). But arguing about this is a distraction from the point here, so I have changed the text above to avoid the issue.]

So, then, we are left to wonder, what is the smallest possible illegal digit sequence?

What it the smallest possible representation of it?

(Might the πth root of 237 have that digit sequence somewhere in its first ten thousand digits, thus giving me an incredibly compact way of specifying an illegal digit sequence? I doubt it.)

Anyway, I'm inclined to think that it might be possible to get yourself into trouble with say the DMCA with some surprisingly small digit sequences.

And wait a minute - does this post constitute an encouragement to breaking the law?
I've asked for people to tell me the smallest illegal number. Certainly, if someone does post such a number in the comments, it would be easy to argue that I incited it. But actually, it seems the law on that doesn't actually require the other person to break the law. And even if it's not enough for me just to ask what the number is, I don't want any of my readers going to jail if they do figure out the number.

One less

Perhaps it's better to ask what number is one less than the smallest illegal number. Maybe that's safer.

But consider the RIAA - I don't expect that the RIAA would have the slightest hesitation in trying to smack me down for publishing one less than the smallest illegal number because that description of it implies an algorithm for producing the smallest illegal number from it... and so this very paragraph contains an algorithm for producing an illegal number, were I to later post one less than such a number. (In fact, is it illegal to describe such an algorithm already, without having a number to apply it to? The DMCA seems to be pretty broad. Have I in fact already broken the law merely by describing the algorithm - that "the way to produce the smallest illegal integer from the next smallest number is simply to add 1"?)

So asking for one less than an illegal number is no good. Certainly publishing that number with the description ("just add one") is illegal. But if I publish a number and an algorithm for producing an illegal number from it in two different places, I believe I have still broken the law (they don't have to be in the same location, surely, or the law would be trivially circumvented). Given I've already posted an algorithm for producing an illegal number from one less than it, it seems that posting one less than the smallest illegal number anywhere must in turn be illegal.

No doubt you've noticed the problem already.

By that reasoning, one less than that number must also be illegal.

Assuming we restrict our search to the non-negative integers, I believe I can therefore assert that the smallest illegal number I can publish to my blog is zero.

Oops - and just I made the mistake of publishing it to my blog.

Numbers. Every digit of every number. They're all illegal.

If this blog disappears suddenly, I've probably been got by the goons at the RIAA.

Friday, January 2, 2009

"Hit the target space" probability

Back to the problem of finding the probability of hitting a desirable space when rolling a six sided die (d6). Let's look at the easy way to compute the probabilities, then we can talk about them.

First, let's compute a couple of easy probabilities - we'll use them to check our work in a moment. If I was one space from the target, I only hit the target on a roll of 1. Any other roll takes me past it and I miss it.

So the probability I hit the target from one space away is 1/6.

If I was two spaces away, I hit the target with a roll of 2 (probability 1/6); but I can also hit the target by moving one space (1/6) and then hitting the target from there (times the 1/6 we already worked out a second ago). Total probability = 7/36.

Let's say P(i) is the probability of hitting the target space, T, from space i.

From the current space, i, the probability you hit the target is the probability you roll 1 times the probability you hit the target from one space further ahead, plus the probability you roll 2 and hit the target from 2 spaces further ahead, and so on up to 6 spaces.

That is, we get the recursive formula
P(i) = 1/6 [P(i+1)+P(i+2)+...+P(i+6)].

Let's look at that for a second. If I know the six probabilities immediately ahead of the current space, I can work out the probability from this space.

Now, let's look at the target space, T. Obviously, we're there already, so P(T)=1.

How would I work out the probability for the previous space?
P(T-1) = 1/6 [P(T) + P(T+1) + ... + P(T+5)].

Wait. What's P(T+1)? Well, from past the space I have no chance. It's 0. Same with P(T+2), etc.

P(T-1) = 1/6 [ 1 + 0 + 0 + 0 + 0 + 0 ] = 1/6 (checks out)
P(T-2) = 1/6 [1/6+ 1 + 0 + 0 + 0 + 0 ] = 7/36 (checks out)

[It's not hard from this to show that, for the 5 spaces immediately before the target, the hit-the-target probability for the NEXT space further away is 7/6 of the probability for the current space. Consequently, for those spaces, P(T-i) = 7i-1/6i. So from 6 spaces away, the probability is 75/66 = 16,807/46,656. However, I'm going to take a lazier-but-more general approach for computing probabilities.]

We can compute all the probabilities numerically, working one by one back from the target. Indeed, with a neat recursive formula like that one above, a spreadhseet is a handy tool (I happened to use Excel). Write the probabilities for P(T) to P(T+5) in a column (1 and 5 0's), and write the formula for P(T-1) in terms of those values:


4 T-() prob
5 -5 0
6 -4 0
7 -3 0
8 -2 0
9 -1 0
10 0 1
11 1 =AVERAGE(B5:B10)
12 2


Okay, instead of 1/6*sum(B5:B10) I lazily used the average function, but you see the idea. Copy that formula and paste it down as many rows as you like.


4 T-() prob
5 -5 0
6 -4 0
7 -3 0
8 -2 0
9 -1 0
10 0 1
11 1 0.16667
12 2 0.19444
13 3 0.22685
14 4 0.26466
15 5 0.30877
16 6 0.36023
17 7 0.25360
18 8 0.26809
19 9 0.28037
20 10 0.28929
21 11 0.29339
While the probability at 6 spaces away is 36%, the probability at 7 spaces away is 25%!

At large distances, the number converges to 2/7 ~= 28.57% (if the average roll is 3.5 spaces, it's perhaps not so surprising that the average probability you hit a space is 1/3.5).

In Excel, you can do neat plots to see what's going on:

In short, if you have some influence on where you try to land on the target from (such as the ability to change your roll by one, for example), the best spots (in order) are 6 spaces away, 5 spaces away and 11 spaces away, and of those, 6 is by far the best. Try to avoid being fewer than 4 spaces away, and 7 isn't much good either.

I used the approach outlined here to look at rolling two and three dice (2d6 and 3d6) as well; the direct calculation approach here adapts quite easily to these situations, yielding probabilities with a minute or so of effort.

Thursday, January 1, 2009

Let's start with a little probability intuition

Imagine you're playing a board game where your pieces move according to a die roll. Unless it's a very dull game, there will probably be other things that can influence your position as well, but we're ignoring them right now.

Let's imagine that normally your move is given by a single die roll (six sided die, d6), and you pass along the spaces only once - it's a straight "race", not a circuit.

No doubt you've seen many games where you move along spaces to some end, and a few spaces have special characteristics (such as "miss a turn" or "move ahead three spaces"). BoardGameGeek calls these "roll and move" games, and appears to list more than six thousand of them. If that doesn't ring a bell, think of something a bit like Snakes and Ladders.
(However, you can apply similar ideas to a large variety of games where you accumulate some resource with varying probabilities. )

Further, let's say there's a particularly desirable space to land on (which I'll call the target). You're currently 6 spaces away (that is, if you roll a 6 you will land on the space). You really would like to land on that space, either this turn, or next turn, or the turn after that, ... &c.

Quick - without working it out exactly, and assuming for now that no strategy is available to change the probability, what would you guess would roughly be your chance of hitting the the target at all?

Then, if you're so inclined, try to work out the probability. (If you're not inclined to attempt it, never fear - I will post the correct answer later. There's at least one fairly easy way to do it - and a lot of somewhat less easy ways to do it.)

This was one case where my intuition wasn't as directly useful as I might hope. Intuition served me just fine for figuring out what the probability would be from far away, but from closer up, not so much - at least not at first.