Wednesday, September 30, 2009

User-contributed R packages on CRAN passes 2000 mark

The number of user-contributed packages at for the free (in both senses) statistical package R has passed 2000. See http://cran.r-project.org/web/packages/.

Quite a milestone. Back in May I predicted to someone that it would hit 2000 around the start of October. Well, not too far out, but it is still September.

If you don't have R but you do a lot of manipulation of numbers, consider getting R. It repays the investment in time very well indeed.

(I have made no posts for a good while due to major illness. Mostly okay now.)

Tuesday, June 23, 2009

The product of two sums of squares

I just happened upon this extra nifty little fact - the set of sums of two squares is closed under multiplication.

That is, facts like (102 + 92) × (12 + 12) = 192 + 12 occur for every
(a2 + b2) × (c2 + d2).

Take a look - simple result, I am surprised I hadn't seen it before...

See the Brahmagupta-Fibonacci identity at Wikipedia

Sunday, June 21, 2009

Trig identites and multiplication

We're used to the idea of using logarithms to convert multiplication problems into addition problems, because of the relation:

log(xy) = log(x) + log(y)

So if you have a table of logarithms, to multiply x and y you can look up the logarithms of each, add them, then find the number in your tables whose logarithm is that sum. Three table-lookups and an addition. [Well, in practice you do the last lookup in a table of antilogarithms, but that's only a minor benefit, it's not actually necessary to the calculation.]

However, there are other ways to make multiplication "simpler" than trying to directly multiply, such as this identity:

cos x cos y = [cos (x + y) + cos (x - y)]/2

This sort of relationship was actually used to do multiplication; it's a bit more complicated than logarithms, but it works well enough if you haven't invented logs yet, but do have cosine tables. It works like this.

You want to multiply two numbers, X and Y (and let's assume that they have already been scaled to lie in a range where this will work).

You find the numbers that they are the cosines of in your tables, X = cos x, Y = cos y. Add and difference those two numbers, giving (x+y) and (x-y). Look those numbers up in your cosine tables. Average them. That's the product. Four table lookups, an addition and a halving (and some rescaling back to the original problem, presumably, which probably just involves moving a decimal point), all much easier than general multiplication.

So far so good.

But here's what I am wondering. Humans came up with a nifty device to automate that multiplication via addition of logs, the slide rule. Is it possible to build a device that does mutliplication using a trig identity like the one above, perhaps some sort of "swivel rod"?

If we do it in two parts, I think maybe we can do it with a slide rule; one side does the additions and subtraction (and halving; halving is inverse doubling, which is just addition, so it can also be done there). The other does the conversions to and from normal scale to cosine scale - but you'd be forever transferring numbers across the different scales.

So I wonder if it can all be combined in a single, simpler system, whether it's a kind of slide rule or something more complex involving actual rotations to do the cosines (accuracy may be an issue though).

I don't think a device was ever made that did this, but I think that it might be possible.

If it were, I'd love to have one.

Monday, June 15, 2009

Annoying ways to ask a question...

I recently answered some mathematics questions on Yahoo Answers.

This reminded me of some of my pet peeves when people ask for mathematical help -
many of which are apparently attempts to minimize the scale of your contribution

* "Can I ask a question? It'll only take a moment."
-- yes, questions rarely take long to ask - but the answer won't be nearly as fast.

* "This is a simple problem"
-- it often isn't. If was so darn simple, you'd answer it yourself.

* "Why are you making this so complicated? It was a simple question!"
-- see question 1. Just because the question was brief doesn't mean it has a simple solution. Fermat's last theorem could be explained to a child and stated in a few lines!


Further annoyances more particular to YA include people just posting their entire homework with not the slightest indication that they've even attempted any of it, sneering responses when you don't just give them all the answers with fully worked solutions, and snarky reactions when you point out that merely typing their question as is into google would have got them the answer instantly.

Friday, May 22, 2009

Total incomprehension?

Sometimes, adding apples and oranges is not fruitful.

Monday, May 18, 2009

Number of uniforms to sum>1

[Sorry - been a long time without posting. Illness has made it hard to find the impetus.]


Anyway, a probability problem I found interesting:

What's the distribution of the number of random uniform[0,1) random variables that you need to generate so that the sum is greater than 1?

There's a nifty little trick to it.




Update:

Well, the trick is to compute the survivor function, 1-F(n).

Let N be the number of uniform RVs needed to exceed 1.

P(N>n) = P(U(1) + U(2) + ... + U(n) <= 1)

That RHS probability is relatively easy to get in a variety of ways to be 1/n!

From there, the probability function for N is simple...

Wednesday, February 25, 2009

Calculation Information

Brendan O'Connor at, AI and Social Science has looked at a number of packages with which people might do statistical and related manipulations and computations, and compares them. Having used five of them (STATA and the items in the Python group being the exceptions, though I have had "try these out" on my agenda for some time), I agree broadly with what was said about the packages I have experience with - which suggests to me I can probably take seriously the assessment of the remainder.

A very handy and compact comparison. If you use statistics, take a look.

Wednesday, February 18, 2009

Incredibly simple ways to get rational approximations to square roots

Over at The Universe of Discourse, Mark Dominus talks about Archimedes and the square root of three pointing out that Archimedes needn't have had some mystical way of finding rational approximations to surds in order to figure that 265/153 is very close to the square root of three.

He points out that there's a simple enough pattern that Archimedes could have spotted and soon figured it out. That is true enough, but Archimedes could have got by, while being even less clever than that.

Let's consider a few simple facts:
(i) if a and b are close to the ratio √3:1 then 3b and a are also close to that ratio, with the error in the opposite direction and of similar magnitude
(ii) Hence (a+3b):(b+a) will be substantially closer* to √3:1 than a:b is

*(in fact it improves the error in the approximation of 3 by a2/b2 by a factor that rapidly goes to 2+√3, but Archimedes needn't have figured that fact out - he only needs to have noticed that 3b:a is also an approximation of √3:1, and (a+3b):(b+a) always seems to be a better approximation than he started with)

If we proceed in this fashion from the very ordinary starting point of 2:1 (and eliminate common factors as they pop up), we rapidly hit on Archimedes' ratio.


2 1 (a:b)
3 2 (3b:a)

5 3 (a+3b:b+a), which becomes our new a:b
9 5

7 4 (dividing out a common factor of 2 here)
12 7

19 11
33 19

26 15 (taking out another factor of 2)
45 26

71 41
123 71

97 56 (taking out a third factor of 2)
168 97

265 153


And you're there after maybe a couple of minutes of very simple computation, even if you're a much worse computer (in the original sense) than Archimedes doubtless was.

Not in the least mysterious, and a bonehead like me figured it out almost immediately. I'll wager a master like Archimedes realized something like this in even less time than it took me. It's so simple, he probably wouldn't have even bothered to write it down.

__________

(Later edit:)
What I have above appears to be basically the Bablyonian Method, on which Brent at The Math Less Traveled writes here.

I must have seen mention of this before (at least the phrase "Babylonian Method" sounds somewhat familiar). I have no doubt Archimedes was familiar with some version of it.

Tuesday, February 17, 2009

Nifty log-concave function post

John D Cook has a nifty post up about log-concave functions (in the latest Carnival of Mathematics). These include the log-concave densities, which have been popular objects of research in statistics in the last decade or so - they're a wide class of densities that have (as you might guess from John's article) some nice properties.

I'd like to write a longer, basic post about them sometime. The Wikipedia page doesn't really do them justice.

Monday, January 5, 2009

What mathematicians do

Imagining that mathematicians spend all their times doing calculation is very like imagining that writers spend all their time spelling words.

Some minor computer geekery

Today I managed to get R running under the PortableApps menu, with only a little fiddling. Which if you're a big computer nerd is really not much of an achievement at all, but for an old dinosaur like me, it's pretty good.

Adding R makes that particular USB stick a lot nearer to having "all the software you need on a USB" if you're a statistician.

I'm quite impressed with some of the portable applications, too.

Getting R onto the USB is trivial - you can just copy an existing R (including the subdirectories) from a hard drive, and you have all your installed packages and your saved variables and everything. But making it appear in the menu was kind of nice.

Just how small is the smallest illegal number?

I've seen a number of posts recently relating to "illegal numbers", such as these ones:

Copyrighting a number, and
Converting pi to binary, which points to
Converting Pi to binary: Don't do it!

The first of these relates to the fact that files are just essentially numbers, and replicating certain numbers can be illegal (even to DMCA issues). The third (and the second which points to it) is essentially remarking on the fact that π is thought to be a normal number, so it is believed that all possible finite digit sequences occur in its expansion (and so contains all such illegal numbers).

Let's consider the first issue: some integers are definitely illegal - it appears quite possible to get yourself into trouble posting particular integers. The post at Copyrighting a number suggests that the smallest such number is incomrehensibly large.

But I suspect that this isn't necessarily the case.

Certainly it's true that the smallest integer representing a song would be huge. The smallest integer representing a book would be far, far larger very large**. But it's not only entire songs or books that are subject to copyright. Indeed, it only takes a few bars before you get into copyright trouble. And it doesn't need to be a few bars of a perfomance. In fact, I could represent in a compact ASCII form just enough of a simple tune to get myself into trouble in only a small number of characters (I think with careful choice of tune, the extracted bars and the representation, I could do it in less than one line of text here). I could represent it in some compressed form in much less space (posting a zip file of a copyright work is just as illegal as posting an uncompressed version, and so it would be with any other compression scheme). Maybe I could get into trouble posting slightly less of a book - I can quote a tiny bit of a book in certain circumstances, but if those circumstances don't hold, how little of a book can I quote before I am in copyright violation?

** [That "far, far larger" (now elided) looks like a dumb error if you compare the size of a high fidelity musical recording with a book recorded as, say an ascii file. But that's not comparing like with like. A book is like a score (and lyrics), not like the performance of it. If we compare a performance of a book with a performance of a song, then again, the book is larger (perfomed books typically take several CDs). But arguing about this is a distraction from the point here, so I have changed the text above to avoid the issue.]

So, then, we are left to wonder, what is the smallest possible illegal digit sequence?

What it the smallest possible representation of it?

(Might the πth root of 237 have that digit sequence somewhere in its first ten thousand digits, thus giving me an incredibly compact way of specifying an illegal digit sequence? I doubt it.)

Anyway, I'm inclined to think that it might be possible to get yourself into trouble with say the DMCA with some surprisingly small digit sequences.

And wait a minute - does this post constitute an encouragement to breaking the law?
I've asked for people to tell me the smallest illegal number. Certainly, if someone does post such a number in the comments, it would be easy to argue that I incited it. But actually, it seems the law on that doesn't actually require the other person to break the law. And even if it's not enough for me just to ask what the number is, I don't want any of my readers going to jail if they do figure out the number.


One less

Perhaps it's better to ask what number is one less than the smallest illegal number. Maybe that's safer.

But consider the RIAA - I don't expect that the RIAA would have the slightest hesitation in trying to smack me down for publishing one less than the smallest illegal number because that description of it implies an algorithm for producing the smallest illegal number from it... and so this very paragraph contains an algorithm for producing an illegal number, were I to later post one less than such a number. (In fact, is it illegal to describe such an algorithm already, without having a number to apply it to? The DMCA seems to be pretty broad. Have I in fact already broken the law merely by describing the algorithm - that "the way to produce the smallest illegal integer from the next smallest number is simply to add 1"?)

So asking for one less than an illegal number is no good. Certainly publishing that number with the description ("just add one") is illegal. But if I publish a number and an algorithm for producing an illegal number from it in two different places, I believe I have still broken the law (they don't have to be in the same location, surely, or the law would be trivially circumvented). Given I've already posted an algorithm for producing an illegal number from one less than it, it seems that posting one less than the smallest illegal number anywhere must in turn be illegal.

No doubt you've noticed the problem already.

By that reasoning, one less than that number must also be illegal.

Assuming we restrict our search to the non-negative integers, I believe I can therefore assert that the smallest illegal number I can publish to my blog is zero.

Oops - and just I made the mistake of publishing it to my blog.

Numbers. Every digit of every number. They're all illegal.

If this blog disappears suddenly, I've probably been got by the goons at the RIAA.

Friday, January 2, 2009

"Hit the target space" probability

Back to the problem of finding the probability of hitting a desirable space when rolling a six sided die (d6). Let's look at the easy way to compute the probabilities, then we can talk about them.

First, let's compute a couple of easy probabilities - we'll use them to check our work in a moment. If I was one space from the target, I only hit the target on a roll of 1. Any other roll takes me past it and I miss it.

So the probability I hit the target from one space away is 1/6.

If I was two spaces away, I hit the target with a roll of 2 (probability 1/6); but I can also hit the target by moving one space (1/6) and then hitting the target from there (times the 1/6 we already worked out a second ago). Total probability = 7/36.

Let's say P(i) is the probability of hitting the target space, T, from space i.

From the current space, i, the probability you hit the target is the probability you roll 1 times the probability you hit the target from one space further ahead, plus the probability you roll 2 and hit the target from 2 spaces further ahead, and so on up to 6 spaces.

That is, we get the recursive formula
P(i) = 1/6 [P(i+1)+P(i+2)+...+P(i+6)].

Let's look at that for a second. If I know the six probabilities immediately ahead of the current space, I can work out the probability from this space.

Now, let's look at the target space, T. Obviously, we're there already, so P(T)=1.

How would I work out the probability for the previous space?
P(T-1) = 1/6 [P(T) + P(T+1) + ... + P(T+5)].

Wait. What's P(T+1)? Well, from past the space I have no chance. It's 0. Same with P(T+2), etc.

P(T-1) = 1/6 [ 1 + 0 + 0 + 0 + 0 + 0 ] = 1/6 (checks out)
P(T-2) = 1/6 [1/6+ 1 + 0 + 0 + 0 + 0 ] = 7/36 (checks out)

[It's not hard from this to show that, for the 5 spaces immediately before the target, the hit-the-target probability for the NEXT space further away is 7/6 of the probability for the current space. Consequently, for those spaces, P(T-i) = 7i-1/6i. So from 6 spaces away, the probability is 75/66 = 16,807/46,656. However, I'm going to take a lazier-but-more general approach for computing probabilities.]

We can compute all the probabilities numerically, working one by one back from the target. Indeed, with a neat recursive formula like that one above, a spreadhseet is a handy tool (I happened to use Excel). Write the probabilities for P(T) to P(T+5) in a column (1 and 5 0's), and write the formula for P(T-1) in terms of those values:


A B
:

4 T-() prob
5 -5 0
6 -4 0
7 -3 0
8 -2 0
9 -1 0
10 0 1
11 1 =AVERAGE(B5:B10)
12 2
13

:


Okay, instead of 1/6*sum(B5:B10) I lazily used the average function, but you see the idea. Copy that formula and paste it down as many rows as you like.



A B
:

4 T-() prob
5 -5 0
6 -4 0
7 -3 0
8 -2 0
9 -1 0
10 0 1
11 1 0.16667
12 2 0.19444
13 3 0.22685
14 4 0.26466
15 5 0.30877
16 6 0.36023
17 7 0.25360
18 8 0.26809
19 9 0.28037
20 10 0.28929
21 11 0.29339
While the probability at 6 spaces away is 36%, the probability at 7 spaces away is 25%!

At large distances, the number converges to 2/7 ~= 28.57% (if the average roll is 3.5 spaces, it's perhaps not so surprising that the average probability you hit a space is 1/3.5).

In Excel, you can do neat plots to see what's going on:



In short, if you have some influence on where you try to land on the target from (such as the ability to change your roll by one, for example), the best spots (in order) are 6 spaces away, 5 spaces away and 11 spaces away, and of those, 6 is by far the best. Try to avoid being fewer than 4 spaces away, and 7 isn't much good either.

I used the approach outlined here to look at rolling two and three dice (2d6 and 3d6) as well; the direct calculation approach here adapts quite easily to these situations, yielding probabilities with a minute or so of effort.

Thursday, January 1, 2009

Let's start with a little probability intuition

Imagine you're playing a board game where your pieces move according to a die roll. Unless it's a very dull game, there will probably be other things that can influence your position as well, but we're ignoring them right now.

Let's imagine that normally your move is given by a single die roll (six sided die, d6), and you pass along the spaces only once - it's a straight "race", not a circuit.

No doubt you've seen many games where you move along spaces to some end, and a few spaces have special characteristics (such as "miss a turn" or "move ahead three spaces"). BoardGameGeek calls these "roll and move" games, and appears to list more than six thousand of them. If that doesn't ring a bell, think of something a bit like Snakes and Ladders.
(However, you can apply similar ideas to a large variety of games where you accumulate some resource with varying probabilities. )

Further, let's say there's a particularly desirable space to land on (which I'll call the target). You're currently 6 spaces away (that is, if you roll a 6 you will land on the space). You really would like to land on that space, either this turn, or next turn, or the turn after that, ... &c.

Quick - without working it out exactly, and assuming for now that no strategy is available to change the probability, what would you guess would roughly be your chance of hitting the the target at all?

Then, if you're so inclined, try to work out the probability. (If you're not inclined to attempt it, never fear - I will post the correct answer later. There's at least one fairly easy way to do it - and a lot of somewhat less easy ways to do it.)

This was one case where my intuition wasn't as directly useful as I might hope. Intuition served me just fine for figuring out what the probability would be from far away, but from closer up, not so much - at least not at first.