Saturday, May 14, 2011

The Beauty of Infinite Data Structures

While a similar concept is possible in Python (and other languages) using iterators or generators,
there is something about infinite data structures in Haskell that I find very attractive.
I'll introduce a few here to demonstrate how pretty they can be.

** Prime Numbers **
Every self-respecting programmer needs a prime number generator/solver
laying around in their coding tool box.
Assume we have a function 'next_prime' that tells us what
the next higher prime is relative to the one given, how could we
implement a generator in Python to yield a set of prime numbers?

Well...

def prime_gen(limit):
    '''Mmm. Primes.'''
    prime = 2
    for x in range(limit):
        yield prime
        prime = next_prime(prime)

Sure, that ought to do the trick. Now just add some of this:

>>> list(prime_gen(10))

into a script or the interpreter, and we get:

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

Great! Primes. We got them safe and sound, in probably one of the shortest
ways we could have. And there's nothing bad to be said about that, but
functional programming simply gives us a more elegant way.

primes = 2 : 3 : filter isPrime [5,7..]

This is assuming we have a function 'isPrime' of type:

isPrime :: Integral a => a -> Bool

which of course tells us if a given number if prime or not.

A simple:

take 10 primes

thus gives the same result as the Python code above. There are, however,
a few things to notice. 'primes' itself is a list. An infinite one, at that.
We declare it as an expression, which can be read as plain English:

We define primes to be a list, made up of 2, 3, and all the prime numbers
to be found among the odd natural numbers.

We could of course have written: primes = 2 : filter isPrime [3,5..]
but personally, I didn't want 2 to be lonely. He's the only even prime.
How would you feel if you were the only exception in an infinite set of
numbers? I know his pain... I'm a white guy living in Japan. So I gave him a friend.

Anyway...

Our list expression is an agreement with Haskell; it tells it what
a list of all primes would look like, were it to exist. Haskell, being lazy,
doesn't calculate all the values (it couldn't even), and so we are able
to define lists like this, skipping the idea of an intermediate generator
like in the Python code above.


** Fibonacci Numbers **
Next, we'll look at a nice equation for a list of all the fibonacci numbers:

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

Here's a fibonacci iterator implemented in Python.

This expression is different from the one above, in that it references
itself in its own creation. If you're unfamiliar with what zipWith does,just check this out.

To get a feel for how this expression works, let's see what would happen
if we did, say: take 5 fibs

take 5 [0,1,??]  -- Fibs doesn't exist yet, but we know its first two values.
0 : take 4 [1,??]  -- Haskell looked at our definition, and took the 0.
0 : 1 : take 3 [??]  -- None of the rest of the list has been calculated yet.
0 : 1 : (0 + 1) : take 2 [??]  -- zipWith in action! Element 0 of fibs plus element 0 of tail fibs.
0 : 1 : 1 : (1 + 1) : take 1 [??]  -- zipWith takes another step.
0 : 1 : 1 : 2 : (1 + 2) : take 0 [??]  -- And another.
0 : 1 : 1 : 2 : 3 : []  -- Done!
[0,1,1,2,3]  -- The return value.

My mind really blew the first time I was shown this. Never would I have thought
to define a list in terms of itself. I was still in procedural mode then,
but this really opened my eyes. Procedural languages give us "algorithms
which calculate the fibonacci numbers" while functional languages give us
the freedom to define infinite data structures and say:
"These ARE the fibonacci numbers".

I think it's an important distinction.

Note to any Python programmers: Any better ways to generate the primes?
I'd love to see other solutions.

No comments:

Post a Comment