Friday, September 2, 2011

Algorithm-a-Day : Day 16 : Conway's Game of Life

Alright! I finished moving and my new apartment is mostly clean,
so it's back to AaD! And today is...

The Game of Life. A beautiful and wonderful no-player game.
Grab yourself some pygame, then give this a run:

Github :: Conway's Game of Life

Full notes in the repo!
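
If you want to see the rule itself before firing up pygame, here's a tiny sketch of one generation step (this is not the repo code; the board here is just a set of live (x, y) cells):

    # One Game of Life step: a live cell survives with 2 or 3 live
    # neighbours, and a dead cell comes alive with exactly 3.
    def neighbors(cell):
        x, y = cell
        return [(x + dx, y + dy)
                for dx in (-1, 0, 1)
                for dy in (-1, 0, 1)
                if (dx, dy) != (0, 0)]

    def step(live):
        counts = {}
        for cell in live:
            for n in neighbors(cell):
                counts[n] = counts.get(n, 0) + 1
        return {cell for cell, count in counts.items()
                if count == 3 or (count == 2 and cell in live)}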

Thursday, August 18, 2011

Moving Next Week

As I'm getting ready to move next week, I have less time lately to work on AaD. I'm therefore going to announce a slow-down, but not a cancellation. AaD will continue, just once I have everything settled down in my new place.

It's got an extra room for the same rent as the place I'm in now, and that's saying something for Japan.

Wish me luck! (or hard work)

Tuesday, August 16, 2011

Algorithm-a-Day : Day 15 : Email Address Parsing with Regex

Of course, inspired by: XKCD #208
And on that note: XKCD4ME (Python script)
 
ABOUT REGULAR EXPRESSIONS
-------------------------
  Regular expressions are a very powerful tool for finding and replacing
substring patterns in strings. Regexes are a language of their own,
separate from anything we've been using so far, but luckily they are mostly
uniform across programming languages.
(There's even a famous regex for testing primality of numbers written in
unary: ^1?$|^(11+?)\1+$, which matches exactly the strings of 1's whose
length is not prime.)
PYTHON
------
  The regex we're given is:
    \b([a-zA-Z0-9-_.]+)@(\w+[.][a-zA-Z.]+)
but more specifically in a Pythonic context we can look at it directly as:
    r"\b([a-zA-Z0-9-_.]+)@(\w+[.][a-zA-Z.]+)"
  The r"" type of string is called a raw string, and it doesn't accept
special characters. The string r"haha this won't \n print a new line"
will print:
    haha this won't \n print a new line
Essentially, backslashes are not interpreted. This is important for regex's
because backslashes also have special meaning, which means that in a regular
string you would have to double-escape your special characters. We won't
bother with that, and just go ahead and use raw strings.
  See line 6. Holy moon language. Note that we could have written the regex
like this:
    regex = r"""\b
        ([a-zA-Z0-9-_.]+)
        @
        (\w+
        [.]
        [a-zA-Z.]+)
        """
and then called it like this:
    pattern = re.compile(regex, re.VERBOSE)
Verbose mode allows whitespace and newlines to be ignored in the given regex.
This lets us break longer regexes into more digestible chunks.
We can also comment them, like this:
    regex = r"""\b # Match the start of a word.
([a-zA-Z0-9-_.]+) # Match one or more characters from small
# 'a' to 'z', capital 'A' to 'Z', and
# a few special characters.
@ # Match the 'at symbol' (or 'amphere').
(\w # Match any letter or digit.
[.] # Match a period. This has to be in a []!
[a-zA-Z.]+) # Match one or more letter or period.
"""
The last condition is important because email addresses from some countries
have multiple extensions on their domains. For example, a Yahoo email address
from Japan would end in: @yahoo.co.jp
  The rest is straightforward; we compile the regex, ask it for matches,
then store said matches after a little string manipulation.
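  In case the repo isn't handy, a rough sketch of that last step might look
like this (the variable names here are mine, not the script's):
    import re

    regex = r"\b([a-zA-Z0-9-_.]+)@(\w+[.][a-zA-Z.]+)"
    pattern = re.compile(regex)

    text = "Mail me at foo.bar@yahoo.co.jp or at bob_99@example.com, thanks!"
    # findall() hands back the captured (user, domain) pairs.
    matches = pattern.findall(text)
    emails = ["{0}@{1}".format(user, domain) for user, domain in matches]
    print(emails)   # ['foo.bar@yahoo.co.jp', 'bob_99@example.com']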
  Try poking around at the regex itself to see what makes the script
explode and what doesn't. Then, try making one that parses URLs. Maybe to run
across an HTML file? Your life as a professional scraper begins today!

Monday, August 15, 2011

Algorithm-a-Day : Day 14 : Heap Sort

If you already have a Heap, this is the easiest sorting algorithm to code. Ever.

Github :: Heap Sort

ABOUT HEAPSORT
--------------
  Heap sort is one of the most conceptually simple sorts out there - that is,
if you already have a Heap. Simply grab the first (the maximum value!)
element, store it, perform head-deletion, repeat.
This process is demonstrated most elegantly in the Haskell version.
For the C and Python versions, however, instead of using the head
removal function outright, our implementations are going to go about things
just a little differently.
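  In pseudo-Python the whole idea fits in a few lines (Heap, find_max() and
delete_root() here are stand-ins for whatever your Day 13 Heap calls them):
    def heap_sort(values):
        heap = Heap(values)               # build a max-heap from the input
        out = []
        while len(heap) > 0:
            out.append(heap.find_max())   # grab the maximum value...
            heap.delete_root()            # ...then perform head-deletion
        out.reverse()                     # max-first order -> ascending
        return out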
C
-
  The C and Python versions share a trick: they make the Heap think it's
shorter than it really is, and hide the gradually growing run of sorted
values at the end of the array instead. In doing this, we are able to
perform many cascade-downs without disturbing those sorted values
that accumulate at the end.
  For the sake of continuity, this version returns us the Heap's
inner array wrapped in an IntList. When doing a HeapSort in place,
the original Heap is essentially ruined, so this shouldn't be a problem.
PYTHON
------
  Nice and short, and essentially identical to the C version.
It's still running about 3 times slower than a CombSort11, which
is a bit pathetic. Unlike the C version (and as has been the case with
all of our data structures so far), the Python one can handle any type,
not just ints.
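  If you'd rather see the trick on its own than dig through the repo, here's
a rough sketch of it over a plain Python list (not the repo code; cascade_down
here is my stand-in for the repo's version):
    # 'length' is the part of the list we still treat as a heap;
    # everything past it holds the values already sorted.
    def cascade_down(a, i, length):
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            largest = i
            if left < length and a[left] > a[largest]:
                largest = left
            if right < length and a[right] > a[largest]:
                largest = right
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    def heap_sort(a):
        # Build a max-heap in place.
        for i in range(len(a) // 2 - 1, -1, -1):
            cascade_down(a, i, len(a))
        # Swap the max to the end, shrink the 'heap', and cascade down again.
        for end in range(len(a) - 1, 0, -1):
            a[0], a[end] = a[end], a[0]
            cascade_down(a, 0, end)
        return a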
HASKELL
-------
  At the moment, heapRemove is so inefficient that heapSort has trouble
sorting a list of only a few hundred elements. It's not the sort's fault;
the underlying algorithm just sucks.
  In the Heap Sort algorithm itself, however, we see Haskell's
powerful list creation functionality hard at work.

Sunday, August 14, 2011

Algorithm-a-Day : Day 13 : (Binary Max) Heap

Not the memory Heap! This is a tree based structure that a lot of fancy
things are built with :).

Github :: Heap

ABOUT HEAPS
-----------
  Heaps are essentially trees that satisfy something called "The Heap
Property". This property states that for every parent node,
its two (or more) children will be less than or equal to it (by whatever
method of comparison you're using). The order of the children doesn't matter.
Nor does it matter if a parent's children are greater than that
parent's sibling node.
  Heaps bless us with an interesting property: the children are always
a _computable_ distance away from their parent. The root, at index 0,
will always have children at indexes 1 and 2. Likewise the node at index 3
will belong to the root's first child, and so on.
  This pattern allows us to bypass implementing a tree completely,
and instead store the Heap as a list/array, simply computing the indexes
of the child/parent nodes whenever we need them.
INDEX ARITHMETIC
----------------
  For a given node at index i (0-based):
    - children @ a[2i+1] and a[2i+2]
    - parent @ a[floor((i-1)/2)]
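  In Python those three lookups are about as short as they sound (a quick
sketch, not the repo code):
    def left_child(i):
        return 2 * i + 1

    def right_child(i):
        return 2 * i + 2

    def parent(i):
        return (i - 1) // 2   # floor((i-1)/2); the root (i == 0) has no parent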
WHAT HEAPS DO
-------------
  Heaps need to be able to do a few things:
    - Insert and rebalance.
    - Delete the root and rebalance.
    - Yield the max (or min) value present.
    - Merge with another heap.
INSERTION
---------
  New values are inserted at the end of the Heap. This is trivial if the
Heap is implemented as an array internally. We must then 'cascade-up',
swapping the new child with its parent until the Heap Property is restored.
  At most this would only ever take as many operations as there are
layers in the Heap.
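  Sketched out over a plain Python list (insert and cascade_up are my names
here, not necessarily the repo's):
    def insert(heap, value):
        heap.append(value)                # new values go at the end...
        cascade_up(heap, len(heap) - 1)   # ...then bubble up into place

    def cascade_up(heap, i):
        while i > 0:
            p = (i - 1) // 2              # parent index
            if heap[i] <= heap[p]:        # Heap Property restored; stop
                return
            heap[i], heap[p] = heap[p], heap[i]
            i = p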
ROOT DELETION
-------------
  When removing the root, take the last leaf and make it the root.
Then 'cascade-down', always swapping the current node with its greater child
until, once again, the Heap Property is restored.
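  And a matching sketch for deletion, again over a plain list and with my
own names:
    def delete_root(heap):
        # Assumes the heap isn't empty.
        heap[0] = heap[-1]                # the last leaf becomes the root
        heap.pop()
        i = 0
        while True:
            kids = [c for c in (2 * i + 1, 2 * i + 2) if c < len(heap)]
            if not kids:
                return
            big = max(kids, key=lambda c: heap[c])   # the greater child
            if heap[i] >= heap[big]:                 # Heap Property holds; stop
                return
            heap[i], heap[big] = heap[big], heap[i]
            i = big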
YIELDING THE MAX/MIN
--------------------
  While this isn't the case for all types of Heaps, Binary Max/Min heaps
can complete this in a single operation.
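  With the list representation, that single operation is nothing more than a
peek at index 0:
    def find_max(heap):
        return heap[0]    # the root is always the maximum in a Max Heap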
MERGING HEAPS
-------------
  Simply feed one heap's values to the other, inserting each value and
cascading as necessary.
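  Using the insert sketch from above:
    def merge(heap, other):
        # 'other' can be any iterable of values; it needn't be another heap.
        for value in other:
            insert(heap, value)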
WHAT HEAPS ARE USED FOR
-----------------------
  Heaps are used in a particular sorting algorithm called Heap Sort,
as well in graphing algorithms like Dijkstra's Algorithm.
IMPLEMENTATIONS
---------------
C
-
  As an experiment, I wrote this version in the Linux Kernel style of
C coding. Turned out pretty neat, if you don't mind 8-space tabs.
  This does everything we promised Heaps have to do: insertion, deletion,
finding the max, and merging. It also prints itself out _like a tree_,
just like Binary Search Tree from before.
  I went all-out with error checking too, and just hope I did it
sensibly.
PYTHON
------
  Less than half the length of the C version... who'da thought? Python gives
us a lot of power here. Merging heaps is as simple as inserting all
the elements of a given iterator. It doesn't even have to be a heap ^^!
  Again, lots of power with __iter__ and __str__.
  You may notice that cascade_down() is quite different between the C
and Python versions. The Python version uses a more general approach to
determine the greater child (and its index) and actually allows us to
have more than 2 children in a future implementation. The C version doesn't
even begin to allow this.
HASKELL
-------
  As lists in Haskell are Linked Lists and not random access arrays,
implementing a Heap with a list would be a bit foolish. Instead,
we go ahead and create our own data type, much like we did with
the Binary Tree. Essentially, just by changing how new items are added, we
made this a Heap. Notice that this Heap is foldable just like the BS Tree was,
and that this Heap will print out properly in a Tree shape, just like the C
and Python versions.
  I admit that heapInsert and cascade are so close to each other that it's
a bit disturbing. I'm sure the solution is obvious, and that it just hasn't
dawned on me yet.
  Removing the root in this version is also less complicated (and slower).
We simply ignore the root and heapMerge the two branches together.
  listToHeap isn't really necessary, but it helps out with testing.

Friday, August 12, 2011

Algorithm-a-Day : Day 12 : Binary Search Tree

Our first tree structure. Hurray!
Trees are important because a lot of things are implemented with them. Not all of them act like BS Trees, but this one is easy, so we're starting with it.
Also, make sure to check out the section below on how Trees are printed. This causes a lot of people trouble, and I found a good solution.

Github :: Binary Search Tree

ABOUT BINARY SEARCH TREE
------------------------
  A handy tree structure built for fast look-ups!
Our versions here today aren't self-balancing, meaning that if your root
sucks you could end up with a very lopsided tree.
Many built-in data structures in modern languages are implemented with trees,
so it's important to know what they are and how they work.
This includes knowing how the following things are done:
-- Recursion.
-- Queues.
-- Depth-first searches.
-- Breadth-first searches.
(But for a binary search tree we don't have to worry about those types
of searches)
ABOUT PRINTING TREE STRUCTURES
------------------------------
  This causes so much unnecessary suffering that I created a slick recursive
way to print out tree structures opening out to the right. Each leaf gets
its own line, and everything is indented properly. It's quite easy to read.
Check out both the py_test.py file and the C unit test to see what
I'm talking about.
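  To give the flavour of it, here's a rough Python sketch of the approach
(the repo versions differ in the details; Node here is assumed to have .val,
.left and .right):
    def show_tree(node, depth=0):
        # Deeper nodes are indented further, so the tree 'opens out'
        # to the right; each leaf gets its own line.
        if node is None:
            return
        show_tree(node.right, depth + 1)
        print("    " * depth + str(node.val))
        show_tree(node.left, depth + 1)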
C
-
  Here we had to introduce some new structs. Allocating memory for them is
standard run-of-the-mill stuff. Don't forget to check for NULL pointers
everywhere possible. I'll be there with an "I told you so" when a Segfault
cracks your computer in half.
  Insertion and checking for containment are both done with recursion. These
things can be done iteratively too, but it can get silly.
  Printing here, unlike in the Python and Haskell versions, is done without
any string creation; it just prints directly from the function itself.
  There is a **unit test** file, so make sure to check that out too.
PYTHON
------
  Shorter and sweeter than the C version, but all around the same.
Just as we've had in other Python versions of algorithms thus far, we get
to specify 'special functions' that allow us to use keywords on our Tree.
    print(tree)
    if 5 in tree: ...
    etc.
  For printing, here we're gradually building up a list of strings of the
leaves, then concatenating it all at the end in the __str__ function.
Remember that in Python, joining a list with str.join() beats building
a string up with repeated str + str.
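  For a rough idea of the recursive shape both versions share (a sketch with
my own names, not the repo code):
    class Node(object):
        def __init__(self, val, left=None, right=None):
            self.val, self.left, self.right = val, left, right

    def insert(node, val):
        # Returns the (possibly new) root of the subtree.
        if node is None:
            return Node(val)
        if val < node.val:
            node.left = insert(node.left, val)
        else:
            node.right = insert(node.right, val)
        return node

    def contains(node, val):
        if node is None:
            return False
        if val == node.val:
            return True
        if val < node.val:
            return contains(node.left, val)
        return contains(node.right, val)
In the class version, __contains__ just hands off to something like
contains(), which is what makes 'if 5 in tree: ...' work.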
HASKELL
-------
  God functional programming rocks. Here we start by defining instances
of typeclasses for the Tree type, allowing us to (via Functor) map functions
across our Tree, and (via Foldable) use folds across it as well.
  While it may look a little different, the function showTree produces
the same result as the print functions in the C and Python versions. Once
again, a victory for recursion. showTree is called by 'show', as is standard.
  Everything else is essentially the same, but with the Haskell twist.
Even so, it's less than half the length of the C version (and that's with all
our instance definitions included!).
  In conclusion, Haskell continues to impress me with its power of
expressiveness.

Thursday, August 11, 2011

Algorithm-a-Day : Day 11 : Circular Binary Search

As far as I know, I invented this. It's a binary search algorithm that works on lists that are sorted, but not linearly. For example, this algorithm can determine containment in the following list:
[4,5,6,7,8,9,0,1,2,3]
without any restructuring of the list. This is handy when searching large batches of data like log files, where sorting beforehand isn't an option.

Github :: Circular Binary Search

Check the notes in the repo, it's a good read.
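
For the impatient, here's a rough Python sketch of the idea (my reconstruction, not necessarily the repo's exact code, and it assumes the values are distinct): at every step at least one half of the current range is linearly sorted, so we check whether the target falls inside that half and discard the other accordingly.

    def circular_contains(xs, target):
        lo, hi = 0, len(xs) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if xs[mid] == target:
                return True
            if xs[lo] <= xs[mid]:                  # left half is linearly sorted
                if xs[lo] <= target < xs[mid]:
                    hi = mid - 1                   # target can only be on the left
                else:
                    lo = mid + 1
            else:                                  # right half is linearly sorted
                if xs[mid] < target <= xs[hi]:
                    lo = mid + 1                   # target can only be on the right
                else:
                    hi = mid - 1
        return False

    print(circular_contains([4, 5, 6, 7, 8, 9, 0, 1, 2, 3], 1))    # True
    print(circular_contains([4, 5, 6, 7, 8, 9, 0, 1, 2, 3], 10))   # False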