Alright! I finished moving and my new apartment is mostly clean,
so it's back to AaD! And today is...
The Game of Life. A beautiful and wonderful no-player game.
Grab yourself some pygame, then give this a run:
Github :: Conway's Game of Life
Full notes in the repo!
Code on the Mind
Writings about code, science, language, and life.
Friday, September 2, 2011
Thursday, August 18, 2011
Moving Next Week
As I'm getting ready to move next week, I have less time lately to work on AaD. I'm therefore going to announce a slow-down, but not a cancellation. AaD will continue, just once I have everything settled down in my new place.
It's got an extra room for the same rent as the place I'm in now, and that's saying something for Japan.
Wish me luck! (or hard work)
It's got an extra room for the same rent as the place I'm in now, and that's saying something for Japan.
Wish me luck! (or hard work)
Tuesday, August 16, 2011
Algorithm-a-Day : Day 15 : Email Address Parsing with Regex
Of course, inspired by: XKCD #208And on that note: XKCD4ME (Python script)ABOUT REGULAR EXPRESSIONS-------------------------Regular expressions are a very powerful tool for finding and replacingsubstring patterns in strings. Regex's are a language of their own,separate from anything we've been using so far, but luckily they are mostlyuniform across programming languages.(Find that prime solving regex)PYTHON------The regex we're given is:\b([a-zA-Z0-9-_.]+)@(\w+[.][a-zA-Z.]+)but more specifically in a Pythonic context we can look at it directly as:r"\b([a-zA-Z0-9-_.]+)@(\w+[.][a-zA-Z.]+)"The r"" type of string is called a raw string, and it doesn't acceptspecial characters. The string r"haha this won't \n print a new line"will print:haha this won't \n print a new lineEssentially, backslashes are not interpreted. This is important for regex'sbecause backslashes also have special meaning, which means that in a regularstring you would have to double-escape your special characters. We won'tbother with that, and just go ahead and use raw strings.See line 6. Holy moon language. Note that we could have written the regexlike this:regex = r"""\b([a-zA-Z0-9-_.]+)@(\w[.][a-zA-Z.]+)"""and then called it like this:pattern = re.compile(regex, 'VERBOSE')Verbose mode allows whitespace and newlines to be ignored in the given regex.This lets us break up longer regex's into more digestable chunks.We can also comment them, like this:regex = r"""\b # Match the start of a word.([a-zA-Z0-9-_.]+) # Match one or more characters from small# 'a' to 'z', capital 'A' to 'Z', and# a few special characters.@ # Match the 'at symbol' (or 'amphere').(\w # Match any letter or digit.[.] # Match a period. This has to be in a []![a-zA-Z.]+) # Match one or more letter or period."""The last condition is important because some URLs from other countries havemultiple extentions to them. For example, a Yahoo email address from Japanwould end in: @yahoo.co.jpThe rest is straight forward; we compile the regex, ask it for matches,then store said matches after some string working.Try poking around at the regex itself to see what makes the scriptexplode and what doesn't. Then, try making one that parses URLs. Maybe to runacross an HTML file? Your life as a professional scraper begins today!
Monday, August 15, 2011
Algorithm-a-Day : Day 14 : Heap Sort
If you already have a Heap, this is the easiest sorting algorithm to code. Ever.
Github :: Heap Sort
Github :: Heap Sort
ABOUT HEAPSORT--------------Heap sort is one of the most conceptually simple sorts out there - that is,if you already have a Heap. Simply grab the first (the maximum value!)element, store it, perform head-deletion, repeat.This process is demonstrated most elegantly in the Haskell version.For the C and Python versions, however, instead of using the headremoval function out-right, our implementations are going to go about thingsjust a little differently.C-The C and Python version share a quality: they trick the Heapinto thinking it's shorter than it is, then hiding the gradually sortedvalues at the end of the Heap instead. In doing this, we are able toperform many cascade-downs without disturbing those sorted valuesthat we accumulate at the end.For the sake of continuity, this version returns us the Heap'sinner array wrapped in an IntList. When doing a HeapSort in place,the original Heap is essentially ruined, so this shouldn't be a problem.PYTHON------Nice and short, and essentially identical to the C version.It's still running about 3 times slower than a CombSort11, whichis a bit pathetic. Unlike the C version (as has been the case withall of our data structures so far) Python can handle any type, not justints.HASKELL-------At the moment, heapRemove is so inefficient that heapSort has troublesorting a list of only a few hundred elements. It's not the sort's fault,the underlying algorithm just sucks.In the Heap Sort algorithm itself, however, we see Haskell'spowerful list creation functionality hard at work.
Sunday, August 14, 2011
Algorithm-a-Day : Day 13 : (Binary Max) Heap
Not the memory Heap! This is a tree based structure that a lot of fancy
things are built with :).
Github :: Heap
things are built with :).
Github :: Heap
ABOUT HEAPS-----------Heaps are essentially trees that satisfy something called "The HeapProperty". This property states that for every parent node,its two (or more) children will be less than it (by whatever methodof comparison you're using). The order of the children doesn't matter.Nor does is matter if a parent contents children greater than thatparent's sibling node.Heaps bless us with an interesting property: the children are alwaysan _inductable_ distance away from their parent. The root, being the firstparent, will always have children at Nodes 2 and 3. Likewise the 4th Nodewill belong to root's first child, etc.This pattern allows us to bypass implementing a tree completely,and instead store the Heap as a list/array, simply inducing the indexesof the child/parent nodes whenever we need them.INDEX ARITHMATIC----------------For a given node at index i:- children @ a[2i+1] and a[2i+2]- parent @ a[floor((i−1)/2)]WHAT HEAPS DO-------------Heaps need to be able to do a few things:- Insert and rebalance.- Delete the root and rebalance.- Yield the max (or min) value present.- Merge with another heap.INSERTION---------New values are inserted at the end of the Heap. This is trivial if theHeap is implemented as an array internally. We must then 'cascade-up',swapping the new child with its parent until the Heap Property is restored.At most this would only ever take as many operations as there arelayers in the Heap.ROOT DELETION-------------When removing the root, take the last leaf and make it the root.Then 'cascade-down', always swapping the current leaf with its greater childuntil again, the Heap Property is restored.YIELDING THE MAX/MIN--------------------While this isn't the case for all types of Heaps, Binary Max/Min heapscan complete this in a single operation.MERGING HEAPS-------------Simply yield one to the other, inserting each value and cascading asnecessary.WHAT HEAPS ARE USED FOR-----------------------Heaps are used in a particular sorting algorithm called Heap Sort,as well in graphing algorithms like Dijkstra's Algorithm.IMPLEMENTATIONS---------------C-As an experiment, I wrote this version in the Linux Kernel style ofC coding. Turned out pretty neat, if you don't mind 8-space tabs.This does everything we promised Heaps have to do: insertion, deletion,finding the max, and merging. It also prints itself out _like a tree_,just like Binary Search Tree from before.I went all-out with error checking too, and just hope I did itsensibly.PYTHON------Less than half the length of the C version... who'da thought? Python givesus a lot of power here. Merging heaps is as simple as inserting allthe elements of a given iterator. It doesn't even have to be a heap ^^!Again, lots of power with __iter__ and __str__.You may notice that cascade_down() is quite different between the Cand Python versions. The Python version uses a more general approach todetermine the greater child (and its index) and actually allows us tohave more than 2 children in a future implementation. The C version doesn'teven begin to allow this.HASKELL-------As lists in Haskell are Linked Lists and not random access arrays,implementing a Heap with a list would be a bit foolish. Instead,we go ahead and create our own data type, much like we did withthe Binary Tree. Essentially by changing how new items are added did wemake this a Heap. Notice that this Heap is foldable just like the BS Tree was,and that this Heap will print out properly in a Tree shape, just like the Cand Python versions.I admit that heapInsert and cascade are so close to each other that it'sa bit disturbing. I'm sure the solution is obvious, and that it just hasn'tdawned on me yet.Removing the root in this version is also less complicated (and slower).We simply ignore the root and heapMerge the two branches together.listToHeap isn't really necessary, but it helps out with testing.
Friday, August 12, 2011
Algorithm-a-Day : Day 12 : Binary Search Tree
Our first tree structure. Hurray!
Trees are important because a lot of things are implemented with them. Not of all of them act like BS Trees, but it's easy so we're starting with it.
Also, make sure to check out the section below on how Trees are printed. This causes a lot of people trouble, and I found a good solution.
Github :: Binary Search Tree
Trees are important because a lot of things are implemented with them. Not of all of them act like BS Trees, but it's easy so we're starting with it.
Also, make sure to check out the section below on how Trees are printed. This causes a lot of people trouble, and I found a good solution.
Github :: Binary Search Tree
ABOUT BINARY SEARCH TREE------------------------A handy tree structure built for fast look-ups!Our versions here today aren't self-balancing, meaning that if your rootsucks you could end up with a very lopsided tree.Many built-in data structures in modern languages are implemented with trees,so it's importaant to know what they are and how they work.This included knowing how the following things are done:-- Recursion.-- Queues.-- Depth-first searches.-- Breadth-first searches.(But for a binary search tree we don't have to worry about those typesof searches)ABOUT PRINTING TREE STRUCTURES------------------------------This causes so much unnecessary suffering, so I created a slick recursiveway to print out tree structures opening out to the right. Each leaf getsits own line, and everything is indented properly. It's quite easy to read.Check out both the py_test.py file and the C unit test to see whatI'm talking about.C-Here we had to introduce some new structs. Allocating memory for them isstandard run-of-the-mill stuff. Don't forget to check for NULL pointerseverywhere possible. I'll be there with an "I told you so" when a Segfaultcracks your computer in half.Insertion and checking for containment is all done with recursion. Thesethings can be done iteratively too, but it can get silly.Printing here, unlike the Python and Haskell versions, is done withoutany string creation, and instead prints right in the functions itself.There is a **unit test** file, so make sure to check that out too.PYTHON------Shorter and sweeter than the C version, but all around the same.Just as we've had in other Python versions of algorithms thus far, we getto specify 'special functions' that allow us to use keywords on our Tree.print(tree)if 5 in tree: ...etc.For printing, here we're gradually building up a list of strings of theleaves, then concatinating it all at the end in the __str__ function.Remember that in Python, str.join() is always better than str + str.HASKELL-------God functional programming rocks. Here we start by defining instancesof typeclasses for the Tree type, allowing us to (via Functor) map functionsacross our Tree, and (via Foldable) use folds across it as well.While it may look a little different, the function showTree producesthe same result as the print functions in the C and Python versions. Onceagain, a victory for recursion. showTree is called by 'show', as is standard.Everything else is essentially the same, but with the Haskell twist.Even so, its over half as short as the C version (and that's with all ourinstance definitions included!).In conclusion, Haskell continues to impress me with its power ofexpressiveness.
Thursday, August 11, 2011
Algorithm-a-Day : Day 11 : Circular Binary Search
As far as I know, I invented this. It's a binary search algorithm that works on lists that are sorted, but not linearly. For example, this algorithm can determine containment in the following list:
[4,5,6,7,8,9,0,1,2,3]
without any restructuring of the list. This is handy when searching large batches of data like log files, where sorting beforehand isn't an option.
Github :: Circular Binary Search
Check the notes in the repo, it's a good read.
[4,5,6,7,8,9,0,1,2,3]
without any restructuring of the list. This is handy when searching large batches of data like log files, where sorting beforehand isn't an option.
Github :: Circular Binary Search
Check the notes in the repo, it's a good read.
Subscribe to:
Posts (Atom)