As I'm getting ready to move next week, I have less time lately to work on AaD. I'm therefore going to announce a slow-down, but not a cancellation. AaD will continue, just once I have everything settled down in my new place.
It's got an extra room for the same rent as the place I'm in now, and that's saying something for Japan.
Wish me luck! (or hard work)
Thursday, August 18, 2011
Tuesday, August 16, 2011
Algorithm-a-Day : Day 15 : Email Address Parsing with Regex
Of course, inspired by: XKCD #208And on that note: XKCD4ME (Python script)ABOUT REGULAR EXPRESSIONS-------------------------Regular expressions are a very powerful tool for finding and replacingsubstring patterns in strings. Regex's are a language of their own,separate from anything we've been using so far, but luckily they are mostlyuniform across programming languages.(Find that prime solving regex)PYTHON------The regex we're given is:\b([a-zA-Z0-9-_.]+)@(\w+[.][a-zA-Z.]+)but more specifically in a Pythonic context we can look at it directly as:r"\b([a-zA-Z0-9-_.]+)@(\w+[.][a-zA-Z.]+)"The r"" type of string is called a raw string, and it doesn't acceptspecial characters. The string r"haha this won't \n print a new line"will print:haha this won't \n print a new lineEssentially, backslashes are not interpreted. This is important for regex'sbecause backslashes also have special meaning, which means that in a regularstring you would have to double-escape your special characters. We won'tbother with that, and just go ahead and use raw strings.See line 6. Holy moon language. Note that we could have written the regexlike this:regex = r"""\b([a-zA-Z0-9-_.]+)@(\w[.][a-zA-Z.]+)"""and then called it like this:pattern = re.compile(regex, 'VERBOSE')Verbose mode allows whitespace and newlines to be ignored in the given regex.This lets us break up longer regex's into more digestable chunks.We can also comment them, like this:regex = r"""\b # Match the start of a word.([a-zA-Z0-9-_.]+) # Match one or more characters from small# 'a' to 'z', capital 'A' to 'Z', and# a few special characters.@ # Match the 'at symbol' (or 'amphere').(\w # Match any letter or digit.[.] # Match a period. This has to be in a []![a-zA-Z.]+) # Match one or more letter or period."""The last condition is important because some URLs from other countries havemultiple extentions to them. For example, a Yahoo email address from Japanwould end in: @yahoo.co.jpThe rest is straight forward; we compile the regex, ask it for matches,then store said matches after some string working.Try poking around at the regex itself to see what makes the scriptexplode and what doesn't. Then, try making one that parses URLs. Maybe to runacross an HTML file? Your life as a professional scraper begins today!
Monday, August 15, 2011
Algorithm-a-Day : Day 14 : Heap Sort
If you already have a Heap, this is the easiest sorting algorithm to code. Ever.
Github :: Heap Sort
Github :: Heap Sort
ABOUT HEAPSORT--------------Heap sort is one of the most conceptually simple sorts out there - that is,if you already have a Heap. Simply grab the first (the maximum value!)element, store it, perform head-deletion, repeat.This process is demonstrated most elegantly in the Haskell version.For the C and Python versions, however, instead of using the headremoval function out-right, our implementations are going to go about thingsjust a little differently.C-The C and Python version share a quality: they trick the Heapinto thinking it's shorter than it is, then hiding the gradually sortedvalues at the end of the Heap instead. In doing this, we are able toperform many cascade-downs without disturbing those sorted valuesthat we accumulate at the end.For the sake of continuity, this version returns us the Heap'sinner array wrapped in an IntList. When doing a HeapSort in place,the original Heap is essentially ruined, so this shouldn't be a problem.PYTHON------Nice and short, and essentially identical to the C version.It's still running about 3 times slower than a CombSort11, whichis a bit pathetic. Unlike the C version (as has been the case withall of our data structures so far) Python can handle any type, not justints.HASKELL-------At the moment, heapRemove is so inefficient that heapSort has troublesorting a list of only a few hundred elements. It's not the sort's fault,the underlying algorithm just sucks.In the Heap Sort algorithm itself, however, we see Haskell'spowerful list creation functionality hard at work.
Sunday, August 14, 2011
Algorithm-a-Day : Day 13 : (Binary Max) Heap
Not the memory Heap! This is a tree based structure that a lot of fancy
things are built with :).
Github :: Heap
things are built with :).
Github :: Heap
ABOUT HEAPS-----------Heaps are essentially trees that satisfy something called "The HeapProperty". This property states that for every parent node,its two (or more) children will be less than it (by whatever methodof comparison you're using). The order of the children doesn't matter.Nor does is matter if a parent contents children greater than thatparent's sibling node.Heaps bless us with an interesting property: the children are alwaysan _inductable_ distance away from their parent. The root, being the firstparent, will always have children at Nodes 2 and 3. Likewise the 4th Nodewill belong to root's first child, etc.This pattern allows us to bypass implementing a tree completely,and instead store the Heap as a list/array, simply inducing the indexesof the child/parent nodes whenever we need them.INDEX ARITHMATIC----------------For a given node at index i:- children @ a[2i+1] and a[2i+2]- parent @ a[floor((i−1)/2)]WHAT HEAPS DO-------------Heaps need to be able to do a few things:- Insert and rebalance.- Delete the root and rebalance.- Yield the max (or min) value present.- Merge with another heap.INSERTION---------New values are inserted at the end of the Heap. This is trivial if theHeap is implemented as an array internally. We must then 'cascade-up',swapping the new child with its parent until the Heap Property is restored.At most this would only ever take as many operations as there arelayers in the Heap.ROOT DELETION-------------When removing the root, take the last leaf and make it the root.Then 'cascade-down', always swapping the current leaf with its greater childuntil again, the Heap Property is restored.YIELDING THE MAX/MIN--------------------While this isn't the case for all types of Heaps, Binary Max/Min heapscan complete this in a single operation.MERGING HEAPS-------------Simply yield one to the other, inserting each value and cascading asnecessary.WHAT HEAPS ARE USED FOR-----------------------Heaps are used in a particular sorting algorithm called Heap Sort,as well in graphing algorithms like Dijkstra's Algorithm.IMPLEMENTATIONS---------------C-As an experiment, I wrote this version in the Linux Kernel style ofC coding. Turned out pretty neat, if you don't mind 8-space tabs.This does everything we promised Heaps have to do: insertion, deletion,finding the max, and merging. It also prints itself out _like a tree_,just like Binary Search Tree from before.I went all-out with error checking too, and just hope I did itsensibly.PYTHON------Less than half the length of the C version... who'da thought? Python givesus a lot of power here. Merging heaps is as simple as inserting allthe elements of a given iterator. It doesn't even have to be a heap ^^!Again, lots of power with __iter__ and __str__.You may notice that cascade_down() is quite different between the Cand Python versions. The Python version uses a more general approach todetermine the greater child (and its index) and actually allows us tohave more than 2 children in a future implementation. The C version doesn'teven begin to allow this.HASKELL-------As lists in Haskell are Linked Lists and not random access arrays,implementing a Heap with a list would be a bit foolish. Instead,we go ahead and create our own data type, much like we did withthe Binary Tree. Essentially by changing how new items are added did wemake this a Heap. Notice that this Heap is foldable just like the BS Tree was,and that this Heap will print out properly in a Tree shape, just like the Cand Python versions.I admit that heapInsert and cascade are so close to each other that it'sa bit disturbing. I'm sure the solution is obvious, and that it just hasn'tdawned on me yet.Removing the root in this version is also less complicated (and slower).We simply ignore the root and heapMerge the two branches together.listToHeap isn't really necessary, but it helps out with testing.
Friday, August 12, 2011
Algorithm-a-Day : Day 12 : Binary Search Tree
Our first tree structure. Hurray!
Trees are important because a lot of things are implemented with them. Not of all of them act like BS Trees, but it's easy so we're starting with it.
Also, make sure to check out the section below on how Trees are printed. This causes a lot of people trouble, and I found a good solution.
Github :: Binary Search Tree
Trees are important because a lot of things are implemented with them. Not of all of them act like BS Trees, but it's easy so we're starting with it.
Also, make sure to check out the section below on how Trees are printed. This causes a lot of people trouble, and I found a good solution.
Github :: Binary Search Tree
ABOUT BINARY SEARCH TREE------------------------A handy tree structure built for fast look-ups!Our versions here today aren't self-balancing, meaning that if your rootsucks you could end up with a very lopsided tree.Many built-in data structures in modern languages are implemented with trees,so it's importaant to know what they are and how they work.This included knowing how the following things are done:-- Recursion.-- Queues.-- Depth-first searches.-- Breadth-first searches.(But for a binary search tree we don't have to worry about those typesof searches)ABOUT PRINTING TREE STRUCTURES------------------------------This causes so much unnecessary suffering, so I created a slick recursiveway to print out tree structures opening out to the right. Each leaf getsits own line, and everything is indented properly. It's quite easy to read.Check out both the py_test.py file and the C unit test to see whatI'm talking about.C-Here we had to introduce some new structs. Allocating memory for them isstandard run-of-the-mill stuff. Don't forget to check for NULL pointerseverywhere possible. I'll be there with an "I told you so" when a Segfaultcracks your computer in half.Insertion and checking for containment is all done with recursion. Thesethings can be done iteratively too, but it can get silly.Printing here, unlike the Python and Haskell versions, is done withoutany string creation, and instead prints right in the functions itself.There is a **unit test** file, so make sure to check that out too.PYTHON------Shorter and sweeter than the C version, but all around the same.Just as we've had in other Python versions of algorithms thus far, we getto specify 'special functions' that allow us to use keywords on our Tree.print(tree)if 5 in tree: ...etc.For printing, here we're gradually building up a list of strings of theleaves, then concatinating it all at the end in the __str__ function.Remember that in Python, str.join() is always better than str + str.HASKELL-------God functional programming rocks. Here we start by defining instancesof typeclasses for the Tree type, allowing us to (via Functor) map functionsacross our Tree, and (via Foldable) use folds across it as well.While it may look a little different, the function showTree producesthe same result as the print functions in the C and Python versions. Onceagain, a victory for recursion. showTree is called by 'show', as is standard.Everything else is essentially the same, but with the Haskell twist.Even so, its over half as short as the C version (and that's with all ourinstance definitions included!).In conclusion, Haskell continues to impress me with its power ofexpressiveness.
Thursday, August 11, 2011
Algorithm-a-Day : Day 11 : Circular Binary Search
As far as I know, I invented this. It's a binary search algorithm that works on lists that are sorted, but not linearly. For example, this algorithm can determine containment in the following list:
[4,5,6,7,8,9,0,1,2,3]
without any restructuring of the list. This is handy when searching large batches of data like log files, where sorting beforehand isn't an option.
Github :: Circular Binary Search
Check the notes in the repo, it's a good read.
[4,5,6,7,8,9,0,1,2,3]
without any restructuring of the list. This is handy when searching large batches of data like log files, where sorting beforehand isn't an option.
Github :: Circular Binary Search
Check the notes in the repo, it's a good read.
Wednesday, August 10, 2011
Algorithm-a-Day : Day 10 : Binary Search
Jon Bently goes on for quite a while about this in Programming Pearls.
Implementations can be buggy if you're not used to it, but forging one bug-free from scratch is something you can be proud of.
Github :: Binary Search
Implementations can be buggy if you're not used to it, but forging one bug-free from scratch is something you can be proud of.
Github :: Binary Search
ABOUT BINARY SEARCH-------------------You have a phone book, and are particularly violent about looking fornumbers. You best friend, Mr. Ericson, has a number that you've forgotten.You open the phone book up right into the middle: McLean. Ericson comesearlier in the phone book than McLean, so you tear it in half, toss theunneeded portion over your shoulder and repeat the process. Firth? Garbage!Carlson? Garbage! You keep searching the middle name and tearing offthe unneeded half until your find Mr. Ericson's number. If it's in there,hurray. If it isn't... well you've just wasted a phone book.Welcome to binary search!Binary search is a very fast way to search for elements in a sorted list.If the list is even one element off of being perfectly sorted, you cannotguarantee the results of the search. Given a big enough list, however,it can be a bother checking for sortedness before every search. There areways around this, but for us, we'll just assume the lists we're givenare sorted.C-Since the boolean data type doesn't exist in C, a failure to find thedesired item with result in a -1. Usually we use 0 to represent False,but in this case 0 is a legitimate array index.Target less than mid? Go left. Target greater than mid? Go right.Found it? Save and break. Using 'break' makes some people uncomfortable,but I find it clearer than say, altering 'lower' or 'upper' manuallyso that the loop will break on the next pass.Make sure to run that **unit test**!PYTHON------Sexier to look at than the C version. The only Python bonus we gain hereis (less lines and) the ability to infer the length of the list we're given,instead of it being passed explicitely as in the C version.Watch out for line 7, that's Python3 only.HASKELL-------Nice and short. Notice a theme here? Guards and 'where' are our friends.Since Haskell doesn't have loop structures, this is of course done viarecursion. It's safe to say that this will never blow the stack,as with a default stack depth limit of 1000, that allows us 1000 (give ortake a few) layers, or 2 ^ 999 values. That number is over 3.5 lines longon my 80 column terminal window. So no overflows :)'take' and 'drop' also make things easy for tearing our proverbialphone book in half. Overall, just really short and clean!
Tuesday, August 9, 2011
Algorithm-a-Day : Day 09 : Quick Sort
TL;DR -> Do this over and over: Pick pivot, partition, sort.
C-The C version is quite standard, implementing an in-place quicksortwhich is not only fast, but also saves on memory.While pivot selection is apparently a hot topic in some circles,this simple version doesn't bother with any fancy work and simplytakes the middle element (of each subset) as the pivot. One could be evenlazier and skip this entirely, taking the first element as the pivot, butif your list is sorted perfectly backwards and sufficiently long enough,you could blow the stack.-- I've included a short unit test for the C version as well, so feelfree to compile that in your spare time.PYTHON------Nearly identical to the C version, and a lot shorter.This implementation does its best not to inefficiently throw memory around,and (in theory) is an in-place sort.As a bonus from using Python, we can specify default values in qsort()'sfunction definition. Take a look! This allows us to simply pass somethinglike: qsort([4,5,2]) and everything will be taken care of.(Recall the C version where we had to pass the starting lower and upperbound values right off the bat)We also chose a different method of pivot selection. Here, we tookthree values, the lower, upper, and middle, sorted them, then chose thepivot position to be element number corresponding to the middle value(essentially the average). This ensures that the pivot will never be anoutlier, needlessly extending the stack and the amount of work that needsto be done. A combination of several built-in functions takes care ofthis work nicely for us, as can be seen in the get_pivot_pos() function.HASKELL-------Ahhh filters and lazy evaluation. This solution is obviously the cleanestout of the three. The pivot here is the same as the C version, simplychosen as the middle element of the set. This can easily be made a one-linerif you don't much care for names and neat-looking code.
Monday, August 8, 2011
Algorith-a-Day : Day 08 : Primes
Yeah! Prime numbers! Who doesn't get excited about these? And they pop up all the time in Project Euler, too.
Written your own libraries, but don't think they're too hot? Can't figure them out in the first place? Check out implementations in C, Python, and Haskell! (The Haskell version is particularly fun. See below.)
Github :: Primes
(from the notes found in the repo)
Written your own libraries, but don't think they're too hot? Can't figure them out in the first place? Check out implementations in C, Python, and Haskell! (The Haskell version is particularly fun. See below.)
Github :: Primes
(from the notes found in the repo)
ABOUT PRIMES------------A prime number is an integer that is only divisible by 1 and itself.Thus, the lowest prime number is 2. Don't be fooled, 1 is not included.Prime numbers pop up in all sorts of places around the universe,so it's handy to know about them and also have libraries to work with them.In all three versions I have included 3 things:- A function that tests for primality.- A function that given a number finds the nearest prime greater than it.- An infinite data structure that uses the above two to produce allthe prime numbers ever.Note: When testing for primality, one need only to check up to the squareroot of the number you're testing. We will see why this is with an example:Concider the number 101. It's square root is quite nearly 10.We have so far tested up to 10, but no integer has divided evenly into 101yet. For a number greater than ten to divide cleanly into 101, its 'partner'(e.g. in 15 = 3 * 5, the partner of 5 is 3) would have to be lowerthan 10. Yet since we've already checked up to ten, we know that no suchpartners exist. Therefore, 101 is prime.C-Standard procedural programming. We utilize the intlist library fromday 5 to implement the prime number list-producer.PYTHON------Here we differ from the C version only slightly, getting a trueiterator class for the prime number generator, and the helper function_next_prime() is done in the functional style. Not that the functionhead() is a function I wrote to mimic the behavior of the function of samename that appears in functional languages.HASKELL-------Here I present a unique solution of mine. The usual solution is to divideonly by odd numbers up to the root of the number you're checking, but thisprocess can be sped up by only dividing by other prime numbers. Rememberprime factors? All integers (save 1) are created by multiplying a set ofprime numbers together. A prime's only prime factor should thus be itself,and no other primes. Therefore dividing by odd numbers (while quick) is stilldoing more work than needs to be done.This Haskell solution works by having the primality test (isPrime) and theinfinite list of primes feed off each other. isPrime only divides by primesfound in the 'primes' list. But 'primes' gets its values by filtering'isPrime' across all the odd integers... This might appear too circularto work, but it turns out that supplying the original list with the values2 and 3 is enough to produce all the primes, or check the primility ofany number.
Sunday, August 7, 2011
Algorithm-a-Day : Day 07 : Linked List
Yup, a linked list!
Implementing these in Python and Haskell is nearly pointless, but here they are anyway.
Github :: Linked List
We can be grateful for 'special functions' in Python (ones that start and end with two underscores) as they allow us to easily apply built-ins to our data structures.
For example, asking for: len(our_list) will yields its length, so far as we've defined it in the class definition. The keyword 'in' will also work, as will print()!
Implementing these in Python and Haskell is nearly pointless, but here they are anyway.
Github :: Linked List
We can be grateful for 'special functions' in Python (ones that start and end with two underscores) as they allow us to easily apply built-ins to our data structures.
For example, asking for: len(our_list) will yields its length, so far as we've defined it in the class definition. The keyword 'in' will also work, as will print()!
Saturday, August 6, 2011
Algorithm-a-Day : Day 06 : FizzBuzz
The idea is to print the numbers 1 to 100, except for multiples of 3 your print 'Fizz', for multiples of 5 you print 'Buzz', and for multiples of both you print 'FizzBuzz'.
Apparently this is hard? Or it separates the men from the boys?
Regardless, I wrote four versions. The C version is standard, the Python version uses funky string manipulation, and the Haskell version (the second one) is a neat little solution I thought up myself. Take a good look at it.
Github :: FizzBuzz
Apparently this is hard? Or it separates the men from the boys?
Regardless, I wrote four versions. The C version is standard, the Python version uses funky string manipulation, and the Haskell version (the second one) is a neat little solution I thought up myself. Take a good look at it.
Github :: FizzBuzz
Friday, August 5, 2011
Algorithm-a-Day : Day 05 : itol (int-to-list)
itol was originally a function I wrote in Python to help me solve Project Euler problems. It's been through many revisions, and the Python version presented here is its simplest version. The Haskell solution is also typical with no surprises, but the C solution ended up being a bit longer, to (again) account for C's lack of a more powerful built-in list type. It essentially became a general int list library.
Github :: itol (int-to-list)
There is also a short unit test for the C version, which you can compile at your leisure.
Github :: itol (int-to-list)
There is also a short unit test for the C version, which you can compile at your leisure.
Thursday, August 4, 2011
Algorithm-a-Day : Day 04 : Command-line arguments
Needless to say, these come up a lot. How could they not?
Fledgling programmers need to know how these are handled, and I remember a day when I didn't have a flying clue how do deal with them. I mean heck, who would ever use command-line args? Who even uses the command-line? (said he, who at the time who had not even yet compiled a java program out of an IDE)
I was ignorant, to say the least.
Github :: Command-line arguments
The C and Haskell implementations are straight-forward. They're more of an "Oh, okay" rather than any sort of programming revelation. My Python implementation, however, is a big more stylised for real, well-defined use. It should be noted that it is also available in my python-libs repo on Github.
Fledgling programmers need to know how these are handled, and I remember a day when I didn't have a flying clue how do deal with them. I mean heck, who would ever use command-line args? Who even uses the command-line? (said he, who at the time who had not even yet compiled a java program out of an IDE)
I was ignorant, to say the least.
Github :: Command-line arguments
The C and Haskell implementations are straight-forward. They're more of an "Oh, okay" rather than any sort of programming revelation. My Python implementation, however, is a big more stylised for real, well-defined use. It should be noted that it is also available in my python-libs repo on Github.
Wednesday, August 3, 2011
Algorithm-a-Day : Day 03 : cat
Everyone's favourite shell command. Goes well with pipes. And sugar.
Github :: cat
All three are fairly simple, with the C implementation probably being the most so (which is to be expected). The Haskell version still makes me giggle, though.
Github :: cat
All three are fairly simple, with the C implementation probably being the most so (which is to be expected). The Haskell version still makes me giggle, though.
Tuesday, August 2, 2011
Algorithm-a-Day : Day 02 : Line Count
Counting the number of lines in a given file, in C, Python, and Haskell.
Quite simple today, but I needed some easy stuff right off the start to balance the more interesting algorithms to come later.
Github :: Line counting
Quite simple today, but I needed some easy stuff right off the start to balance the more interesting algorithms to come later.
Github :: Line counting
Monday, August 1, 2011
Algorithm-a-Day : Day 01 : Reversing words in a string.
This seems to come up as an interview question more often than not. Here is the solution in C, Python, and Haskell.
Github :: Reversing words in a string
While the Python solution is pretty trivial, Haskell wins out of the three.
Nothing beats a one-liner built from library functions.
I think there's sense in "Don't reinvent the wheel" except when you're reinventing it as a nice exercise, as I did with the C solution.
EDIT (8/11/2011) : Fixed link, and bug in code.
Github :: Reversing words in a string
While the Python solution is pretty trivial, Haskell wins out of the three.
Nothing beats a one-liner built from library functions.
I think there's sense in "Don't reinvent the wheel" except when you're reinventing it as a nice exercise, as I did with the C solution.
EDIT (8/11/2011) : Fixed link, and bug in code.
Subscribe to:
Posts (Atom)