Software Development

S.A.D. - Seasonal Ada Disorder

Last Sunday, I announced the "0.1" release of my memcache-ada project on comp.lang.ada, thus ending a two-month experiment with the Ada programming language.

In my previous post on the topic, I mentioned some of the things that interested me about Ada. While I didn't use all the concepts that make Ada a powerful language, I can now confidently say that I know enough to be dangerous (not much more, though).

[Image: Old school. This is what my coworkers thought of me, learning Ada.]

All said and done, I spent less than two months off and on creating memcache-ada, mostly on my morning and evening commutes. The exercise of beginning and ending my day with a language that tends to be incredibly strict was interesting, to say the least. Due to the lack of a REPL such as Python's, I found myself writing more and more unit and integration tests to get a feel for the language and the behavior of my library.

Ada? Surely you jest, Mr. Pythonman

The past couple of weeks I've been spending my BART commutes learning the Ada programming language. Prior to starting to research Ada, I sat in my office frustrated with Python for my free-time hackery. Don't get me wrong, I love the Python language; I have enjoyed its ease of use, dynamic model, rapid prototyping and expressiveness. I just fall into slumps occasionally where I find some of Python's "quirks" utterly infuriating: its loosey-goosey type system (which I admittedly take advantage of often), the lack of good concurrency in the language, an import subsystem that has driven lesser men mad, and its difficulty in scaling organically for larger projects (I've not yet seen a large Python codebase that wasn't a borderline "clusterfuck").

Before you whip out the COBOL and Fortran jokes, I'd like to make it known up front that Ada is a modern language (as I mentioned on reddit, the first Ada specification was published in 1983, 11 years after C debuted, and almost 30 years after COBOL and Fortran were designed). It was most recently updated with the "Ada 2005" revision and supports a lot of the concepts one expects from modern programming languages. For me, Ada has two strong points that I find attractive: extra-strong typing and built-in concurrency.

Incredibly strong typing

The typing in Ada is unlike anything I've ever worked with before, coming from a background of C-inspired languages. Whereas one might use the plus operator in Python to add an int and a float together without an issue, in Ada there's literally zero automatic casting (as far as I've learned) between types. To the inexperienced user (read: me) this might seem annoying at first, but it's fundamental to Ada's underlying philosophy of "no assumptions." If you're passing an Integer into a procedure that expects a Float, there will be no implicit conversion; the statement will fail at compile time.

Concurrency built-in

Unlike C, Java, Objective-C and Python (languages I've used before), Ada has concurrency defined as part of the language rather than as an abstraction on top of an OS-level library such as pthreads. In Ada this concept is called "tasking," and it makes building concurrent applications straightforward. Unlike bindings layered on top of pthreads, Ada provides built-in mechanisms for communicating between "tasks," called "rendezvous," along with scheduling primitives.

Being able to define a "task" as a concurrent unit of execution that uses the rendezvous feature to expose "entries" for communicating with it is something I still haven't fully wrapped my head around, to be honest. The idea of a language where concurrency is a core component is so new to me that I'm not sure how much I can do with it.

For my first "big" project with Ada, I've been tinkering with a memcached client in Ada, which will give me the opportunity to learn some Ada fundamentals before I move on to bigger projects. Disregarding the condescending jeers from other programmers whom one could classify as "leet Django haxxorz", I've been enjoying the experience of learning a new language vastly different from any I've tried before.

So stop picking on me, you big meanies :(

GNU/Parallel changed my life

[Image: The @Apture Elephants]

Over the past month or so I've fallen in love with an incredibly simple command-line tool: GNU/Parallel. Parallel has more or less replaced my use of xargs when piping data around on the many machines that I use. Unlike xargs, however, Parallel lets me make use of the many cores that I have access to, either on my laptop or the many quad- and octo-core machines we have lying around the Apture office.

Using Parallel is incredibly easy; in fact, the docs enumerate just about every possible incantation of Parallel you might want to use. Starting simple, you can just pipe stuff to it:

cat listofthings.txt | parallel --max-procs=8 --group 'echo "Thing: {}"'

The command above will run at most eight concurrent processes and group the output of each process so the output of different jobs isn't interleaved. Simple, and in this case not too much different from running with xargs.

With some simple Python scripting, Parallel becomes infinitely more useful:

python generatelist.py | parallel --max-procs=8 --group 'wget "{}" -O - | python processpage.py'
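
The post doesn't show processpage.py itself, so purely as a hypothetical sketch: a script in that position only needs to read the page that wget writes to stdout from its own stdin and emit whatever it extracts, for example:

    # processpage.py -- hypothetical sketch; the real script isn't shown in the post
    import re
    import sys

    def main():
        # wget "{}" -O - writes the fetched page to stdout, which parallel
        # pipes into this script's stdin.
        page = sys.stdin.read()

        # Example extraction: print the page <title>, if there is one.
        match = re.search(r"<title>(.*?)</title>", page, re.IGNORECASE | re.DOTALL)
        if match:
            print(match.group(1).strip())

    if __name__ == "__main__":
        main()

Each line emitted by generatelist.py becomes one parallel job, so up to eight pages are fetched and processed at a time.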

There's not really a whole lot to say about GNU/Parallel other than that you should use it. I find myself increasingly impatient when a single process takes longer than a couple of minutes to complete, so I've been using GNU/Parallel in more and more ways across almost all the machines I work on to make things faster and faster. So much so that I've started to pine for a quad-core notebook instead of this weak dual-core Thinkpad of mine :)

GNU/Parallel Demo

Unclog the tubes; blocking detection in Eventlet

Colleagues of mine are all very familiar with my admiration of Eventlet, a Python concurrency library built on top of greenlet that provides lightweight "greenthreads" which naturally yield at I/O points. For me, the biggest draw of Eventlet, besides its maturity, is how well it integrates with standard Python code. Any code that uses the built-in socket module can be "monkey-patched" (i.e. modified at runtime) to use the "green" version of the socket module, which allows Eventlet to turn regular ol' Python into code with asynchronous I/O.
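
As a rough sketch of what that looks like in practice (the host name and this little banner-fetching helper are just illustrative assumptions, not anything from the library itself):

    import eventlet

    # Swap the standard socket module (and friends) for Eventlet's "green"
    # versions at runtime, so socket-based code yields to other greenthreads
    # whenever it waits on I/O.
    eventlet.monkey_patch()

    import socket  # now backed by the green implementation

    def fetch_banner(host):
        # Looks like ordinary blocking code, but connect/sendall/recv yield
        # cooperatively once the socket module has been monkey-patched.
        sock = socket.create_connection((host, 80), timeout=5)
        try:
            sock.sendall(b"HEAD / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
            return sock.recv(200)
        finally:
            sock.close()

    print(fetch_banner("example.com"))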

The problem with using libraries like Eventlet is that some Python code just blocks: the code hits an I/O point and, instead of yielding, blocks the entire process until that network operation completes.

In practical terms, imagine you have a web crawler that uses 10 "greenthreads", each crawling a different site. The first greenthread (GT1) will send an HTTP request to the first site, then yield to GT2, and so on. If each HTTP request blocks for 100ms, then crawling the 10 sites will block the whole process, preventing anything else from running, for a whole second. That doesn't sound too terrible, but imagine you've got 1,000 greenthreads: instead of everything smoothly yielding from one thread to another, the process will lock up frequently, resulting in painful slowdowns.
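
To make that concrete, here's a minimal sketch of the ten-greenthread crawler (written against Python 3 and a recent Eventlet; the URLs are placeholders):

    import eventlet
    eventlet.monkey_patch()   # green sockets, so the fetches below actually yield

    import urllib.request     # uses the patched socket module under the hood

    SITES = ["http://example.com/page/%d" % i for i in range(10)]  # placeholders

    def crawl(url):
        # With green sockets, each request yields while waiting on the network,
        # so the other nine greenthreads keep making progress. A fetch that
        # truly blocks (say, inside an unpatched C extension) stalls them all.
        body = urllib.request.urlopen(url, timeout=10).read()
        return url, len(body)

    pool = eventlet.GreenPool(size=10)  # one greenthread per site
    for url, nbytes in pool.imap(crawl, SITES):
        print("%s: %d bytes" % (url, nbytes))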

Starting with Eventlet 0.9.10, "blocking detection" code has been incorporated into Eventlet to make it far easier for developers to find the portions of code that can block the entire process.

  import eventlet.debug
  eventlet.debug.hub_blocking_detection(True)

While using the blocking detection is fairly simple, its implementation is a bit "magical" in that it's not entirely obvious how it works.
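
As a rough usage sketch (assuming Eventlet 0.9.10 or later and the default detection resolution of about a second), a greenthread that calls plain time.sleep() should trip the detector, while one that uses eventlet.sleep() should not:

    import time

    import eventlet
    import eventlet.debug

    eventlet.debug.hub_blocking_detection(True)

    def polite():
        # Yields back to the hub; the detector stays quiet.
        eventlet.sleep(0.5)

    def rude():
        # Never yields (time is deliberately not monkey-patched here), so the
        # hub should report this greenthread as blocking.
        time.sleep(2)

    eventlet.spawn(polite)
    eventlet.spawn(rude)
    eventlet.sleep(3)  # give both greenthreads a chance to run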

Being a Libor, Addendum

A couple of weeks ago I wrote a post on how to "Be a Libor", trying to codify a few points I feel like I learned about building a successful engineering team at Slide. Shortly after the post went live, I discovered that Libor had been promoted to CTO at Slide.

Over coffee today Libor offered up some finer points on the post in our discussion about building teams. It is important, according to Libor, to maintain a "mental framework" within which the stack fits, guiding decisions with a consistent world-view or ethos about building on top of the foundation laid. This is not to say that you should solve every problem with the same hammer, but rather that if the standard operating procedure is to build small single-purpose utilities, you should not attack a new problem with a giant monolithic uber-application that does thirty different things (hyperbole alert!).

Libor also had a fantastic quote from the conversation with regard to approaching new problems:

Just because there are multiple right answers, doesn't mean there's no wrong answers

Depending on the complexity of the problems you're facing, there are likely a number of solutions, but you can still get it wrong, particularly if you don't remain consistent with your underlying mental framework for the project or organization.

As usual, my discussions with Libor are interesting and enjoyable; he's one of the most capable, thoughtful engineers I know, so I'm interested to see how Slide Engineering progresses under his careful hand as the new CTO. I hope you'll join me in wishing him the best of luck in his role, moving from wrangling coroutines to herding cats.

Godspeed, mooncat

The slow death of the indie mac dev

Once upon a time I was a Mac developer. I loved Cocoa, I loved building Mac software, and Mac OS X was, once upon a time, the greatest thing ever. I recall writing posts, and even founding a mailing list, in the earlier days of Core Data, which I was using in tandem with Cocoa Bindings, which themselves were almost a black art. I was on a couple of podcasts talking about web services with Cocoa or MacWorld. I loved the Mac platform, and would have gladly rubbed Steve Jobs' feet and thanked him a thousand times for saving Apple from the despair of the late 1990s. As Apple grew, things slowly started to change, and we started to grow apart.

As I started to drift away, I gave a presentation at CocoaHeads covering some of the changes and improvements to the Windows development stack; while I wasn't supremely keen on the idea of building Windows applications, I was clearly in the market for "something else". Further and further I drifted, until I eventually traded my MacBook Pro in for a Thinkpad, forgoing any future I might have had developing Mac software. My decade-long journey of tinkering and learning on Macintosh computers had ended.

When Mac OS X was in its original Rhapsody phase, in the weird nether-world between Platinum and Aqua, Apple realized that it had been held back by not giving developers tools to build for the platform. Apple began to push Project Builder, which became Xcode, which became the key to the Intel transition and helped transform Mac OS from a perennial loser in the third-party software world into a platform offering the absolute best in third-party software. Third-party applications of impressive quality were built and distributed by the "indie Mac devs": Adium; VoodooPad and Acorn from Flying Meat; Nicecast and Audio Hijack Pro from Rogue Amoeba; FuzzMeasure Pro from SuperMegaUltraGroovy; Growl; NetNewsWire and MarsEdit, originally from Brent Simmons (NetNewsWire is now owned by NewsGator, while MarsEdit was acquired by Daniel Jalkut of Red Sweater Software); Yojimbo and BBEdit from Bare Bones. Even Firefox, Camino and Opera filled the gap while Apple pulled Safari out of its craptastic version 2 series. Applications were used on Mac OS X instead of web applications because the experience was better, faster and integrated with Address Book, iPhoto, Mail.app, iMovie and all of Apple's own stack.

Then came the iPhone, with its "Web SDK" nonsense. The story, at least at the time, was clear to me. Apple didn't care about me. Apple didn't care about its developers. Build a web application using JavaScript and AJAX (a Microsoft innovation, I might add) over AT&T's EDGE network? Fuck you!