Software Development

Posted by R. Tyler Ballance

Writing software is an outlet for artistic expression for many people, myself included. For me, solving problems involves a good deal of creativity, not only in the actual solution but also in manipulating several moving parts in order to fit that solution into an existing code-base. Combining this creative outlet with a beautiful language such as Python results in some developers writing code that holds a masterpiece level of beauty to them. To the untrained eye a class might look like nothing special, but to the author of that code it might represent a substantial amount of work and personal investment.

Like art, sometimes the beauty is entirely subjective. There have been times where I've been immensely pleased with one of my creations, only to turn to a wholly unimpressed Dave. Managing or working with any team of highly motivated, passionate and creative developers presents this problem as a group: how can you objectively judge code while preserving the author's sense of ownership?

Posted by R. Tyler Ballance

Dealing with statics in Python is something that has bitten me enough times that I have become quite pedantic about them when I see them. I'm sure you're thinking "But Dr. Tyler, Python is a dynamic language!" It is indeed, but that does not mean there aren't static variables.

The funny thing about static variables in Python, in my opinion, is that once you understand a bit about scoping and what you're dealing with, they make far more sense. Let's take this static class variable for example:

  >>> class Foo(object):
  ...     my_list = []
  ...
  >>> f = Foo()
  >>> b = Foo()

You're trying to be clever, defining your class variables with their default values outside of your __init__ function. That's understandable, unless you ever intend on mutating that variable.

  >>> f.my_list.append('O HAI')
  >>> print b.my_list
  ['O HAI']
  >>>

Still feeling clever? If that's what you wanted, I bet you do, but if you wanted each instance to have its own internal list, you've inadvertently introduced a bug: any and every time something mutates my_list, it will change for every single instance of Foo. This occurs because my_list is tied to the class object Foo and not to the instances of Foo (f or b). In effect f.__class__.my_list and b.__class__.my_list are the same object; in fact, the __class__ objects of both those instances are the same as well.

  >>> id(f.__class__)
  7680112
  >>> id(b.__class__)
  7680112
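
If each instance is supposed to own its own list, one common way to get there is to create the list inside __init__ rather than in the class body. This is only a minimal sketch, reusing the Foo, my_list, f and b names from the session above, and it is written to run on both Python 2.6+ and Python 3:

```python
class Foo(object):
    def __init__(self):
        # Created anew on every instantiation, so the list lives on the
        # instance rather than on the class object.
        self.my_list = []


f = Foo()
b = Foo()
f.my_list.append('O HAI')

print(f.my_list)               # ['O HAI']
print(b.my_list)               # []
print(f.my_list is b.my_list)  # False
```

The class object is still shared between f and b; the difference is simply that my_list is no longer stored on it.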


When using default/optional parameters for methods you can also run afoul of statics in Python, for example:

  >>> def somefunc(data=[]):
  ...     data.append(1)
  ...     print ('data', data)
  ...
  >>> somefunc()
  ('data', [1])
  >>> somefunc()
  ('data', [1, 1])
  >>> somefunc()
  ('data', [1, 1, 1])
  >>>

This comes down to a scoping issue as well: functions and methods in Python are first-class objects. In this case, the default list for data is created once, when the function is defined, and stored in the somefunc.func_defaults tuple; every call that relies on the default then mutates that same list. Bad programmer!
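
The usual way to sidestep this is to default to None and build the list inside the function body. The following is a minimal sketch rather than the one true fix: safe_somefunc is a hypothetical name used only for illustration, both functions return the list instead of printing it so the result is easy to inspect, and __defaults__ is the spelling of func_defaults that works on Python 2.6+ as well as Python 3.

```python
def somefunc(data=[]):
    # The [] above is evaluated once, when the def statement runs,
    # so every call without an argument shares the same list.
    data.append(1)
    return data


def safe_somefunc(data=None):
    # None is immutable, so it is safe as a default; a fresh list is
    # built on every call that does not pass one in.
    if data is None:
        data = []
    data.append(1)
    return data


somefunc()
somefunc()
print(somefunc.__defaults__)       # ([1, 1],)
print(safe_somefunc())             # [1]
print(safe_somefunc())             # [1]
print(safe_somefunc.__defaults__)  # (None,)
```

Inspecting __defaults__ is also a handy way to confirm a suspicion when a "who is changing my variable" bug shows up.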

It all seems simple enough, but I still consistently see these mistakes in plenty of different Python projects (both pony-affiliated, and not). When these bugs strike they're difficult to spot, frustrating to deal with ("who the hell is changing my variable!") and most importantly, easily prevented with a little understanding of how Python scoping works.

PYRAGE!

Posted by R. Tyler Ballance

In my spurious free time I maintain a few Python modules (py-yajl, Cheetah, PyECC) and am semi-involved in a couple of others (Django, Eventlet), only one of which properly supports Python 3. For the uninitiated, Python 3 is a backwards-incompatible progression of the Python language and the CPython implementation thereof. It has presented significant challenges for the Python community, insofar as supporting Python 2.x, which is in wide deployment, and Python 3.x simultaneously is difficult.

As it stands now my primary development environment is Python 2.6 on Linux/amd64, which means I get to take advantage of some of the nice things that were added to Python 3 and then back-ported to Python 2.6/2.7. Regular readers know about my undying love for Hudson, a Java-based continuous integration server, which I use to test and build all of the Python projects that I work on. While working this weekend I noticed that one of my C-based projects (py-yajl) was failing to link properly on Python 2.4 and 2.5. Although it might be easy to cut off support for Python 2.4, which was first released over four years ago, there are still a number of heavy users of 2.4 (such as Slide); in fact it's still the default /usr/bin/python on Red Hat Enterprise Linux 5. What makes this C-based module special is that, thanks to Travis, it runs properly on Python 3.1 as well. Since the Python C-API has been fairly stable through the 2 series into Python 3, maintaining a C-based module that supports multiple versions of Python is fairly straightforward.

In this case, it's as easy as some simple pre-processor definitions:

  #if PY_MAJOR_VERSION >= 3
  #define IS_PYTHON3
  #endif

I can then use that definition further down the line to modify the handling of some of the minor internal changes for Python 3:

  #ifdef IS_PYTHON3
      result = _internal_decode((_YajlDecoder *)decoder, PyBytes_AsString(bufferstring),
                                PyBytes_Size(bufferstring));
      Py_XDECREF(bufferstring);
  #else
      result = _internal_decode((_YajlDecoder *)decoder, PyString_AsString(buffer),
                                PyString_Size(buffer));
  #endif

Not particularly pretty but it gets the job done, supporting all major versions of Python.

Python on Python

Writing modules in C is fun and can give you pretty good performance, but it is not something you would want to do with a large package like Django (for example). Python is the language we all know and love, and a much more pleasant language to work with than C. If you build packages in pure Python, those packages have a much better chance of running on top of IronPython or Jython, and the entire Python ecosystem is better for it.

A few weeks ago, when I started to look deeper into the possibility of Cheetah support for Python 3, I found a process riddled with faults. First, a disclaimer: Cheetah is almost ten years old; it's one of the oldest Python projects I can think of that's still chugging along. This translates into some very old-looking code. Most people who are new to the language aren't familiar with some of the ways it has changed in the past five years, let alone ten.

The current means of supporting Python 3 with pure Python packages is as follows:

  1. Refactor the code enough such that 2to3 can process it
  2. Run 2to3 over the codebase, with the -w option to literally write the changes to the files
  3. Test your code on Python 3 (if it fails, go back to step 1)
  4. Create a source tarball, post to PyPI, continue developing in Python 2.xx

I'm hoping you spotted the same problem with this model that I did: due to the reliance on 2to3, you are now trapped into always developing against Python 2. This model will never succeed in moving people to Python 3, regardless of what amazing improvements it contains (such as the Unladen Swallow work), because you cannot develop on a day-to-day basis with Python 3; it's a magic conversion tool away.

Unlike with a C module for Python, I cannot #ifdef certain segments of code in and out, which forces me to either constantly use 2to3 or fork my code and maintain two separate branches of my project, duplicating the work for every change. With Python 2 sticking around on the scene for years to come (I don't believe 2.7 will be the last release), I cannot imagine either of these workflows making sense long term.
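
To make the #ifdef comparison concrete: a runtime version check cannot hide syntax that the running interpreter does not understand, because the whole file has to compile before any check executes. The sketch below is only an illustration under that assumption; PY2_ONLY_SOURCE and compiles are hypothetical names, and the built-in compile() stands in for what happens when a module is imported.

```python
import sys

# Python 2-only syntax (the print statement) "guarded" by a runtime check.
PY2_ONLY_SOURCE = '''
import sys
if sys.version_info[0] < 3:
    print "running on Python 2"
'''


def compiles(source):
    """Return True if this interpreter can compile the given source."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# On Python 2 this reports True; on Python 3 it reports False, even though
# the offending line sits in a branch that would never run there.
print(compiles(PY2_ONLY_SOURCE))
```

The C preprocessor removes the unwanted branch before the compiler ever sees it, which is exactly the step that pure Python lacks.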

At a fundamental level, supporting Python 3 does not make sense for anybody developing modules, particularly open source ones. Despite Python 3 being "the future", it is currently impossible to develop using Python 3 while maintaining support for Python 2, which all of us have to do. With enterprise operating systems like Red Hat or SuSE only now starting to get on board with Python 2.5 and Python 2.6, you can be certain that we're more than five years away from seeing Python 3 installed by default on any production machines.

Posted by R. Tyler Ballance

My New Year's resolution this year was incredibly generic, insofar as I merely wanted to "write more." With no qualifications for what kind of writing that entailed, I simply want to become a better writer (or blogger). With technical subjects in particular, I'd like to get better at writing in a fashion that is interesting, parse-able by novices and has sufficient "depth" to interest more technical readers. I'm not sure if I can define what being a "better writer" will entail or how I'll know when I'm there, so for now I'm just trying to write good content. Considering my last post didn't even pretend to ride the fence between opinionated article and full-on rant, I think it's safe to say that in order to accomplish my goal I need more venues for writing and more topics to write about.

One of those venues, which I've linked to before, is the Apture Blog; I have written for the company blog already this year, and chances are I will have another few posts go up as we tackle some of the technical challenges we're currently facing (you can view my posts here). Unfortunately there are only so many articles I can write for the Apture Blog without giving away any confidential information or turning it completely into a technical blog (hint: it's not).

Looking around at a few of the open source communities that I'm involved in, two groups stick out: Eventlet and Hudson. Eventlet already has a blog, and I'm certain my usage of Eventlet is not steady enough to warrant any kind of authoritative posts on the subject. The other, Hudson, is something I've used on a daily basis for almost a year and a half. Not only that, I run the @hudsonci twitter account and founded the #Hudson channel on Freenode; I've also tried my hand at developing some plugins for Hudson (which is written in Java). Suffice it to say, I'm quite the little Hudson cheerleader.

When I floated the idea of an "official" blog for Hudson, which I would help drive, to Kohsuke and some other "core" developers of Hudson, the idea was well received and I set off getting Drupal configured, writing some preliminary content and getting ready for a launch of Continuous Blog. While my writing contributions thus far to Continuous Blog have been sparse, I've gotten to play the delightful role of Editor which is an entirely different experience unto itself.

I'm looking forward to seeing how this develops. I might end up writing for a few other blogs depending on interest and time, but for now my shenanigans can be found on:

Mourning Sun

30 Jan 2010
Posted by R. Tyler Ballance

Some users of Hudson have already started to notice a subtle addition to the latest release, 1.343, a new background watermark image.

The commit message (r26728) from Kohsuke, the incredibly talented founder and maintainer of the Hudson project, adds a bit of sadness to the whole affair:

In tribute to Sun Microsystems and all my colleagues who had to go today. I hope the community would forgive me for doing this.

Given the incredible speed at which the tech industry grows and moves, it's easy to forget that there are a number of talented engineers that have spent their careers at Sun building technologies that have helped change the face of modern computing, regardless of whether or not Sun could figure out how to sell them: SunOS/Solaris, Java, DTrace, SPARC 64-bit chips, Sun Grid Engine, JRuby, the W3C XML specification, ZFS, OpenOffice (acquisition), MySQL (acquisition), and VirtualBox (acquisition).

As a corporation, I personally think Sun was a failure; as a foundation of engineering in Silicon Valley, I think Sun has been quite successful.

To those that are being pushed out as part of the merger with Oracle, I want to sincerely thank you for your contributions to computing and wish you the best of luck.

Posted by R. Tyler Ballance

One of my favorite sites on the internet, reddit, took some downtime this evening while doing some infrastructure (both hardware and software) upgrades. On their down-page, the reddit team invited everybody to join the #redditdowntime channel on the Freenode network, ostensibly to help users pass the time waiting for their pics and IAMAs to come back online.

Shortly after reddit started their scheduled outage, I joined the channel to pass the time while I debated what I should do with my evening. Within minutes the channel was flooded with users, variously spouting reddit memes in caps, link-spamming or engaging in casual chit-chat. I complained about the lack of moderation to jedberg, one of the ops and an employee fairly well known to redditors, and he nearly instantly gave me +o (ops) in the channel. Not one to take my ops duty lightly, I started kicking spammers and warning habitual caps-lock users, and tried to keep things generally civil through the deluge of messages consuming the channel.

Towards the end of the scheduled outage, some automated link-spamming started to appear, and once it started it triggered more and more link-spamming. Clearly whatever was behind the bit.ly link was responsible for the self-propagating nature of the spamming. While the other moderators and I tried to keep up with banning people, I used wget to fetch the destination of the clearly malicious bit.ly URL to determine what we were dealing with.