Git

Git-related posts
Posted by R. Tyler Ballance

A few months ago Kohsuke, author of the Hudson continuous integration server, introduced me to the concept of the "pre-tested commit", a feature of the TeamCity build management and continuous integration system. The concept is simple: the build system stands as a roadblock between your commit and trunk, and only after it determines that your commit doesn't break things does it allow the commit into version control, where other developers will sync and integrate that change into their local working copies. The reasoning and workflow put forth by TeamCity for "pre-tested commits" is very dependent on a centralized version control system; it solves an issue Git or Mercurial users don't really run into. Those using Git can commit their hearts out all day long and it won't affect their colleagues until they merge their commits with others.

In some cases, allowing buggy or broken code to be merged in from another developer's Git repository can be worse than in a central version control system, since the recipient of the broken code might perform a knee-jerk git-revert(1) command on the merge!
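
To make the gating idea concrete, here is a minimal sketch in Python of a pre-tested commit gate. It is not TeamCity's implementation: the names pretested_commit, trunk and build_passes are hypothetical, and the "build" is a stand-in callable rather than a real build system.

```python
from typing import Callable, List


def pretested_commit(
    trunk: List[str],
    candidate: str,
    build_passes: Callable[[List[str]], bool],
) -> bool:
    """Add `candidate` to `trunk` only if the build passes with it applied.

    All names here are hypothetical; `build_passes` stands in for whatever
    the build system runs (compile, unit tests, and so on).
    """
    proposed = trunk + [candidate]  # trunk as it would look after the commit
    if build_passes(proposed):
        trunk.append(candidate)  # the gate opens: the commit enters trunk
        return True
    return False  # the gate stays shut: trunk is untouched


if __name__ == "__main__":
    trunk = ["a1", "b2"]

    def fake_build(commits: List[str]) -> bool:
        # Stand-in build: fails whenever a commit is labelled "broken".
        return all("broken" not in c for c in commits)

    print(pretested_commit(trunk, "c3", fake_build), trunk)
    print(pretested_commit(trunk, "d4-broken", fake_build), trunk)
```

The point of the design is that the shared line of history is only ever modified after the check succeeds, so a rejected commit never reaches anyone else.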

Posted by R. Tyler Ballance

A while ago, when Paul, Jason and I worked together, I became a big fan of code reviews before merging code. It was no surprise really: we were the first to adopt Git at the company and our workflow was quite ad hoc, so the need to federate knowledge within the group meant code reviews were a pretty big deal. At the time, we mostly did code reviews in person by way of "hey, what's this you're doing here?" or by literally sending patch emails with git-format-patch(1) to the team mailing list so all could participate in the discussion about what merits "good code" exhibited versus "less good code." Now that I've left that company and joined another one, I've found myself in another small-team situation, where my teammates place high value on code review. Fortunately this time around better tools exist, namely Gerrit.

I'm a bit hazy on the history behind Gerrit; what I do know is that its primary developer, Shawn Pearce (spearce), is one of the Git "inner circle" who contributes heavily to Git itself as well as JGit, a Git implementation in Java which sits underneath Gerrit's internals.

Posted by R. Tyler Ballance

Perhaps the title is a bit too much ego stroking, but yes, I did write the fastest Python module for decoding JSON strings and encoding Python objects to JSON. I didn't, however, write the parser behind the scenes.

Over the summer I discovered "Yet Another JSON Library" on GitHub, written by Lloyd Hilaiel. Jonesing for a Saturday afternoon project, I started the "py-yajl" project to see if I could implement a Python C module atop Lloyd's marvelous parsing library. After tinkering with the project for a while I got a working prototype building (learning how to define custom types in Python along the way), then let the project stagnate as my weekend ended and the workweek resumed.

A little over a week ago "autodata", another GitHub user, sent me a "Pull Request" with some minor changes to make py-yajl build more cleanly on amd64. My interest in the project was suddenly reignited; it's amazing what a little interest can do for motivation. Over the 10 days following autodata's pull request I discovered that a former colleague of mine and fellow GitHub user, "teepark", had forked the project as well and was working on Python 3 support. Going from zero to two people interested in the project, I quickly converted the code from a stagnant, borderline-embarrassing dump of C code into a leak-free, swift JSON library for Python.
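
For readers who have not used the module, here is a minimal sketch of the decode/encode round trip. It assumes py-yajl is importable as yajl and offers json-style loads() and dumps() functions; that is an assumption on this page, so the sketch falls back to the standard json module when yajl is not installed. The helper name round_trip is hypothetical.

```python
# Assumption: py-yajl is importable as `yajl` and offers json-style
# loads()/dumps(). Fall back to the standard library so the sketch still runs.
try:
    import yajl as json_impl
except ImportError:
    import json as json_impl


def round_trip(obj):
    """Encode a Python object to a JSON string, then decode it back."""
    encoded = json_impl.dumps(obj)
    return json_impl.loads(encoded)


if __name__ == "__main__":
    original = {"project": "py-yajl", "forks": 2, "tags": ["json", "python"]}
    print(round_trip(original) == original)
```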

Do you love Git too?

03 Nov 2009
Posted by R. Tyler Ballance

In addition to RSS feeds, one of my favorite sources of reading material is the Git mailing list; I'm not really active, I simply enjoy reading the discussions around code and the best solutions for certain problems. If you read the list long enough, you'll start to appreciate the time and attention the Git core developers (spearce, peff and junio (a.k.a. gitster)) put into cultivating the code and in cultivating new contributors. Of all the open source projects I watch to one extent or another, Git is very effective at bringing in new contributors and getting their contributions vetted for inclusion.

If you're a heavy Git user (like me) you can certainly see the results of their tireless efforts, Junio's (git.git's maintainer) in particular. I highly recommend checking out his Amazon wishlist to thank him for his efforts.

Posted by R. Tyler Ballance

At the Hudson Bay Area Meetup/Hackathon that Slide, Inc. hosted last weekend, I worked on the Jython plugin and released it just days after releasing a strikingly similar plugin, the Python plugin. I felt that an explanation might be warranted as to why I would do such a thing.

For those that don't know, Hudson is a Java-based continuous integration server, one of the best CI servers developed (in my humblest of opinions). What makes Hudson so great is a very solid plugin architecture allowing developers to extend Hudson to support a wide variety of scripting languages as well as notifiers, source control systems, and so on (related post on the growth of Hudson's plugin ecosystem). Additionally, Hudson supports slaves on any operating system that Java supports, allowing you to have a central manager (the "master" Hudson server/node) and a vast network of different machines performing tasks and executing jobs. Now that you're up to speed, back to the topic at hand.

Jython versus Python plugin. Why bother with either, as @gboissinot pointed out in this tweet? The interesting thing about the Jython plugin, particularly when you use a large number of slaves, is that once it is installed you suddenly have the ability to execute Python scripts on every single slave, regardless of whether or not they actually have Python installed. The more "third party" software that can be moved into Hudson by way of the plugin system, the fewer dependencies there are and the less difficult it is to set up slaves to help handle load.

Take the "git" versus the "git2" plugin. The git plugin was recently criticized on the #hudson channel because of its use of the JGit library, versus "git2", which invokes git(1) on the command line. The latter approach is flawed for a number of reasons. In particular, relying on the git command-line executables and scripts to return consistently formatted output is shaky at best, even if you aren't relying on "porcelain" (Git community terminology for the front-end-ish scripts and code sitting on top of the "plumbing"; the breakdown is detailed here). The command-line approach also means you now have to ensure that every one of your slaves likely to be executing builds has the appropriate packages installed. On the flip side, with the JGit-based approach, the Hudson slave agent can transfer the appropriate bytecode to the machine in question and execute it without relying on system dependencies.

The Hudson Subversion plugin takes a similar approach, being based on SVNKit.

Being a Python developer by trade, I am certainly not in the "Java Fanboy" camp, but the efficiencies gained by incorporating Java-based libraries in Hudson plugins and extensions are a no-brainer: the reduction of dependencies on the systems in your build farm will save you plenty of time in maintenance and version woes alone. In my opinion, the benefits of JGit, Jython, SVNKit, and the other Java-based libraries that are running some of the most highly used plugins in the Hudson ecosystem continue to outweigh the costs, especially as we find ourselves bringing more and more slaves online.

Posted by R. Tyler Ballance

I've been sending "Protip" emails about Git to the rest of engineering here at Slide for a while now, using the "Protips" as a means of introducing more interesting and complex features Git offers.


There are those among us who can look at a reproduction case for a bug and just know what the bug is. For the rest of us mere mortals, finding out what change or set of changes actually introduced a bug is extremely useful for figuring out why a particular bug exists. This is even more true for the more elusive bugs or the cases where code "looks" correct and you're stumped as to why the bug exists now, when it didn't yesterday/last week/last month. The options available to you in most classical version control systems are to sift through diffs or wade through log message after log message, trying to spot the particular change that introduced the regression you're now tasked with resolving.

Fortunately (of course) Git offers a handy feature to assist you in tracking down regressions as they're introduced: git bisect. Take the following scenario:
Roger has been working on some lower level changes in a project branch lately. When he left work last night, he ran his unit tests (everything passed), committed his code and went home for the day. When he came in the next morning, per his typical routine, he synchronized his project branch with the master branch to ensure his code wasn't stomping on released changes. For some reason, however, after synchronizing his branch, his unit tests started to fail, indicating that a bug was introduced in one of the changes that was integrated into Roger's project branch.
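
Conceptually, git bisect is a binary search over the commits between a known-good and a known-bad revision. Here is a minimal sketch of that idea in Python, under loud assumptions: a linear history ordered oldest to newest, a first commit that is known good, a last commit that is known bad, and a bug that stays present once introduced. The function and variable names are hypothetical, and this is not Git's actual implementation (real histories are graphs, not lists).

```python
from typing import Callable, Sequence


def bisect_first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in an oldest-to-newest history.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1  # indexes of the known-good/known-bad ends
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):  # e.g. run the unit tests at this commit
            bad = mid  # the regression is at mid or earlier
        else:
            good = mid  # the regression came after mid
    return commits[bad]


if __name__ == "__main__":
    history = ["c%d" % i for i in range(10)]  # c0 is good, c9 is bad

    def tests_fail(commit: str) -> bool:
        # Stand-in for a real test run: pretend c6 introduced the bug.
        return int(commit[1:]) >= 6

    print(bisect_first_bad(history, tests_fail))
```

With Git itself, the equivalent session is driven by git bisect start, git bisect bad, git bisect good <commit> and finally git bisect reset, with your test run standing in for is_bad at each step. Halving the range each time is why only a handful of test runs are needed even across a long list of merged changes.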