Team Development with Git (Part 2/3)

29 Sep 2008

In my last post on Git, Experimenting with Git at Slide, I discussed most of the technical hurdles that stood in our way with evaluating Git for a Subversion tree that has 90k+ revisions and over 2GB of data held within. As I've learned from any project that involves more than just myself, technology is only half the battle, the other half is the human element. One of the most difficult things to "migrate" when switching to something as critical to a developer's workflow as a VCS, is habits, good ones and bad ones.

The Bad Habits

When moving my team over to Git, I was able to identify some habits that I view as "bad" that could either be blamed on how we have used Subversion here at Slide, or the development workflow that Subversion encourages. For the sake of avoiding flamewars, I'll say it's 51% us, 49% the system.

The Occasional Committer

Chances are that if you're working on "something super important!" you fall into this bad habit. Because of the nature of trunk in Subversion, if you commit half-finished work into a team-branch or trunk itself, you could cause plenty of pain for your fellow developers. As a result, you tend to commit at the end of a long day working on something, or only after something has been completed. The 9 hours of sweat and frustration you've spent pounding away on 300 lines of code is now summed up in one commit message:

Turns out there was a race-condition here, re #52516

Now three months from now when you return to the same 300 lines of code and try to figure out what the hell led to this mess, you're left with the commit message above, and nothing more.

The Less-than-attentive Developer

I've worked on a Mac for the majority of my time at Slide, as do most of my compatriots, and sooner or later one of two things will happen:svn add some/directory/ and/or svn commit. This usually results in a second commit to come into the tree with a commit message like:

Whoops, accidentally checked in resource forks

This isn't that large of a problem, except for the implication of the second command there, svn commit will commit all outstanding changes in your working copy starting in the current working directory, and recursing through children directories. I'm probably more anal-retentive about my commits than most, but I usually do a diff before I commit to make sure I'm aware of what I'm about to commit, but I've seen plenty of developers skip this step.

The Over-Confident Merger

I've fallen into this trap numerous times when merging "old" branches back into trunk, especially with binary files that may have been changed in trunk, or in my branch (hell if I know!). One thing I can speak to anecdotally from our work at Slide, is that the probability of nonsensical conflicts rises with a branch's age. The rate of our repository progresses at about 50 commits to trunk per day (~150 commits across the board), if there is a branch cut from trunk, usually within two weeks it can become extremely difficult to merge back into trunk without constant "refreshes" or merges from trunk into the branch.

If you're not careful when folding that branch back down into trunk, you can inadvertantly revert old binary files or even text files to previous states which will usually cause other individuals in the engineering organization gripe at you and your QA department to pelt you with rocks. For bonus points, you could (as I have done before) accidentally commit conflicting files earning a gold star and a dunce hat for the day. This merging pain led me to originally write my merge-safe.py script so long ago.

The Slide Way to Git

Fortunately for us, I think the decentralized nature of Git has helped us enforce some best practices when it comes to the bad habits above. "The Occassional Committer" is all but done away with thanks to the ability to atomically commit and revert revisions at a whim and have those changes not propogated to other developers until there has been an explicit push or pull.

Unfortunately however, "The Less-than-attentive Developer" isn't solved so easily. To date I've sat next to two engineers that were new to Git, and watched them both execute the same fateful command: git add .
Not realizing their mistake, they accidentally commit a truckload of build and temporary files (.so, .swp, .pyc, etc) interspersed with their regular work that they meant to commit. Git cannot prevent a developer from shooting themselves in the foot, but it does prevent them from shooting everybody else in the foot along with it (unless they commit, and then push their changes upwards).

"The Over-confident Merger" grows more and more confident in the Git-based workflow. Since Git handles changesets atomically, it becomes trivial to merge branch histories together or cherry-pick one revision and apply to an entirely separate branch. I've not yet seen a Git conflict that wasn't a true conflict insofar that it was quite literally one line of code changing in two different ways between branch histories. As an aside, when using git-svn, be prepared for all the merging "fun" that Subversion has to offer when propogating changes between the two systems.

Basic Team Workflow

The manner in which we use Git is more like a centralized-decentralized version control system. We still have a "master" repository, which provides a central synchronization point when pushing stage servers, or when bringing code into Subversion to be pushed to live servers. For any particular project one of the developers will create a branch that will serve as the primary project branch, take the "superpoke-proj" branch as an example. That developer will push this branch to "origin" (or the master repository) such that other developers can "track" that branch and contribute code. For the purposes of this example, let's say Paul and Peter are working in "superpoke-proj", while Paul is working he will incrementally commit his work, but once he has resolved/fixed a ticket, he will perform a git push and then mark the ticket appropriately such that a QA engineer can verify the fix. If Paul and Peter are working on something that "breaks the build" but they need to collaborate on it together, Paul can perform a git pull from Peter and vice versa, and again, once they're done those changes will be pushed to origin. This model allows for developers to work in relative isolation so they're not inadvertantly stepping on each others' toes, but also close enough that they can collaborate in explicit terms, i.e. when they are ready for changes to be propogated to each other or the rest of the team.

Conclusion

Our workflow, like most things at companies under 500 employees is still a "work in progress™". I think we've found the right balance thus far for the team between freedom and process the allow for sufficient mucking around in the codebase in a way that provides the most amount of time actually writing code with as little possible time spent dealing with the overhead of anything else (merging, etc). There's nothing inherently special in the way we use Git, but we've found that it works for the way we work, which is to say in a very tight release schedule that's requires multiple branches per week and plenty of merging from branch-to-branch whether it be from another team or another part of the same team.

Of course, your mileage may vary.

Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide