rtyler

Git back into Subversion, Mostly Automagically (Part 3/3)

Thus far I've covered most of the issues and hurdles we've addressed while experimenting with Git at Slide in parts 1 and 2 of this series, the one thing I've not covered that is very important to address is how to work in the "hybrid" environment we currently have at Slide, where as one team works with Git and the rest of the company works in Subversion. Our setup involves having a "Git-to-Subverison" proxy repository such that everything to the "left" of that proxy repository is entirely Subversion without exceptions and everything to the "right" of that repository is entirely Git, without exceptions. Part of my original motivation for putting this post at the end of the series was that, when I originally wrote the first post on "Experimenting with Git at Slide" I actually didn't have this part of the process figured out. That is to say, I was bringing code back and forth between Git and Subversion courtesy of git-svn(1) and some gnarly manual processes.

No Habla Branching

The primary issue when bringing changesets from Git to Subversion is based in the major differences between how the two handle branching and changesets to begin with. In theory, projects like Tailor were created to help solve this issue by first translating both the source and destination repositories into an intermediary changeset format in order to cross-apply changes from one end to the other. Unfortunately after I spent a couple days battling with Tailor, I couldn't get it to properly handle some of the revisions in Slide's three year history.

If you've ever used git-svn(1) you might be familiar with the git-svn dcommit command, which will work for some percentage of users that want to maintain dual repositories between Git and Subversion, things break down however once you introduce branching into the mix.
Up until Subversion 1.5, Subversion had no concept of "merge tracking" (even in 1.5, it requires the server and client to be 1.5, it also makes nasty use of svn props). Without the general support for "merge tracking" the concept of a changeset sourcing from a particular branch or the concept of a "merge commit" are entirely foreign in the land of Subversion. In less mumbo jumbo, this effectively means that the "revisions" that you would want to bring from Git into Subversion need to be "flattened" when being "dcommitted" into Subversion's trunk. Git supports a means of flattening revision history when merging and pulling by way of the "--squash" command line argument, so this flattening for git-svn is possible.

Giant Disclaimer

What I'm about to write I dutifully accept as Git-heresy, a nasty hack and not something I'm proud of.

Flattening into Subversion

First the icky bash script that supports properly flattening revisions into the "master" branch in the git-svn repository and dcommits the results:
#!/bin/bash

MERGE_BRANCH=mergemaster
REPO=$1
BRANCH=$2

if [[ -z "${1}" || -z "${2}" ]]; then
echo "===> You must provide a \"remote\" and a \"refspec\" for Git to use!"
echo "===> Exiting :("
exit 1;
fi

LATEST_COMMIT=`git log --max-count=1 --no-merges --pretty=format:"%H"`

function master
{
echo "==> Making sure we're on 'master'"
git checkout master
}

function setup_mergemaster
{
master
echo "==> Killing the old mergemaster branch"
git branch -D $MERGE_BRANCH

echo "==> Creating a new mergemaster branch"
git checkout -b $MERGE_BRANCH
git checkout master
}

function cleanup
{
rm -f .git/SVNPULL_MSG
}

function prepare_message
{
master

echo "===> Pulling from ${REPO}:${BRANCH}"
git pull ${REPO} ${BRANCH}
git checkout ${MERGE_BRANCH}

echo "==> Merging across change from master to ${MERGE_BRANCH}"
git pull --no-commit --squash . master

cp .git/SQUASH_MSG .git/SVNPULL_MSG

master
}

function merge_to_svn
{
git reset --hard ${LATEST_COMMIT}
master
setup_mergemaster

echo "===> Pulling from ${REPO}:${BRANCH}"
git pull ${REPO} ${BRANCH}
git checkout ${MERGE_BRANCH}

echo "==> Merging across change from master to ${MERGE_BRANCH}"
git pull --no-commit --squash . master

echo "==> Committing..."
git commit -a -F .git/SVNPULL_MSG && git-svn dcommit --no-rebase

cleanup
}

setup_mergemaster

prepare_message

merge_to_svn

master

echo "===> All done!"
Gross isn't it? There were some interesting things I learned when experimenting with this script, but first I'll explain how the script is used. As I mentioned above there is the "proxy repository", this script operates on the git-svn driven proxy repository, meaning this script is only invoked when code needs to be propogated from Git-to-Subversion as opposed to Subversion-to-Git which git-svn properly supports by default in all cases. Since this is a proxy repository, that means all the "real" code and goings-on occur in the "primary" Subversion, and "primary" Git repositories, so the code is going along this path: Primary_SVN <-> [proxy] <-> Primary_Git
This setup means when we "pull" (or merge) from Primary_Git/master we are going to be flattening at that point in order to properly merge it into the Primary_SVN. Without further ado, here's the breakdown on the pieces of the script:
function setup_mergemaster
{
master
echo "==> Killing the old mergemaster branch"
git branch -D $MERGE_BRANCH

echo "==> Creating a new mergemaster branch"
git checkout -b $MERGE_BRANCH
git checkout master
}
What the setup_mergemaster branch is responsible for is deleting any prior branches that have been used for merging into the proxy repository and Primary_SVN. It gives us a "mergemaster" branch in the git-svn repository that is effectively at the same chronological point in time as the master branch before any merging occurs.
function prepare_message
{
master

echo "===> Pulling from ${REPO}:${BRANCH}"
git pull ${REPO} ${BRANCH}
git checkout ${MERGE_BRANCH}

echo "==> Merging across change from master to ${MERGE_BRANCH}"
git pull --no-commit --squash . master

cp .git/SQUASH_MSG .git/SVNPULL_MSG

master
}
The prepare_message function is part of the nastiest code in the entire script, in order to get an accurate "squashed commit" commit message when the changesets are pushed into Primary_SVN, we have to generate the commit message separately from the actual merging. Since this function is performing a `git pull` from "master" into "mergemaster" the changesets that are being pulled are going to be the only ones that show up (for reasons I'm about to explain).
function merge_to_svn
{
git reset --hard ${LATEST_COMMIT}
master
setup_mergemaster

echo "===> Pulling from ${REPO}:${BRANCH}"
git pull ${REPO} ${BRANCH}
git checkout ${MERGE_BRANCH}

echo "==> Merging across change from master to ${MERGE_BRANCH}"
git pull --no-commit --squash . master

echo "==> Committing..."
git commit -a -F .git/SVNPULL_MSG && git-svn dcommit --no-rebase

cleanup
}
If you noticed above in the full script block, the "LATEST_COMMIT" code, here's where it's used, it is one of the most important pieces of the entire script. Basically the LATEST_COMMIT piece of script grabs the latest non-merge-commit hash from the `git log` output and saves it for later use (here) where it's used to rollback the proxy repository to the point in time just before the last merge commit. This is done to avoid issues with git-svn(1) not understanding how to handle merge commits whatsoever. After rolling back the proxy repository, a new "mergemaster" branch is created. After the mergemaster branch is created, the actual Primary_Git changesets that differ between the proxy repository and Primary_Git are pulled into the proxy repository's master branch, and sqaushed into the mergemaster branch where they are subsequently committed with the commit message that was prepared before. The "prepare_message" part of the script becomes important at that step because the "squashed commit" message that Git generates at this point in time will effectively contain every commit that has ever been proxied across in this manner ever.

After the "merge_to_svn" function has been run the "transaction" is entirely completed and the changesets that once differed between Primary_SVN/trunk and Primary_Git/master are now normalized.

Mostly Automagically

In the near future I intend on incorporating this script into the post-receive hook on Primary_Git in such a way that will truly propogate changesets automatically from Primary_Git into Primary_SVN, but currently I'm utilizing one of my new favorite "hammers', Hudson (see: One-line Automated Testing). Currently there are two jobs set up for proxying changesets across, the first "Subversion-to-Git" simply polls Subversion for changes and executes a series of commands when changes come in: git-svn fetch && git merge git-svn && git push $Primary_Git master. This is fairly straight-forward and fits in line with what git-svn(1) is intended to do. The other job that I created is "Git-to-Subversion" which must be manually invoked by a user, but still will automatically take care of squashing commits into Primary_SVN/trunk (i.e. bash svnproxy.sh $Primary_Git master).

Wrap-up

Admittedly, this sort of setup leaves a lot to be desired. In the ideal world, Tailor would have coped with both our Git and our Subversion repositories in such a way that would have made this script nothing more than a silly idea I had on a bus. Unfortunately that wasn't case and the time budget I had for figuring out a way to force Tailor to work was about 47.5 hours less than it took me to sit down and write the script above. I'd be interested to see other solutions other organizations are utilizing to migrate from one system to the other, but at the time of this writing I can't honestly say I've heard much about people dealing with the "hybrid" scenario that we have currently at Slide.



Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript, feel free to contact me at tyler[at]slide
comments powered by Disqus