Tuesday, January 29, 2013

We're on Git!

I love Git and I believe in Open Source. It has long been a dream of mine to move our codebase at RIPE NCC from Subversion to Git and to open source part of my team's work there. Finally, it's happening.

Subversion => Git

We had a Subversion repository with 40k+ revisions. The migration was also an opportunity to clean up some junk. At the very least, I had to make sure that:
  • existing history, tags and branches in Subversion are preserved
  • Maven release plugin works
  • Bamboo (our continuous integration server) can build plans
  • Sonar (our static code analysis tool) keeps working
There are plenty of tutorials and case studies on the internet about how to do the actual migration. This was my streamlined process:
  1. Clone the entire repository for a specific project, which means:
    1. map the Subversion user names to authors in an authors-transform.txt file
    2. clone the repository using the author mapping file
  2. Convert svn:ignore properties to .gitignore
  3. Fetch branches and tags, which means:
    1. Copy remote branches/tags as local ones
    2. Turn the Subversion tags that git svn maintains as branches into proper Git tags
  4. Clean up unused branches, etc. (optional)
  5. Push the newly created local tags and branches to the final Git repository

1. Cloning the Subversion repository

I had a Subversion working copy for each project, so it was easy to extract the usernames for that repository:
svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors-transform.txt
Then I had to find a name and email address for each username. This can be challenging when no one in your department remembers who a username belonged to.
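To see what the awk one-liner produces without a Subversion server at hand, here is a minimal sketch that feeds it a made-up `svn log -q` excerpt. The revisions, the users `jdoe` and `asmith`, and the John Doe address are all hypothetical. The last step shows the `svnuser = Full Name <email>` format git svn expects once the real identity has been found.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Hypothetical excerpt of `svn log -q` output: separator lines plus one
# "rREV | user | date" line per revision.
sample_log='------------------------------------------------------------------------
r3 | jdoe | 2013-01-20 10:00:00 +0100 (Sun, 20 Jan 2013)
------------------------------------------------------------------------
r2 | asmith | 2013-01-19 09:30:00 +0100 (Sat, 19 Jan 2013)
------------------------------------------------------------------------
r1 | jdoe | 2013-01-18 08:15:00 +0100 (Fri, 18 Jan 2013)
------------------------------------------------------------------------'

# Same awk program as above: take the user field of each revision line,
# trim the surrounding spaces and emit a "user = user <user>" placeholder.
printf '%s\n' "$sample_log" \
  | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' \
  | sort -u > authors-transform.txt
cat authors-transform.txt
# asmith = asmith <asmith>
# jdoe = jdoe <jdoe>

# Replace a placeholder once the real identity is known
# (name and address are hypothetical).
sed 's/^jdoe = .*/jdoe = John Doe <john.doe@example.com>/' authors-transform.txt > authors.tmp
mv authors.tmp authors-transform.txt
```

Each user appears only once thanks to `sort -u`, however many revisions they committed.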

With the authors mapping file in place, I was able to clone the Subversion repository and create a local Git repository:
git svn clone -s https://localhost:10443/svn/project/ --no-metadata -A authors-transform.txt project
This command assumes the standard Subversion layout (trunk, tags, branches), which was the case for most of our projects.
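Since the generated authors file starts out full of placeholders, it may be worth checking that each one has been replaced before starting a long clone. The sketch below is a hypothetical helper, not part of git-svn, and it assumes that a completed entry is one whose address contains an `@`.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Hypothetical authors file: one entry still has its generated placeholder.
cat > authors-transform.txt <<'EOF'
asmith = asmith <asmith>
jdoe = John Doe <john.doe@example.com>
EOF

# Print the Subversion user of every entry whose "<...>" part has no "@".
incomplete=$(awk -F ' = ' '$2 !~ /<[^>]*@[^>]*>$/ {print $1}' authors-transform.txt)

if [ -n "$incomplete" ]; then
  echo "Fill in a real name and email for: $incomplete" >&2
else
  echo "authors-transform.txt looks complete"
fi
```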

2. Convert svn:ignore properties to .gitignore

The svn:ignore property contains a list of file patterns which certain Subversion operations will ignore. This is perhaps the most commonly used special property, and it would be great to preserve it in a form that Git understands. git-svn knows what I want:
git svn show-ignore > .gitignore
git add .gitignore
git commit -m 'Convert svn:ignore properties to .gitignore.'
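Before relying on the converted file, it can help to confirm that the patterns behave as expected. Here is a minimal sketch in a throwaway repository using `git check-ignore`; the two patterns are hypothetical stand-ins for what the conversion might yield for a Maven project, not actual `git svn show-ignore` output.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q .

# Hypothetical patterns standing in for the converted svn:ignore entries.
printf '%s\n' '/target' '*.iml' > .gitignore
git add .gitignore
git -c user.name='Example' -c user.email='example@example.com' \
  commit -q -m 'Convert svn:ignore properties to .gitignore.'

# Create a few files and ask Git which of them it would ignore.
mkdir -p target src
touch target/app.jar project.iml src/Main.java
# check-ignore prints only the ignored paths; it exits 1 when none match.
ignored=$(git check-ignore target project.iml src/Main.java || true)
echo "$ignored"
```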

3. Fetch and convert branches and tags

git-svn does not copy the remote branches and tags as local ones, so let's encourage it a bit (the refspec is quoted so the shell does not try to expand the asterisks):
git fetch . 'refs/remotes/*:refs/heads/*'
With all branches available locally, we can delete the remote ones. The local trunk branch is redundant with our Git master branch:
for branch in $(git branch -r); do git branch -rd "$branch"; done
git branch -d trunk
At this point we can actually detach our local repository from the remote one, that is, remove the git-svn configuration options and folders:
git config --remove-section svn-remote.svn
rm -Rf .git/svn/
git svn imports the Subversion tags as branches, since in Subversion a tag is just a copy of a directory, exactly like a branch. Let's turn them into proper Git tags:
git for-each-ref --format='%(refname)' refs/heads/tags | cut -d / -f 4 | while read ref; do git tag "$ref" "refs/heads/tags/$ref"; git branch -D "tags/$ref"; done
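The whole branch and tag shuffle can be rehearsed without a Subversion server. The sketch below fakes the refs that an older `git svn clone -s` leaves behind (`refs/remotes/trunk`, `refs/remotes/<branch>`, `refs/remotes/tags/<tag>`) plus a minimal svn-remote section, then runs the same commands. The branch `feature-x` and tag `release-1.0` are hypothetical. Note that since Git 2.0 git svn uses `origin/` as its default ref prefix, so on a newer Git the refs would live under `refs/remotes/origin/` and the refspecs would need adjusting.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master

# Helper for empty commits with a throwaway identity.
commit() {
  git -c user.name='Example' -c user.email='example@example.com' \
    commit -q --allow-empty -m "$1"
}

# Fake the layout of an older `git svn clone -s`: remote-tracking refs for
# trunk, one branch and one tag, plus the svn-remote config section.
commit 'r1'
git update-ref refs/remotes/tags/release-1.0 HEAD
commit 'r2'
git update-ref refs/remotes/feature-x HEAD
commit 'r3'
git update-ref refs/remotes/trunk HEAD
git config svn-remote.svn.url https://localhost:10443/svn/project
git config svn-remote.svn.fetch trunk:refs/remotes/trunk
mkdir -p .git/svn

# Copy remote refs to local branches, drop the remote ones and the
# redundant trunk branch, then detach from Subversion.
git fetch -q . 'refs/remotes/*:refs/heads/*'
for branch in $(git branch -r); do git branch -rd "$branch"; done
git branch -d trunk
git config --remove-section svn-remote.svn
rm -Rf .git/svn/

# Turn the tags/* branches into real (lightweight) tags.
git for-each-ref --format='%(refname)' refs/heads/tags | cut -d / -f 4 |
  while read -r ref; do
    git tag "$ref" "refs/heads/tags/$ref"
    git branch -D "tags/$ref"
  done

git branch
git tag
```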

4. Clean-up

We may want to rename some tags by creating a new reference to them and then removing the old one:
git tag new_tag old_tag
git tag -d old_tag
This is also a good time to delete unneeded tags and useless branches.
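When many tags need the same rename, the two commands can be wrapped in a loop. This hypothetical example assumes tags named `project-<version>` (the Maven release plugin's default artifactId-version naming, for instance) that should become just `<version>`.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q .
git -c user.name='Example' -c user.email='example@example.com' \
  commit -q --allow-empty -m 'release'

# Hypothetical tags using an "artifactId-version" naming scheme.
git tag project-1.0
git tag project-1.1

# Rename every project-* tag to its bare version: create the new
# reference first, then delete the old one.
for old_tag in $(git tag -l 'project-*'); do
  new_tag=${old_tag#project-}
  git tag "$new_tag" "$old_tag"
  git tag -d "$old_tag"
done

git tag
```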

It's nice to remove the empty commits:
git filter-branch --commit-filter 'git_commit_non_empty_tree "$@"' HEAD
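To see the commit filter at work, this sketch builds a three-commit history in a scratch repository where the middle commit changes no files, then runs the same command. `git_commit_non_empty_tree` is a function that `git filter-branch` provides to commit filters; it skips commits that leave the tree unchanged. Recent Git versions print a warning and pause before running `filter-branch` unless `FILTER_BRANCH_SQUELCH_WARNING=1` is set.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q .
export GIT_AUTHOR_NAME='Example' GIT_AUTHOR_EMAIL='example@example.com'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
# Skip the warning (and delay) that newer Git versions show for filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1

echo 'first' > a.txt
git add a.txt && git commit -q -m 'Add a.txt'
# Stands in for an imported revision that did not change any file.
git commit -q --allow-empty -m 'Empty commit'
echo 'second' > b.txt
git add b.txt && git commit -q -m 'Add b.txt'

git rev-list --count HEAD   # 3 commits before
git filter-branch --commit-filter 'git_commit_non_empty_tree "$@"' HEAD
git rev-list --count HEAD   # 2 commits after
```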

If we have files that should never have been under version control, this is the time to remove them (e.g. *.iml files). Quoting the pattern lets Git, rather than the shell, match it in subdirectories as well:
git filter-branch --tree-filter 'git rm -r -f --ignore-unmatch "*.iml"' --prune-empty -f -- --all
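Because `--tree-filter` checks out every single commit, it can be slow on a history of 40k+ revisions. A swapped-in alternative is `--index-filter` with `git rm --cached`, which works on the index only and which the filter-branch documentation describes as much faster. This sketch checks that variant in a scratch repository with hypothetical file names; the quoted `'*.iml'` is matched by Git's pathspec rules, so it also catches `module/module.iml`.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q .
export GIT_AUTHOR_NAME='Example' GIT_AUTHOR_EMAIL='example@example.com'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
export FILTER_BRANCH_SQUELCH_WARNING=1

# History with IDE files at the top level and in a module directory.
mkdir -p src module
echo 'class Main {}' > src/Main.java
touch project.iml
git add . && git commit -q -m 'Initial import'
touch module/module.iml
git add . && git commit -q -m 'Add module IDE file'
echo 'readme' > README
git add README && git commit -q -m 'Add README'

# Remove *.iml everywhere in every ref, touching only the index;
# --prune-empty drops the commit that only added module/module.iml.
git filter-branch -f --prune-empty \
  --index-filter "git rm -r -q --cached --ignore-unmatch '*.iml'" -- --all

git log --format=%s
```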

5. Push to the final Git repository

I set up our Git server using gitolite. That's the repository from which the team and the continuous integration server will fetch the code.

We can simply add a remote to our local repository and push there. "--all" pushes all refs under refs/heads but not the tags, so we have to remember to push those as well.
git remote add origin git@our.git.server.net:project.git
git push origin --all
git push origin --tags
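The difference between the two push commands can be checked locally. In this sketch a bare repository in a temporary directory stands in for the gitolite server (its path, the branch `feature-x` and the tag `release-1.0` are assumptions for illustration), and `git ls-remote` shows that `--all` transfers only the branches while the tags arrive with `--tags`.

```shell
#!/bin/sh
set -e
base=$(mktemp -d)
# A local bare repository stands in for the gitolite-hosted one.
git init -q --bare "$base/project.git"

git init -q "$base/work"
cd "$base/work"
git symbolic-ref HEAD refs/heads/master
git -c user.name='Example' -c user.email='example@example.com' \
  commit -q --allow-empty -m 'Imported history'
git branch feature-x        # hypothetical branch
git tag release-1.0         # hypothetical lightweight tag

git remote add origin "$base/project.git"
git push -q origin --all
tags_after_all=$(git ls-remote --tags origin)
git push -q origin --tags
tags_after_tags=$(git ls-remote --tags origin)

git ls-remote --heads origin
echo "after --all:  ${tags_after_all:-<no tags>}"
echo "after --tags: $tags_after_tags"
```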

Final words

So now we have Git. Offline operations are unbelievably fast: commands take more time to type than they do to execute. I can keep committing on the train on the way home and push at my earliest convenience. I can have cheap local branches and fine-grained local commits that I can merge or reword before pushing. Endless possibilities.

I did this migration incrementally as a side project (mostly in my spare time), and it was all done, including the continuous integration server updates, in about a week without any major interruption to the team's work.
What's next? Let's open source! To be continued...