Migrating SVN to Git: a complete guide

One of the tasks at my day job was migrating a 4-year-old SVN repository to Git.
At our daily catch-up meeting I was pointed to this tutorial by Atlassian, but my case was a bit more complicated.

Technically I had no branches: all history lived in one single branch. It was also divided into 4 directories, and each of them held part of what I wanted to end up in master. On top of that there were separate directories for each release, which I wanted to turn into branches so that I wouldn't lose history and could go back to any release I wanted. This is roughly how it looked:

trunk
 part1
 part2
 part3
 release_part1
   1.0
   1.1
 release_part2
   2.0
   2.1
 release_part3
   3.0
   3.1

And this is what I really wanted it to be:

master (contains the complete history of part1, part2 and part3)
# other branches
part1
part2
part3
release/1.0
release/1.1
release/2.0
release/2.1
release/3.0
release/3.1

How did I do it, you ask?

At first I tried the tutorial mentioned above, but I ran into some failures: the result was not what I expected, but at least I learned something. That is what's great about programming: you experiment, you fail, and you learn.

To learn more, I started reading the great git svn documentation and, of course, found a few great posts on mighty Stack Overflow.

I had the ideal repository structure you saw above in mind, and I didn't want to drop history, since it's great to see how a project has evolved over so many years. The repository was going to live on GitHub, which has some nice charts under Insights, so I was excited to see this magic.

Back to the point.

Armed with that knowledge, I felt confident enough to try a simple proof of concept and migrate part of my repository. I started by creating an empty repository and modifying its .git/config file, adding something like:

[svn-remote "svn"]
url=http://my.repo/svn
fetch=trunk/part1:refs/remotes/origin/part1
fetch=trunk/part2:refs/remotes/origin/part2
branches=release_part1/*:refs/remotes/origin/release/*
branches=release_part2/*:refs/remotes/origin/release/*

Since there were no duplicate names in release_part1 and release_part2, the wildcard branch mapping was safe and I didn't have to type all 32 branches by hand. To my surprise, running git svn fetch worked great.

So I had a working plan for the first step. I removed everything, backed up my .git/config, created a new repository from scratch and modified the config file to fetch the remaining content from SVN and finish the job.

My final .git/config (with placeholder names) looked like this:

[svn-remote "svn"]
url=http://my.repo/svn
fetch=trunk/part1:refs/remotes/origin/part1
fetch=trunk/part2:refs/remotes/origin/part2
fetch=trunk/part3:refs/remotes/origin/part3
branches=release_part1/*:refs/remotes/origin/release/*
branches=release_part2/*:refs/remotes/origin/release/*
branches=release_part3/*:refs/remotes/origin/release/*
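If you'd rather not edit .git/config by hand, the same section can be written with git config. This is a minimal sketch, assuming you run it inside the freshly created empty repository; the URL and directory names are the placeholders from the config above, so adjust them to where your trunk and release directories actually live. The helper name is made up for this example.

```shell
# configure_svn_remote is a hypothetical helper name.
# Writes the [svn-remote "svn"] section shown above via git config.
# http://my.repo/svn and the directory names are placeholders.
configure_svn_remote() {
    git config svn-remote.svn.url http://my.repo/svn
    for part in part1 part2 part3; do
        # trunk/partN becomes the remote-tracking ref origin/partN
        git config --add svn-remote.svn.fetch \
            "trunk/$part:refs/remotes/origin/$part"
        # every directory under release_partN becomes origin/release/<name>
        git config --add svn-remote.svn.branches \
            "release_$part/*:refs/remotes/origin/release/*"
    done
}
```

Usage: `git init my-repo && cd my-repo && configure_svn_remote && git svn fetch`. Because it uses `--add`, running it twice in the same repository duplicates the fetch and branches lines.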

I was happy, but since we wanted to move this repository to a public place, I had to get rid of sensitive data in files across all commits. I also wanted to rewrite the Git history so that the authors matched accounts on GitHub and I could finally see those colorful charts.

I decided to deal with the history rewrite as my second task. I found this Stack Overflow answer and also a great post on GitHub about git filter-branch --env-filter. Since that repository had more than one contributor over the years, I needed to adapt the script a bit to match my needs.

I only wanted to run this script once and there were only eleven contributors, so I didn't bother making it pretty. To find the contributors I ran a simple Git one-liner, git shortlog -sne, which prints one line per author:

number of commits    Author Name <email>
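Two things are worth knowing about git shortlog here: without a revision argument it only summarizes HEAD (pass --all to include every ref, such as the ones git svn fetched), and it groups by author only, while the filter script also checks committer addresses. A small sketch that lists every distinct author and committer identity across all refs, most frequent first; the helper name is made up:

```shell
# list_identities is a hypothetical helper name.
# Counts how often each "Name <email>" appears as author or committer in
# the history of all refs (local and remote-tracking), most frequent first.
list_identities() {
    git log --all --format='%an <%ae>%n%cn <%ce>' |
        sort | uniq -c | sort -rn
}
```

Each commit contributes two lines (author and committer), so the counts are not commit counts; they just show which addresses the rewrite has to cover.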

Here is a sample script for two contributors, to show what I mean:

git filter-branch --env-filter '
OLD_EMAIL1="old1@example.com"
CORRECT_NAME1="Name1"
CORRECT_EMAIL1="name.surname1@example.com"
OLD_EMAIL2="old2@example.com"
CORRECT_NAME2="Name2"
CORRECT_EMAIL2="name.surname2@example.com"
if [ "$GIT_COMMITTER_EMAIL" = "$OLD_EMAIL1" ]
then
    export GIT_COMMITTER_NAME="$CORRECT_NAME1"
    export GIT_COMMITTER_EMAIL="$CORRECT_EMAIL1"
fi
if [ "$GIT_AUTHOR_EMAIL" = "$OLD_EMAIL1" ]
then
    export GIT_AUTHOR_NAME="$CORRECT_NAME1"
    export GIT_AUTHOR_EMAIL="$CORRECT_EMAIL1"
fi
if [ "$GIT_COMMITTER_EMAIL" = "$OLD_EMAIL2" ]
then
    export GIT_COMMITTER_NAME="$CORRECT_NAME2"
    export GIT_COMMITTER_EMAIL="$CORRECT_EMAIL2"
fi
if [ "$GIT_AUTHOR_EMAIL" = "$OLD_EMAIL2" ]
then
    export GIT_AUTHOR_NAME="$CORRECT_NAME2"
    export GIT_AUTHOR_EMAIL="$CORRECT_EMAIL2"
fi
' --tag-name-filter cat -- --branches --tags

Yup, I copied that block ELEVEN times. It took about an hour to process 10k commits, so I was patient and let it finish its job. I didn't even make a proof of concept at that point, and it all looked good (that was before I found my mistake).
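Copying the block per contributor works, but the same env-filter can be driven by a single lookup instead. This is a minimal sketch, not the script from the linked answer: a case statement maps each old address to "Name|new email" (the addresses are the placeholders from the script above, and correct_identity and rewrite_authors are made-up names), so adding a contributor is one more line.

```shell
# ENV_FILTER is evaluated by git filter-branch once per commit.
# correct_identity is a hypothetical helper: it prints "Name|new email"
# for a known old address and nothing for any other address.
ENV_FILTER='
correct_identity() {
    case "$1" in
        old1@example.com) echo "Name1|name.surname1@example.com" ;;
        old2@example.com) echo "Name2|name.surname2@example.com" ;;
    esac
}
new=$(correct_identity "$GIT_COMMITTER_EMAIL")
if [ -n "$new" ]; then
    export GIT_COMMITTER_NAME="${new%%|*}"
    export GIT_COMMITTER_EMAIL="${new#*|}"
fi
new=$(correct_identity "$GIT_AUTHOR_EMAIL")
if [ -n "$new" ]; then
    export GIT_AUTHOR_NAME="${new%%|*}"
    export GIT_AUTHOR_EMAIL="${new#*|}"
fi
'

# rewrite_authors (hypothetical name) rewrites all local branches and tags
# of the repository in the current directory.
rewrite_authors() {
    git filter-branch --env-filter "$ENV_FILTER" \
        --tag-name-filter cat -- --branches --tags
}
```

Run rewrite_authors from the repository root. Newer Git versions print a warning when filter-branch starts and suggest git filter-repo as an alternative.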

The third task, which I expected to be the hardest and to require some coding, turned out to be very time-consuming but simple, because there is a fucking awesome tool, the BFG Repo-Cleaner (bfg), that replaces every string you give it, such as usernames and passwords, with a ***REMOVED*** string.

So I dug through the files in the repository, found lots of usernames and passwords, and wrote them to a simple file, passwords.txt. That was indeed very exhausting, and my file looked like this:

username1
pass1
pass2
usernameN
passN

I kept them as plain strings because I didn't want to use regular expressions. BFG runs on Java, and it turned out I didn't have Java 8, so I had to install it on the VM I was using to migrate this repo.
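Before running BFG, and again afterwards, it can be worth checking which entries of passwords.txt actually occur in the history. Keep in mind that by default BFG does not modify the contents of your latest commit (HEAD), only the commits before it, so secrets still present in the current files need to be removed in a normal commit first. A minimal sketch; find_leaks is a made-up name:

```shell
# find_leaks is a hypothetical helper name.
# Prints every diff line, across the history of all refs, that contains one
# of the literal strings listed in the given file (one string per line).
# The file must not contain empty lines: an empty pattern matches everything.
find_leaks() {
    git log --all --patch --format= | grep -F -f "$1"
}
```

Usage, assuming passwords.txt sits next to the repository: `(cd my-repo.git && find_leaks ../passwords.txt)`. No output (and a non-zero exit status from grep) means none of the listed strings were found.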

The rest was one simple, stupid command that took about 10-15 seconds, because this tool is blazing fast:

bfg --replace-text passwords.txt my-repo.git

Then I cleaned up the old history by running the command BFG prints at the end of its run, and that was it: I added the remote and pushed my work.
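For reference, the cleanup BFG suggests expires the reflog and lets git gc drop the objects that only the old history still referenced; until then the replaced blobs are still on disk. Wrapped in a helper (the name is made up) that you would run inside my-repo.git:

```shell
# cleanup_after_bfg is a hypothetical helper name.
# Expires all reflog entries immediately, then repacks and prunes every
# unreachable object, so the blobs BFG replaced are really gone.
cleanup_after_bfg() {
    git reflog expire --expire=now --all &&
        git gc --prune=now --aggressive
}
```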

In the end it turned out that I had to run my stupid history-rewriting script twice more, because I had made some mistakes at first and the script also didn't handle the branches. Before rerunning it I had to delete the backup that filter-branch leaves behind:

git update-ref -d refs/original/refs/heads/master
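That command only removes the backup of master. filter-branch keeps a backup under refs/original/ for every ref it rewrites, and it refuses to start again while any of them exist unless you force it with -f. A sketch that deletes all of them at once; the helper name is made up:

```shell
# drop_filter_branch_backups is a hypothetical helper name.
# Deletes every backup ref that git filter-branch left under refs/original/.
drop_filter_branch_backups() {
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done
}
```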

Then I pushed the new history with:

git push --force --tags origin 'refs/heads/*'
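A likely reason the script missed the branches, assuming the git svn config from above: git svn stores everything it fetches as remote-tracking refs under refs/remotes/origin/, while --branches in the filter-branch call and 'refs/heads/*' in the push only cover local branches. One way around it is to create a local branch for each of those refs before rewriting. A sketch; the helper name is made up:

```shell
# branches_from_svn_remotes is a hypothetical helper name.
# Creates a local branch for every ref git svn fetched under
# refs/remotes/origin/, e.g. origin/release/1.0 -> release/1.0.
branches_from_svn_remotes() {
    git for-each-ref --format='%(refname:strip=3)' refs/remotes/origin/ |
        while read -r name; do
            git branch "$name" "refs/remotes/origin/$name"
        done
}
```

Run it once after git svn fetch; git branch fails for any name that already exists locally, which leaves that branch untouched.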

In the end the migration was finished and I could move on to other tasks.

I hope this helps someone migrate their repo from SVN to Git, because it's not as painful as I thought it would be. See you next time.