One of the tasks at my day job was migrating a 4-year-old svn repository to git.
At first, at my daily catch-up meeting, I was introduced to this tutorial by Atlassian, but my case (shown here in an imaginary, simplified form) was a bit more complicated.
Technically I had no branches, and all the history was in one single branch. It was divided into 4 directories, and each of them was part of what I wanted to become master. There were also separate directories for each release, which I wanted to turn into branches so I wouldn't lose history and could go back wherever I wanted. This is what it looked like:
    trunk/
        part1
        part2
        part3
    release_part1/
        1.0
        1.1
    release_part2/
        2.0
        2.1
    release_part3/
        3.0
        3.1
And this is what I really wanted it to be:
    master (contains the complete history from part1, part2, part3)
    # other branches
    part1
    part2
    part3
    release/1.0
    release/1.1
    release/2.0
    release/2.1
    release/3.0
    release/3.1
How did I do it, you ask?
At first I tried to follow the tutorial mentioned above but got some failures. The result was not what I was expecting, but at least I learned something. And that is what's great about programming: you experiment, you fail, and you learn.
I had the ideal repository structure you saw above in mind, and I didn't want to drop the history, since it's great to see how a project evolved over so many years. Since the repository would be on GitHub, which has some nice charts under Insights, I was excited to see this magic.
Back to the point.
After getting some knowledge I was confident enough to try a simple proof of concept and migrate part of my repository.
I started by creating an empty repository and modifying its .git/config file by adding something like:
    [svn-remote "svn"]
        url=http://my.repo/svn
        fetch=trunk/part1:refs/remotes/origin/part1
        fetch=trunk/part2:refs/remotes/origin/part2
        branches=release_part1/*:refs/remotes/origin/release/*
        branches=release_part2/*:refs/remotes/origin/release/*
Since no name in release_part2 duplicated one from another release directory, I was safe with wildcard branch creation and didn't have to type all 32 branches by hand.
Then I ran git svn fetch and, to my surprise, it worked great.
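If you would rather not edit .git/config by hand, the same section can probably be written with git config. The following is only a minimal sketch of that idea, not what I actually typed: the function name setup_svn_remote is made up for illustration, and the URL and paths are the placeholder values from the config above.

```shell
# Sketch: create an empty repository and write the [svn-remote "svn"]
# section with `git config` instead of editing .git/config by hand.
# setup_svn_remote is a hypothetical helper name; the URL and paths are
# the placeholder values used in this post.
setup_svn_remote() {
    repo_dir=$1
    git init --quiet "$repo_dir" &&
    git -C "$repo_dir" config svn-remote.svn.url 'http://my.repo/svn' &&
    # --add appends a value, so one key can hold several mappings.
    git -C "$repo_dir" config --add svn-remote.svn.fetch \
        'trunk/part1:refs/remotes/origin/part1' &&
    git -C "$repo_dir" config --add svn-remote.svn.fetch \
        'trunk/part2:refs/remotes/origin/part2' &&
    git -C "$repo_dir" config --add svn-remote.svn.branches \
        'release_part1/*:refs/remotes/origin/release/*' &&
    git -C "$repo_dir" config --add svn-remote.svn.branches \
        'release_part2/*:refs/remotes/origin/release/*'
}
```

After calling setup_svn_remote with a directory name, running git svn fetch inside that directory should behave the same as with the hand-edited file (git-svn has to be installed for that step).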
So I had a working plan for the first step. I removed everything, backed up my .git/config, created a new repo from scratch, and modified my config file to get the remaining stuff from svn and finish my work.
My final imaginary .git/config looked like this:
    [svn-remote "svn"]
        url=http://my.repo/svn
        fetch=trunk/part1:refs/remotes/origin/part1
        fetch=trunk/part2:refs/remotes/origin/part2
        fetch=trunk/part3:refs/remotes/origin/part3
        branches=release_part1/*:refs/remotes/origin/release/*
        branches=release_part2/*:refs/remotes/origin/release/*
        branches=release_part3/*:refs/remotes/origin/release/*
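With this config, git svn fetch stores everything under refs/remotes/origin/, so nothing is a local branch yet. This post does not show how those refs became branches, so treat the following as one possible sketch under that assumption. The function name promote_remote_refs is hypothetical.

```shell
# Sketch: create a local branch for every ref that git svn fetched under
# refs/remotes/origin/ (for example part1 or release/1.0).
# Run inside the migrated repository. promote_remote_refs is a
# hypothetical helper name.
promote_remote_refs() {
    git for-each-ref --format='%(refname)' refs/remotes/origin |
    while read -r ref; do
        # Strip the remote prefix: refs/remotes/origin/release/1.0 -> release/1.0
        branch=${ref#refs/remotes/origin/}
        # Without --force this refuses to overwrite an existing branch.
        git branch "$branch" "$ref"
    done
}
```

Leaving out --force is deliberate: if a branch with that name already exists, git reports an error instead of silently moving it.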
I was happy, but since we wanted to migrate this repository to a public place, I had to get rid of sensitive data from files across all commits. I also wanted to alter my git history to match accounts on GitHub so I could finally see those colorful charts.
I thought I would deal with history alteration as my second task.
So I found this Stack Overflow answer and also a great post on GitHub about git filter-branch --env-filter, and since that repository had more than one contributor over the years, I needed to alter the script a bit to match my needs.
Well, I wanted to run this script only once and there were only eleven contributors, so it wasn't pretty.
To find the contributors I ran a simple git one-liner, git shortlog -sne, which shows:
commit count; author name; email
Here is a sample script for 2 contributors to show what I mean:
    git filter-branch --env-filter '
    OLD_EMAIL1="firstname.lastname@example.org"
    CORRECT_NAME1="Name1"
    CORRECT_EMAIL1="email@example.com"
    OLD_EMAIL2="firstname.lastname@example.org"
    CORRECT_NAME2="Name2"
    CORRECT_EMAIL2="email@example.com"
    if [ "$GIT_COMMITTER_EMAIL" = "$OLD_EMAIL1" ]
    then
        export GIT_COMMITTER_NAME="$CORRECT_NAME1"
        export GIT_COMMITTER_EMAIL="$CORRECT_EMAIL1"
    fi
    if [ "$GIT_AUTHOR_EMAIL" = "$OLD_EMAIL1" ]
    then
        export GIT_AUTHOR_NAME="$CORRECT_NAME1"
        export GIT_AUTHOR_EMAIL="$CORRECT_EMAIL1"
    fi
    if [ "$GIT_COMMITTER_EMAIL" = "$OLD_EMAIL2" ]
    then
        export GIT_COMMITTER_NAME="$CORRECT_NAME2"
        export GIT_COMMITTER_EMAIL="$CORRECT_EMAIL2"
    fi
    if [ "$GIT_AUTHOR_EMAIL" = "$OLD_EMAIL2" ]
    then
        export GIT_AUTHOR_NAME="$CORRECT_NAME2"
        export GIT_AUTHOR_EMAIL="$CORRECT_EMAIL2"
    fi
    ' --tag-name-filter cat -- --branches --tags
Yup, I copied those ELEVEN times. It took about 1 hour to deal with 10k commits, so I was patient and let it finish its job. I didn't even make a proof of concept at that point, and it was all good (that was before I found the mistake).
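The copy-pasting could probably be avoided with a small lookup function that has one line per contributor. This is a minimal sketch of that idea and not the script I ran: the function names lookup_identity and fix_identity are made up, and the addresses are hypothetical placeholders.

```shell
# Sketch: one case arm per contributor instead of one pair of if-blocks.
# lookup_identity and fix_identity are hypothetical names; the addresses
# below are placeholders, not real accounts.

# Prints "Name|new-email" for a known old address, returns 1 otherwise.
lookup_identity() {
    case "$1" in
        old1@example.org) printf '%s\n' 'Name1|name1@example.com' ;;
        old2@example.org) printf '%s\n' 'Name2|name2@example.com' ;;
        *) return 1 ;;
    esac
}

# Rewrites author and committer independently, like the script above.
# Unknown addresses are left untouched.
fix_identity() {
    if new=$(lookup_identity "$GIT_AUTHOR_EMAIL"); then
        GIT_AUTHOR_NAME=${new%%|*}    # part before the first "|"
        GIT_AUTHOR_EMAIL=${new#*|}    # part after the first "|"
        export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL
    fi
    if new=$(lookup_identity "$GIT_COMMITTER_EMAIL"); then
        GIT_COMMITTER_NAME=${new%%|*}
        GIT_COMMITTER_EMAIL=${new#*|}
        export GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
    fi
}
```

Assuming the two functions are saved in a file such as /path/to/identity.sh (a hypothetical path, and it should be absolute because filter-branch works in a temporary directory), the filter string shrinks to `. /path/to/identity.sh; fix_identity`, with the rest of the filter-branch command line unchanged.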
The third task, which I expected to be the hardest and to require some coding, turned out to be very time-consuming but simple, because there is a fucking awesome tool, bfg (BFG Repo-Cleaner), that will replace all usernames and passwords with a placeholder.
So I dug through the files in the repository and found lots of usernames and passwords, which I wrote to a simple file, passwords.txt. It was indeed very exhausting, and my file looked like this:
    username1
    pass1
    pass2
    usernameN
    passN
I listed plain strings because I didn't want to use regular expressions. It turned out I didn't have Java 8, so I needed to install it on the VM I was using to migrate this repo.
The rest was one simple, stupid command that took about 10-15 seconds, because this tool is blazing fast:
bfg --replace-text passwords.txt my-repo.git
Then I cleaned up the history by running the command that BFG prints when it finishes, which cleans up git's ref history (the reflog), and it was done. I added the remote and pushed my work.
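For reference, the cleanup that BFG suggests at the end of its run is, as far as I know, a reflog expiry followed by an aggressive gc. The sketch below wraps it in a function and adds a check I did not have at the time: it counts how many entries of passwords.txt still show up somewhere in history. Both function names are hypothetical, and the check assumes every line of the file is a plain string rather than a regular expression.

```shell
# Sketch: post-BFG cleanup plus a leak check. Run inside the rewritten
# repository. expire_old_objects and count_leaks are hypothetical names.

# Drop the reflog entries and unreachable objects that still hold the
# old, unredacted commits.
expire_old_objects() {
    git reflog expire --expire=now --all &&
    git gc --prune=now --aggressive
}

# Prints how many non-empty lines of the given file still occur in any
# commit reachable from any ref. Offending line numbers go to stderr so
# the secrets themselves are never echoed.
count_leaks() {
    leaks=0
    line_no=0
    while IFS= read -r secret || [ -n "$secret" ]; do
        line_no=$((line_no + 1))
        [ -n "$secret" ] || continue
        # -S lists commits that change the number of occurrences of the string.
        if [ -n "$(git log --all --format=%h -S"$secret")" ]; then
            echo "entry on line $line_no is still in history" >&2
            leaks=$((leaks + 1))
        fi
    done < "$1"
    echo "$leaks"
}
```

A result of 0 from count_leaks passwords.txt is a cheap sanity check before pushing to a public remote. Anything else means the rewrite should be repeated.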
In the end it turned out that I needed to run my stupid history-altering script two more times, because at first I made some mistakes, and it also turned out that the script didn't deal with branches. So what I needed to do was delete the backup ref that filter-branch leaves behind
git update-ref -d refs/original/refs/heads/master
and then push new history using
git push --force --tags origin 'refs/heads/*'
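The update-ref command above removes only the backup of master. Since filter-branch was run with --branches --tags, it keeps a backup under refs/original/ for every ref it rewrote, so a loop over all of them may be more convenient. This is a minimal sketch. The function name drop_backup_refs is hypothetical.

```shell
# Sketch: delete every backup ref that git filter-branch left under
# refs/original/, not just the one for master.
# Run inside the repository. drop_backup_refs is a hypothetical name.
drop_backup_refs() {
    git for-each-ref --format='%(refname)' refs/original |
    while read -r ref; do
        git update-ref -d "$ref"
    done
}
```

Once the backups are gone the old commits are no longer protected from garbage collection, so this is only worth running after the rewritten history has been checked.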
In the end the migration was finished and I could deal with other tasks.
I hope this helps someone migrate their repo from svn to git, because it's not as painful as I thought it would be. See you next time.