Migrating from cvs to git
by Sebastien Mirolo on Sun, 6 May 2012
I have been involved with a project that has been using cvs for the longest time. Over the years, a great deal of experience has been acquired in source control management; processes and tools have evolved to reflect that knowledge. Many third-party tools such as wikis and bug trackers actively maintain git plug-ins, while cvs support is increasingly an afterthought, for very good reasons. So although everything runs smoothly with cvs for now, it won't be long before sticking with cvs becomes a huge technical debt.
Currently the source base is composed of projects, each project being a top-level subdirectory of a single cvs repository tree residing on a server machine.
$ ls -la /var/cvs
/var/cvs
    CVSROOT/
        xx
    project1/
        Makefile
        src/
        include/
    project2/
        Makefile
        src/
        include/
    project3/
        Makefile
        src/
        include/
One of the major advantages of this setup for the development team is the ability to do a sparse checkout (known as views in Perforce terminology) within a single session.
cvs -d cvsroot co project1 project3
The previous command initiates a session with the repository server, asks for credentials once, then fetches the source trees for project1 and project3 in that session before closing it down. In that scenario, the source tree for project2 is not fetched since it is not necessary for the task at hand. Furthermore, the developer is asked for his credentials only once, which is a big deal when that requires typing a password.
Unlike cvs, git commits are atomic across a whole repository, so the repository is the unit you clone. To achieve the same sparse checkout feature we have been used to with the cvs repository, we will thus need either:
- A single git repository for all projects, with support for partial cloning, or
- One git repository per project, with the possibility to clone multiple repositories in one authenticated session.
Git does not rely on a central repository database the way cvs does. Each "git clone" command physically copies the whole repository, history included, to the local machine. "git pull" and "git push" are merely glorified merges across branches that reside in different parts of the network.
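To make the clone/pull point concrete, here is a minimal sketch that runs entirely on the local machine, assuming a reasonably recent git is installed. A bare repository stands in for the server copy, and the names shared.git, alice and bob are made up for the illustration. It shows that a fresh clone already holds the full history, and that "git pull" amounts to a "git fetch" followed by a merge of the remote-tracking branch.

```shell
#!/bin/sh
# Sketch: a clone is a full copy of the history, and "git pull" is a
# fetch followed by a merge. All repository names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of the copy on the server.
git init -q --bare shared.git

# Alice clones it, records two commits and pushes them.
git clone -q shared.git alice 2>/dev/null
echo one > alice/file.txt
git -C alice add file.txt
git -C alice commit -q -m "first"
echo two >> alice/file.txt
git -C alice commit -q -a -m "second"
git -C alice push -q origin HEAD

# Bob's clone receives the complete history, not just a snapshot.
git clone -q shared.git bob
echo "bob after clone: $(git -C bob rev-list --count HEAD) commits"

# Alice pushes a third commit; Bob integrates it in the two steps
# that "git pull" would perform for him in one.
echo three >> alice/file.txt
git -C alice commit -q -a -m "third"
git -C alice push -q origin HEAD
git -C bob fetch -q origin
git -C bob merge -q --ff-only '@{upstream}'
echo "bob after fetch+merge: $(git -C bob rev-list --count HEAD) commits"
```

Running it reports 2 commits right after the clone and 3 after the fetch and merge; once fetched, that history can be browsed without any round-trip to the server.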
For a while I thought git submodules could help implement a single authenticated session. That did not work, nor did gitosis.
Gitosis is a set of scripts to manage read/write access to git repositories without requiring a user account on the server for each contributor. It works by creating a "git" user that owns the repository directories and files. Clients connect to the repositories through the git user, provided their public key has been added to the git user's sshd authorized_keys file. The "command=" option of each key is set up to run gitosis-serve, which uses the gitosis.conf file to grant or deny access to the repository directories.
Furthermore, for easier management, gitosis keeps its configuration in a git-managed "admin" repository whose post-update hook updates authorized_keys as required.
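For illustration, this is roughly what the gitosis.conf in a checkout of the admin repository could look like once a team has been given access. It is a sketch: the group names, the member names (which are expected to match keydir/<member>.pub key files in the same checkout) and the project list are hypothetical, and the script simply writes the file into a directory passed as its first argument, or a scratch directory by default.

```shell
#!/bin/sh
# Sketch of a gitosis.conf granting access per group. All group, member
# and repository names below are hypothetical placeholders.
set -e

# First argument: a checkout of the gitosis admin repository (assumed);
# defaults to a scratch directory so the sketch can run anywhere.
admin_dir=${1:-$(mktemp -d)}

cat > "$admin_dir/gitosis.conf" <<'EOF'
[gitosis]

# Who may change the access rules themselves.
[group gitosis-admin]
members = jdoe@workstation
writable = gitosis-admin

# Developers: each member needs a keydir/<member>.pub key file.
[group developers]
members = jdoe@workstation alice
writable = project1 project2 project3
EOF

echo "wrote $admin_dir/gitosis.conf"
# Then, from the admin checkout:
#   git add gitosis.conf keydir && git commit -m "..." && git push
# and the post-update hook on the server refreshes authorized_keys.
```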
I started to think about using an ssh tunnel and a daemon on the server, maybe http if necessary. That means, though, that the cloned repository would reference "localhost". In the end it did not seem like a workable solution in the long run. In any case, for reference, you can enable ssh tunnels without shell access as follows.
$ man sshd
...
AUTHORIZED_KEYS FILE FORMAT
...
$ cat /home/username/.ssh/authorized_keys
permitopen="example.com:80",no-pty,no-agent-forwarding,no-X11-forwarding,command="/bin/noshell.sh" ssh-rsa AAAAB3...
It seems that the only way to get a single authenticated session is to rely on ssh keys and ssh-agent. Gitosis remains quite useful on the server side if you do not want to give each contributor shell access to the machine.
$ exec /usr/bin/ssh-agent $SHELL
$ ssh-add
$ git clone hostname:directory
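With the key loaded in ssh-agent, the one-repository-per-project layout can mimic "cvs co project1 project3" with a small loop: ssh only asks for the passphrase once, at ssh-add time, and every clone after that authenticates silently. This is a sketch; the clone_projects function, the base location and the <project>.git naming convention are assumptions for the illustration.

```shell
#!/bin/sh
# Sketch: clone a chosen subset of per-project repositories. The
# function name and the "<base>/<project>.git" layout are hypothetical.
set -e

clone_projects() {
    base=$1    # e.g. hostname:/path/to/repositories (placeholder)
    shift
    for project in "$@"; do
        if [ -e "$project" ]; then
            # Do not clobber an existing working copy.
            echo "skipping $project: already present" >&2
        else
            git clone -q "$base/$project.git" "$project"
        fi
    done
}

# Usage, after "ssh-add", from the directory that should hold the
# projects (project2 is simply not fetched):
#   clone_projects hostname:/path/to/repositories project1 project3
```

Because every clone is a separate ssh connection, it is the agent, not a shared session, that saves the repeated password prompts.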
Actual Migration
I followed two excellent posts here and here.
I only had one issue, with cvsps, because it could not handle some of the branched, deleted and reverted files recorded in the cvs history. It took me quite a while to realize that cvsps was creating a cache in my home directory. After manually editing the cvs history in the file concerned and removing that cache, the following commands worked like a charm to import the cvs tree as multiple git repositories.
$ rm -rf ~/.cvsps
$ git cvsimport -i -k -v -d cvsroot -p x,"b HEAD" project
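Since each top-level cvs project becomes its own git repository, the import is easy to repeat in a loop. This is a hedged sketch rather than the exact commands I ran: the import_projects function and the DRY_RUN switch are made up for the illustration, "-C" tells git cvsimport which directory to import into, and any cvsps options passed with "-p" can be appended if your history needs them. The cvsps cache is cleared before each project, since a stale cache is what tripped me up.

```shell
#!/bin/sh
# Sketch: import each cvs module into its own git repository.
# import_projects and DRY_RUN are hypothetical helpers; the cvsroot
# and module names are placeholders.

import_projects() {
    cvsroot=$1
    shift
    # With DRY_RUN set, commands are printed instead of executed.
    run=${DRY_RUN:+echo}
    for project in "$@"; do
        # cvsps caches its view of the history in ~/.cvsps;
        # start from a clean slate for every module.
        $run rm -rf "$HOME/.cvsps"
        # -i: import only, no checkout; -k: avoid keyword-expansion noise;
        # -C: target git repository, created if missing.
        $run git cvsimport -i -k -v -d "$cvsroot" -C "$project" "$project"
    done
}

# Example (prints the commands only):
#   DRY_RUN=1 import_projects cvsroot project1 project2 project3
```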
A few useful git command lines
Getting a source package out of a repository branch
git-tar-tree branchname    # older, deprecated form
git archive --format=tar --prefix=branchname/ branchname > branchname.tar    # a tag works too
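A self-contained sketch of the git archive route, assuming gzip is available: it builds a throwaway repository with a hypothetical release-1.0 branch and packs it as project1-1.0.tar.gz. The --prefix option puts every file under a top-level project1-1.0/ directory, which is what people usually expect when they unpack a source package.

```shell
#!/bin/sh
# Sketch: turn a branch (a tag works the same way) into a source tarball.
# Repository content, branch and package names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
out=$(mktemp -d)
git -C "$repo" init -q
printf 'all:\n\techo building\n' > "$repo/Makefile"
git -C "$repo" add Makefile
git -C "$repo" commit -q -m "initial import"
git -C "$repo" branch release-1.0

# Archive the branch tip; the trailing slash makes the prefix a directory.
git -C "$repo" archive --format=tar --prefix=project1-1.0/ release-1.0 \
    | gzip > "$out/project1-1.0.tar.gz"

# List the package contents.
tar -tzf "$out/project1-1.0.tar.gz"
```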
Getting log messages for a specific time period.
git log --before="2010-09-15" --after="2010-09-07"
Getting statistics on the number of added and deleted lines
git log --numstat
-       -       theme/resources/overalldraft125.png
0       489     theme/resources/style.css
...
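To turn that per-file output into totals, the first two columns can be summed with awk. The sketch below assumes git and awk are installed; it builds a throwaway repository (file name and contents are made up) and wraps the pipeline in a hypothetical line_stats function that forwards extra arguments such as --after=... to git log. Binary files report "-" in both columns, which awk's numeric conversion treats as 0.

```shell
#!/bin/sh
# Sketch: sum the added/deleted columns of "git log --numstat".
# line_stats and the demo repository are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

line_stats() {
    repo_dir=$1
    shift
    # An empty tformat drops the commit headers, leaving only
    # "added<TAB>deleted<TAB>path" lines (and blank separators).
    git -C "$repo_dir" log --numstat --pretty=tformat: "$@" |
        awk 'NF >= 3 { added += $1; deleted += $2 }
             END { printf "%d added, %d deleted\n", added, deleted }'
}

repo=$(mktemp -d)
git -C "$repo" init -q
printf 'a\nb\nc\n' > "$repo/style.css"
git -C "$repo" add style.css
git -C "$repo" commit -q -m "add stylesheet"
printf 'a\nc\nd\ne\n' > "$repo/style.css"
git -C "$repo" commit -q -a -m "rework stylesheet"

line_stats "$repo"    # prints: 5 added, 1 deleted
```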
Visualizing the commit history as a graph
git log --pretty=format:'%h : %s' --graph
Dealing with whitespace (see git-scm)
$ git diff --check
$ git config --global core.autocrlf input
$ git config --global core.whitespace \
    trailing-space,space-before-tab,indent-with-non-tab
# Color output
$ git config --global color.ui true
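To see "git diff --check" catch a problem before it gets committed, here is a small sketch on a throwaway repository (the main.c file is made up). It introduces a trailing space, which git flags as a whitespace error, and relies on the command exiting with a non-zero status when it finds one, which makes it usable as a pre-commit gate.

```shell
#!/bin/sh
# Sketch: detect whitespace errors in unstaged changes.
# The repository and file are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git -C "$repo" init -q
printf 'int x;\n' > "$repo/main.c"
git -C "$repo" add main.c
git -C "$repo" commit -q -m "initial"

# The second line ends with a trailing space.
printf 'int x;\nint y; \n' > "$repo/main.c"

# Non-zero exit status when whitespace errors are found; the
# offending file and line are reported on standard output.
if git -C "$repo" diff --check; then
    status=clean
else
    status=dirty
fi
echo "diff --check: $status"
```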
Git graphical front-ends
There are posts all over Stack Overflow that discuss graphical front-ends for git. I am mostly interested in OS X, but there are posts specific to Linux and Windows as well as cross-platform ones. One of the most in-depth reviews I could find is Mac OS X Git Clients Roundup. The following list is a convenient summary of what I could gather.
gitg is a git repository viewer targeting GTK+/GNOME. The way it shows the branches of a file (see the screenshots) is particularly interesting.
Also worth mentioning is EGit, an Eclipse plug-in for git, for those interested in IDE integration.
Web-based repository browsers
GitHub is a well-known web-based git hosting site with a decent browsing experience. ViewGit is very slick and will run on your own host.
FishEye's interface is very cluttered, and every click responds so slooowly that it does not make for a very practical tool. Chora Repository Viewer for CVS and Subversion responds well, but the interface lacks imagination and you get lost in it very quickly.