Subversion directory organisation

February 26th, 2007 by Mark · 4 Comments

Using Subversion (or any other version-control system) to manage your working laboratory or research files is sensible. All changes can be tracked, and it is straightforward to review old versions of files. I store all work relating to research, including notes, papers, thesis chapters, statistical analyses and even data. If I were to make catastrophic changes (deliberately or accidentally), it is easy to roll them back. It’s like “Track Changes” on steroids.

Subversion can be quite complex to use, but there are easy front-ends, such as TortoiseSVN, that anyone can use. This article discusses some more advanced version-control issues from the command line, so if you are not familiar with command-line interfaces I would suggest reading James’ upcoming article on the simple use of TortoiseSVN under Microsoft Windows.

Software developers are commonly advised to store a repository’s project files under a common set of top-level directories:

/trunk/
/branches/
/tags/

Current development work continues within the trunk directory, which may contain multiple projects organised into subdirectories. Multiple developers can check out this trunk, make local changes and commit them, for example at the end of each day. Once a piece of development is finished, the whole lot can be copied into the “tags” directory, labelling that particular constellation of files and directories as a “release” (for example, version 1.2).
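
A minimal sketch of setting up that layout from scratch, assuming a brand-new repository on the local disk. The scratch location made by mktemp is an assumption for the demo; a real repository would live somewhere permanent, such as /home/mark/Repos/research. The three top-level directories are created directly in the repository, so no working copy is needed.

```shell
#!/bin/sh
# Sketch: start a new local repository with the conventional
# trunk/branches/tags layout. The mktemp scratch directory is an
# assumption for illustration only.
set -eu

WORK=$(mktemp -d)                # throwaway directory for the demo
trap 'rm -rf "$WORK"' EXIT       # remove it when the script exits

REPO="$WORK/Repos/research"      # repository on the local disk
URL="file://$REPO"               # the same repository, addressed as a URL

mkdir -p "$WORK/Repos"
svnadmin create "$REPO"

# Create the three top-level directories directly in the repository;
# URL operations like this one commit immediately, with no working copy.
svn mkdir -q -m "Create trunk/branches/tags layout" \
    "$URL/trunk" "$URL/branches" "$URL/tags"

# List the top level of the repository: branches/, tags/ and trunk/
svn list "$URL"
```

Day-to-day work would then check out "$URL/trunk" rather than the repository root.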

I ignored this structure when I created my Subversion repository. I naively thought that these issues didn’t apply to research work, but of course they do. My research work has a huge number of interdependencies, and at certain times all these separate pieces need to work together to form a complete piece of work, such as an ethics submission, a paper for publication or a thesis.

For example, I have just submitted an article for publication. It is a dynamically generated PDF document, built with LaTeX and Sweave so that the statistical analysis is run in R each time the document is generated. It depends on data (held in a CSV file), the bibliography, a bibliography style file, some radiology images and, of course, my entire suite of R pre-processing functions that turn raw data into summary statistics, graphs and so on. I will keep the generated PDF, and this can be archived along with a paper copy. However, in the future I would like to be able to repeat the analysis in exactly the same way as at the time of publication. In the normal course of work I will update the bibliography and the R code, and this may change the output of that old dynamic document. What I need is a snapshot of the whole working system at the time of submission. Creating such a snapshot is called “tagging”, and it is very straightforward.

Creating tags is very easy in Subversion and does not use excessive disk space: a tag is a “cheap copy”, so the repository records that the tag was copied from a particular revision of the trunk instead of storing a second copy of every file. However, one needs a sensibly organised directory structure to use them. Tags need not be checked out, and so do not need to hang around in your working folders (where they would take up space).
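
To see the cheap copy at work, the sketch below tags a trunk and then asks svn log which paths the tagging revision changed. It assumes a throwaway local repository, and the file notes.txt and the tag name snapshot are invented for illustration. The tag shows up as a single entry recording that it was copied from the trunk at a given revision, not as a fresh set of files.

```shell
#!/bin/sh
# Sketch: show that a tag is recorded as "copied from /trunk at revision N"
# rather than as duplicated files. Repository location, notes.txt and the
# tag name "snapshot" are hypothetical.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
REPO="$WORK/research"
URL="file://$REPO"

svnadmin create "$REPO"
svn mkdir -q -m "Create layout" "$URL/trunk" "$URL/tags"

# Put one file into the trunk directly, without a working copy
echo "first notes" > "$WORK/notes.txt"
svn import -q -m "Add notes" "$WORK/notes.txt" "$URL/trunk/notes.txt"

# Tag the trunk with a server-side copy
svn copy -q -m "Tag snapshot" "$URL/trunk" "$URL/tags/snapshot"

# The newest log entry lists the tag as a copy, in a line of the form
#   A /tags/snapshot (from /trunk:<revision>)
svn log -v -l 1 "$URL"
```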

My problem is that my repository has working files in its root. Creating a “tags” subdirectory alongside them would work, but whenever the project was checked out of the repository, all of the tags would be copied too, wasting a great deal of disk space and risking commits to tags that should be regarded as unchangeable. The solution is to reorganise the repository into the “trunk”, “tags”, “branches” layout and check out only the trunk as a working copy:

Migrating to a sensible repository layout

Review the current structure
The current working copy is held at /home/mark/Documents/research
This was originally checked out of my (locally held) repository with the command:
svn co file:///home/mark/Repos/research
Create the “trunk” and “tags” subdirectories
svn mkdir trunk
svn mkdir tags
Move all working files into the trunk
svn mv research.bib trunk/
svn mv data trunk/
svn mv thesis trunk/
... (and so on for every remaining file and directory)
Commit your changes (with an appropriate comment)
svn commit -m "Created a sensible repository layout"
Move the old working-copy somewhere out of the way
Here I just rename my working-copy folder for safety, although with Subversion all files are safely stored in the repository, and I could easily check out the last revision from before any of these changes were made.
cd ..
mv research research-old
Check out the new trunk into the old location
svn checkout file:///home/mark/Repos/research/trunk research

This time I give the checkout command a target directory name (“research”), so that the trunk is checked out into an appropriately named directory rather than the default, which would be “trunk”.
Create a “tag” of the current trunk
We don’t need a working copy for this at all. The svn copy command can “copy” the trunk directly within the repository to create a tag:
svn copy file:///home/mark/Repos/research/trunk file:///home/mark/Repos/research/tags/paper_for_publication -m "Tagging submitted paper to..."
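
The migration can be rehearsed end to end before touching real work. This is a sketch, assuming a throwaway repository under a mktemp directory; research.bib, data/raw.csv and thesis/chapter1.txt are hypothetical stand-ins for real research files. It builds an old-style repository with files in its root and then follows the same steps: create trunk and tags, move everything into the trunk, commit, set the old working copy aside, check out the trunk and tag it.

```shell
#!/bin/sh
# Sketch: replay the migration against a throwaway repository.
# File and directory names are hypothetical stand-ins.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
REPO="$WORK/Repos/research"
URL="file://$REPO"

# Set up an "old-style" repository with working files in its root
mkdir -p "$WORK/Repos" "$WORK/Documents"
svnadmin create "$REPO"
cd "$WORK/Documents"
svn checkout -q "$URL" research
cd research
mkdir data thesis
echo "@misc{demo}" > research.bib
echo "id,value" > data/raw.csv
echo "Chapter 1 draft" > thesis/chapter1.txt
svn add -q research.bib data thesis      # directories are added recursively
svn commit -q -m "Initial research files"
svn update -q                            # bring the working copy to one revision

# Create trunk and tags, move everything into trunk, and commit
svn mkdir -q trunk tags
svn mv -q research.bib data thesis trunk/
svn commit -q -m "Created a sensible repository layout"

# Set the old working copy aside and check out only the trunk
cd ..
mv research research-old
svn checkout -q "$URL/trunk" research

# Tag the trunk with a server-side copy; no working copy is involved
svn copy -q -m "Tagging submitted paper" \
    "$URL/trunk" "$URL/tags/paper_for_publication"

svn list "$URL"          # the root now holds only tags/ and trunk/
```

Afterwards the research directory contains the trunk’s files but no tags; the tags exist only in the repository.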

One can use any name for a tag. Software developers commonly use a version number (e.g., V1.2), but in research this is not usually appropriate; use an informative name instead. It is also straightforward to delete tags later, and even a deleted tag can always be restored from the repository history.
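
A sketch of routine tag housekeeping, again assuming a throwaway local repository with a hypothetical analysis.R file. It lists the tags, exports a tag into a clean directory to repeat an old analysis, then deletes the tag and restores it by copying from the last revision in which it existed (svnlook youngest records that revision just before the delete).

```shell
#!/bin/sh
# Sketch: list, export, delete and restore a tag. analysis.R and its
# contents are hypothetical; the tag name follows the example above.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
REPO="$WORK/research"
URL="file://$REPO"

# Set up: the trunk holds the analysis, tagged at the time of submission
svnadmin create "$REPO"
svn mkdir -q -m "Create layout" "$URL/trunk" "$URL/tags"
svn checkout -q "$URL/trunk" "$WORK/wc"
echo "submitted analysis" > "$WORK/wc/analysis.R"
svn add -q "$WORK/wc/analysis.R"
svn commit -q -m "Analysis as submitted" "$WORK/wc"
svn copy -q -m "Tagging submitted paper" \
    "$URL/trunk" "$URL/tags/paper_for_publication"

# Later work changes the trunk; the tag is unaffected
echo "revised analysis" > "$WORK/wc/analysis.R"
svn commit -q -m "Revised analysis" "$WORK/wc"

# List the available tags
svn list "$URL/tags"

# Export a clean, unversioned copy of the tag to repeat the analysis
svn export -q "$URL/tags/paper_for_publication" "$WORK/rerun"
cat "$WORK/rerun/analysis.R"        # prints: submitted analysis

# Delete the tag, remembering the last revision in which it existed
LAST=$(svnlook youngest "$REPO")
svn delete -q -m "Remove tag" "$URL/tags/paper_for_publication"

# Restore it by copying from that earlier revision (a peg revision)
svn copy -q -m "Restore tag" \
    "$URL/tags/paper_for_publication@$LAST" \
    "$URL/tags/paper_for_publication"
```

svn export writes plain files with no .svn metadata, which suits a one-off rerun of an old analysis; checking out the tag URL would give a versioned copy instead.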


4 responses so far

1 Gustavo // Sep 9, 2009 at 5:11 pm

Hi Mark, interesting post. Here I give my thoughts on directory organization: http://thegsharp.wordpress.com/2009/09/05/structuring-your-source-code/

Regards!
2 George Powell // Jun 4, 2011 at 12:55 am

Hi Mark, I came across your article by chance and thought, as an addition, you might find uberSVN a nice simple alternative to getting into the hard-core command-line stuff. uberSVN was developed by my company WANdisco as a free product to make Subversion a simple tool to install and maintain. Also, the future of this product has a great deal of promise. If you have the time, it’s free to download and test yourself at http://www.ubersvn.com/
3 Peter // Jul 28, 2011 at 3:03 pm

Hey Mark,

Great post. A couple of years ago I also started my research career with an SVN server for all my data. Heck, I thought it was so important that I even made a presentation to my peers about it. However, they couldn’t see the utility of it, and I soon abandoned it out of laziness and lack of need. I think that one of the issues is that people don’t think about these problems (e.g. needing to reconstruct results from a paper) until it’s too late, so I’d rather spend a little more time now than later. In addition to the structure itself, we need more stories of people wishing they had SVN *after* it was too late.

Peter
4 Ryan Sullivan // Jan 4, 2012 at 3:30 pm

I was reading about various SVN directory structures and came across your article. As a developer I look for tools that make all things easier. I also love client-server and distributed systems. With that, I thought I’d share with your community that SVN works VERY well as a client/server system. As a server I recommend CollabNet Subversion Edge (free, comes with a web server, and is stupid easy to install and set up). For a client I recommend TortoiseSVN (free, Explorer integration, easy to use). For Apple there is an app called simply Versions (not free, easy to use, very pretty).

I also recommend you check out Git. It offers a distributed version-tracking system that can be more flexible, but also more confusing. Associated with Git is GitHub.

Good luck!


medicalnerds.com

technology, stats and IT for medics