Medical Nerds Blog

technology, stats and IT for medics


10 pieces of free software every doctor should have
An introduction to R

March 9th, 2007 by Mark · 2 Comments

If you have the credentials to view the February 2007 issue of The Lancet, have a look at a published letter about the “ten pieces of free software every doctor should have”. If you don’t, then don’t worry too much – you’re not missing much with this article. I should have guessed that something wasn’t right when I saw that it was written by gynaecologists; that is always a bad sign. I suggest you read James’ series of highlights of decent free software for Windows instead.

Let me list their “best”. Note, they’ve restricted themselves to the Windows platform, which is a shame. I’m surprised this letter got published.

  • Yahoo desktop search
  • Foxit reader
  • Cute PDF writer
  • PDF blender
  • DeskPins
  • ScreenHunter Free
  • FastStone Image Viewer
  • Syncback
  • JustZipIt
  • YouSendIt

Oh, I can see their reasoning. I’m sure that they’re fine little applications in their own way, and are useful to some. Desktop search is great (but Apple’s built-in Spotlight is better). Cute PDF writer is handy on Windows, but PDFCreator is much better, and neither is needed in Mac OS X, as Print-to-PDF is built in.

I would like to recommend some real free software. Powerful software that really is free. You’re free to download, free to modify and free to adapt this software. Maybe that isn’t important to you, but it should be. It means that once a free software project has gathered enough momentum, it will always be available. When I talk about “free software”, I’m referring to the FSF’s definition:

Free software is software that comes with permission for anyone to use, copy, and distribute, either verbatim or with modifications, either gratis or for a fee. In particular, this means that source code must be available. “If it’s not source, it’s not software.”

I’m not going to list ten pieces of free software, but here is my list of essential and truly free software that often surpasses its commercial rivals in functionality:

  1. GNU/Linux (try downloading a “run from CD” version – no complex installation, free and powerful – try it instead of Microsoft Windows – why pay to upgrade to Vista?)
  2. R Project for Statistical Computing (it really is better than SPSS!)
  3. PostgreSQL (a superb database)
  4. Apache web server
  5. LaTeX/TeX
  6. Subversion
  7. JabRef

However, the real point of this article, when I consider free software and its relevance to medicine and researchers, is R. I really want to introduce medics and other researchers to R.

There is considerable inertia within departments and universities, and the choice of statistical software is often limited. Here, the status quo appears to be SPSS. Some departments (especially statistics and epidemiology) use other programs as the need arises, when the functionality they require is only available in certain packages. However, there is now a free, open-source statistical program called R, and over the next few years it is likely to become increasingly popular. There are already signs of a seismic shift in the way professional statisticians work, with many drawn to R by its advantages:

  • It is open-source – it is free. This is not just about cost. This means that all the inner workings can be perused at leisure. The underlying statistical algorithms can be seen, and are not hidden behind proprietary interfaces. For professional statisticians, this is important. Mere mortals merely use standard statistical techniques, but R is often on the cutting edge, and for those working in these fields, being able to review the underlying algorithm is important.
  • R is available for many different operating systems. It’s written in a portable manner and can be compiled for most modern operating systems. This means it won’t stop working when you upgrade systems (unlike my installation of SPSS, which stopped working when I upgraded from Mac OS X Panther to Tiger).
  • R isn’t going to go away. It’s a working system right now. There is considerable momentum behind it, with a large core team and hundreds of volunteers contributing add-on packages to implement common and rare statistical techniques. It wouldn’t matter if development stopped right now (it isn’t going to though); you will still be able to re-run those old analyses in many years to come.
  • R is an open source implementation of S. S is a programming language that provides a powerful environment for manipulating data and implementing statistical techniques. That means that many of the statistical methods built into R (and provided in the free add-on packages) are written in this same language. There is little distinction between users and developers of the program – and in fact, S is increasingly used as the language of choice for the development of new statistical methodologies – as a look through any statistical journal will show.
  • Point and click statistical packages are highly limiting, difficult to learn and provide a fragmented, non-standard view of statistical methodology. Programs such as SPSS and SAS provide a macro language of sorts, but it is not a traditional programming language, and does not provide a rich environment for developing or using statistical techniques. They have been developed with no systematic design, have accumulated new functionality incrementally, and often provide this functionality in a narrow, task-specific manner. They highlight the differences between common techniques (e.g., ANOVA vs. […]). R does involve typing commands at the keyboard, but it provides a systematic, cohesive environment for statistical analysis. I should quote from the R website:

    R, like S, is designed around a true computer language, and it allows users to add additional functionality by defining new functions. Much of the system is itself written in the R dialect of S, which makes it easy for users to follow the algorithmic choices made. For computationally-intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.

    Many users think of R as a statistics system. We prefer to think of it as an environment within which statistical techniques are implemented. R can be extended (easily) via packages. There are about eight packages supplied with the R distribution and many more are available through the CRAN family of Internet sites covering a very wide range of modern statistics.

  • R is a flexible environment in which a variety of tools can be provided. In the 1980s and 1990s, there were huge developments in bioinformatics. Many modern algorithms in genetics are underpinned by statistical techniques, and a large number of standalone programs were written, often in C or Fortran, to implement these techniques. The problem with standalone programs is that they have widely different interfaces for feeding data in and getting results back. R provides a generic environment for this kind of tool, and feeding data in and obtaining results from simple or complex algorithms is straightforward. Bioconductor provides a large framework for bioinformatics and genetic analyses, and is under active development.
  • R focuses on data and data manipulation. One may import data from a variety of sources: Excel, SQL-based databases such as MySQL, PostgreSQL, SQL Server (and even Microsoft Access if one can call it a SQL database), and from webpages (e.g. real-time economic data). There is no limit on the type and format of data manipulation, and as R is a complete programming environment, one can write functions in the language itself.
  • R supports a bewildering number of graphical options. One can specify default parameters and get simple, professional-looking plots, or fine-tune every parameter of drawing to create arbitrarily complex graphs and diagrams.
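
To give a flavour of what "writing functions in the language itself" and working with data look like in practice, here is a minimal sketch using only base R. The blood pressure figures are invented purely for illustration, and the names `bp` and `sem` are hypothetical choices of mine, not part of any real dataset or package.

```r
# Hypothetical example data: systolic blood pressure (mmHg) in two groups.
# These numbers are made up for illustration only.
bp <- data.frame(
  group    = factor(rep(c("control", "treated"), each = 5)),
  systolic = c(142, 138, 150, 145, 140, 131, 128, 136, 133, 127)
)

# A user-defined function: the standard error of the mean.
# Once defined, it is used exactly like a built-in function.
sem <- function(x) sd(x) / sqrt(length(x))

# Data manipulation: apply a built-in and a user-defined function per group.
group_means <- tapply(bp$systolic, bp$group, mean)
group_sems  <- tapply(bp$systolic, bp$group, sem)
print(group_means)
print(group_sems)

# A standard technique: Welch two-sample t-test, using the formula interface.
result <- t.test(systolic ~ group, data = bp)
print(result$p.value)
```

Saved as a script, this can be run with `Rscript`, or pasted line by line at the R prompt. The same formula notation (`systolic ~ group`) is reused across many of R's modelling and plotting functions, which is a large part of what makes the environment feel cohesive.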

Tags: Free · R statistical computing

2 responses so far ↓

  • 1 Oren // Apr 9, 2010 at 7:16 pm

    I know this post is old (I hope you get notifications).

    – Do you use an RDBMS for entering and editing data, and move to R afterwards? Are you using R at all for data editing?
    – To your knowledge, how widespread is R among physicians?

  • 2 robin beaumont // Mar 24, 2011 at 3:30 pm

    Your colleagues may find useful two statistics courses I have written for medics and dentists. The first one starts very basically (and ends using the free package Gpower to work out sample size requirements for trials), while the second starts with multiple regression and logistic regression and then finishes with using mixed models to analyse hierarchical data structures (e.g. teeth in mouth, dentists/GPs in dental practices etc.) and repeated measures.
    I have used both SPSS and, more importantly, R, along with a free graphical interface for R called R Commander, to carry out all the analyses. I also demonstrate all the analyses via YouTube videos (about 40 so far – probably around 60 when completed). The accompanying material also contains exercises and numerous MCQs (all freely available).
    Please pass on to any colleagues you think may find them useful. The course is part of an MSc in health informatics run jointly by the RCSed and edin univ. I run several modules on it.

    Any comments/ feedback welcome.
    Two statistics courses for dentists and medics with supporting YouTube videos:
    Course 2: advanced statistics (survival analysis, Cox regression, multiple regression, ANCOVA, allowing for analysis of hierarchical data structures, teeth within mouth, doctors within centres etc.), still in the process of being written (i.e. the repeated measures and multilevel modelling stuff)
    YouTube videos to accompany course 2 at:
    Course 1: basic statistics (descriptive statistics, comparing two groups, working out the number of cases you need to have a chance of getting a significant result, Fisher/Pearson v Neyman/Pearson interpretation of P values etc.)
    YouTube videos to accompany course 1 at:
