Medical Nerds Blog

technology, stats and IT for medics


Port forwarding with SSH/Putty

March 27th, 2007 by Mark · 13 Comments

I regularly exchange data between computers at University and home. To maintain security, I keep a firewall running on all machines, and “tunnel” through the firewall(s) using SSH – the secure shell. For example, I run a web server on my main machine for web application development, and do not wish this to be publicly accessible. My home computer is protected by a hardware firewall, and I use SSH to tunnel access to the web server. Local and remote port forwarding is straightforward, but it can be difficult to understand initially. I have therefore created a list of “recipes” that one can try… [Read more →]

→ 13 Comments · Tags: Free · Open Source · Software

How to use JabRef (BibTeX) with Microsoft Word 2003

March 25th, 2007 by James · 264 Comments

JabRef is one of the best reference managers available and provides a realistic alternative to EndNote, as well as being open source and free. Unfortunately, most users are not aware that JabRef (or any other BibTeX-based reference manager) can easily be integrated with Microsoft Word. In this guide I will show you step by step how to install Mike Brookes’ excellent free Bibtex4Word (v1.12) Word macro package on your Windows XP machine. [Read more →]

→ 264 Comments · Tags: Free · Software

Batch converting PDF to JPG/JPEG using free software

March 21st, 2007 by Mark · 40 Comments

It is often necessary to batch convert PDF documents and graphics into other formats. I explain how to do this using totally free software. Searching for PDF software using Google is fraught with difficulty: one ends up with endless links to commercial sites that charge lots of money and mislead users into paying for software that is similar to, or even built on, free software. Freely available PDF software includes xpdf and ghostscript, and their source code is fully available under the GNU GPL open source license. [Read more →]

→ 40 Comments · Tags: Free · Graphics · LaTeX · Open Source · Software

Dovecot IMAP server, Debian Linux and tcpd’s hosts.allow

March 17th, 2007 by Mark · No Comments

Running your own email server is great, but it must be secure against attacks from hackers and “script-kiddies”, idiots who scan networks looking for systems that advertise services and allow remote access. You can secure certain services on your Linux-based machine, such as sshd and imapd, using tcpd’s hosts.allow and hosts.deny functionality to limit the number of hosts that can even get to a login prompt. [Read more →]

→ No Comments · Tags: Free · Linux · Open Source · Software

10 killer free apps for the Medical Research Student (Windows) 1/2

March 10th, 2007 by James · No Comments

Before you start out with your research project you need to be equipped with the tools required to help you keep notes, write papers and produce professional-looking documents. In this two-part article I will describe 10 of my favourite applications; most are open source and all are free. [Read more →]

→ No Comments · Tags: Free · Open Source · Research · Software

10 pieces of free software every doctor should have: An introduction to R

March 9th, 2007 by Mark · 2 Comments

If you have the credentials to view the February 2007 issue of The Lancet, have a look at a published letter about the “ten pieces of free software every doctor should have”. If you don’t, then don’t worry too much – you’re not missing much with this article. I should have guessed that something wasn’t right when I saw that it was written by gynaecologists; that is always a bad sign. I suggest you read James’ series of highlights of decent free software for Windows instead.

Let me list their “best”. Note, they’ve restricted themselves to the Windows platform, which is a shame. I’m surprised this letter got published.

  • Yahoo desktop search
  • Foxit reader
  • Cute PDF writer
  • PDF blender
  • DeskPins
  • ScreenHunter Free
  • FastStone Image Viewer
  • Syncback
  • JustZipIt
  • YouSendIt

Oh, I can see their reasoning. I’m sure that they’re fine little applications in their own way, and are useful to some. Desktop search is great (but Apple’s built-in “Spotlight” is better), and Cute PDF writer is handy on Windows (it is not needed on Mac OS X, where Print-to-PDF is built in), but PDF creator is much better.

I would like to recommend some real free software. Powerful software that really is free. You’re free to download, free to modify and free to adapt this software. Maybe that isn’t important to you, but it should be. It means that once a free software project has gathered enough momentum, it will always be available. When I talk about “free software”, I’m referring to the FSF’s definition:

Free software is software that comes with permission for anyone to use, copy, and distribute, either verbatim or with modifications, either gratis or for a fee. In particular, this means that source code must be available. “If it’s not source, it’s not software.”

I’m not going to list ten pieces of free software, but here is my list of essential and truly free software that often surpasses its commercial rivals in functionality:

  1. GNU/Linux (try downloading a “run from CD” version – no complex installation, free and powerful – try it instead of Microsoft Windows – why pay to upgrade to Vista?)
  2. R Project for Statistical Computing (it really is better than SPSS!)
  3. PostgreSQL (a superb database)
  4. Apache web server
  5. LaTeX/TeX
  6. Subversion
  7. JabRef

However, the real point of this article, when I consider free software and its relevance to medicine and research, is R. I really want to introduce medics and other researchers to R.

There is considerable inertia within departments and universities, and the choice of statistical software is often limited. Here, the status quo appears to be SPSS. There will be departments (especially statistics/epidemiology) that use other programs as needed, when they require functionality that is only available in certain packages. However, there is now a free, open-source statistical program called R, and over the next few years it is likely to become increasingly popular. There are already signs of a seismic shift in the way professional statisticians work, with many drawn to R by its advantages:

  • It is open-source – it is free. This is not just about cost. This means that all the inner-workings can be perused at leisure. The underlying statistical algorithms can be seen, and are not hidden behind proprietary interfaces. For professional statisticians, this is important. Mere mortals merely use standard statistical techniques, but R is often on the cutting edge, and for those working in these fields, being able to review the underlying algorithm is important.
  • R is available for many different operating systems. It’s written in a portable manner and can be compiled for most modern operating systems. This means it won’t stop working when you upgrade systems (unlike my installation of SPSS which stopped working when I upgraded from Mac OS X Panther to Tiger).
  • R isn’t going to go away. It’s a working system right now. There is considerable momentum behind it, with a large core team and hundreds of volunteers contributing add-on packages to implement common and rare statistical techniques. It wouldn’t matter if development stopped right now (it isn’t going to though); you will still be able to re-run those old analyses in many years to come.
  • R is an open source implementation of S. This is a programming language that provides a powerful environment for manipulating data and implementing statistical techniques. That means that many of the statistical methods built into R (and provided in the free add-on packages) are written in this same language. There is little distinction between users and developers of the program, and in fact S is increasingly used as the language of choice for the development of new statistical methodologies, as a look through any statistical journal will show.
  • Point-and-click statistical packages are highly limiting, difficult to learn and provide a fragmented, non-standard view of statistical methodology. Programs such as SPSS and SAS provide a macro language of sorts, but it is not a traditional programming language, and does not provide a rich environment for developing or using statistical techniques. They have been developed with no systematic design, and have accumulated new functionality incrementally, often in a narrow, task-specific manner. They highlight the differences between common techniques (e.g., ANOVA vs. …). R does involve typing commands at the keyboard, but it provides a systematic, cohesive environment for statistical analysis. I should quote from the R website:

    R, like S, is designed around a true computer language, and it allows users to add additional functionality by defining new functions. Much of the system is itself written in the R dialect of S, which makes it easy for users to follow the algorithmic choices made. For computationally-intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.

    Many users think of R as a statistics system. We prefer to think of it as an environment within which statistical techniques are implemented. R can be extended (easily) via packages. There are about eight packages supplied with the R distribution and many more are available through the CRAN family of Internet sites covering a very wide range of modern statistics.

  • R is a flexible environment in which a variety of tools can be provided. In the 1980s and 1990s, there were huge developments in bioinformatics. Many modern genetic algorithms are underpinned by statistical techniques, and a large number of standalone programs were written, often in C or Fortran, to implement these techniques. The problem with standalone programs is that they have widely different interfaces for feeding data in and getting results back. R provides a generic environment for these kinds of tools, and feeding data in and obtaining results from simple or complex algorithms is straightforward. Bioconductor provides a large framework for bioinformatics and genetic analyses, and is under active development.
  • R focuses on data and data manipulation. One may import data from a variety of sources: Excel, SQL-based databases such as MySQL, PostgreSQL, SQL Server (and even Microsoft Access if one can call it a SQL database), and from webpages (e.g. real-time economic data). There is no limit on the type and format of data manipulation, and as R is a complete programming environment, one can write functions in the language itself.
  • R supports a bewildering number of graphical options. One can specify default parameters and get simple professional-looking plots, or fine-tune every parameter of drawing to create arbitrarily complex graphs and diagrams.
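
To give a flavour of the points above, here is a minimal sketch in base R. The blood pressure figures and the names `bp` and `sem` are invented purely for illustration; nothing here comes from a real study. It shows data held in a data frame, a user-defined function sitting alongside the built-in ones, one two-group comparison written both as a t-test and as a linear model, and a plot drawn first with defaults and then with a few parameters adjusted.

```r
# Hypothetical data: systolic blood pressure (mmHg) in two invented groups.
bp <- data.frame(
  group = factor(rep(c("control", "treatment"), each = 5)),
  sbp   = c(142, 138, 150, 145, 139, 130, 128, 135, 126, 131)
)

# A user-defined function (standard error of the mean), written in the
# same language as R's own statistical routines.
sem <- function(x) sd(x) / sqrt(length(x))

# Apply the built-in mean() and our own sem() to each group.
group_means <- tapply(bp$sbp, bp$group, mean)
group_sems  <- tapply(bp$sbp, bp$group, sem)
print(group_means)
print(group_sems)

# The same two-group comparison, first as an equal-variance t-test,
# then as a linear model with group as the predictor.
tt  <- t.test(sbp ~ group, data = bp, var.equal = TRUE)
fit <- lm(sbp ~ group, data = bp)
print(tt)
print(summary(fit))

# A plot with default settings, then with a few parameters changed.
# (When run non-interactively, R writes these to Rplots.pdf.)
boxplot(sbp ~ group, data = bp)
boxplot(sbp ~ group, data = bp,
        col  = "lightblue",
        ylab = "Systolic BP (mmHg)",
        main = "Invented example data")
```

The t-test and the linear model use one formula interface and report the same p-value for the group difference, which is the kind of common ground between techniques that a cohesive environment makes visible.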

→ 2 Comments · Tags: Free · R statistical computing

R and Filemaker on Mac OS X

March 1st, 2007 by Mark · 4 Comments

R (The R Project for Statistical Computing) and Filemaker provide a compelling solution for the common problems involved in medical research, namely data entry, reporting and analysis. However, there are peculiarities in this combination, particularly when running on Mac OS X and trying to use ODBC, that can cause difficulties. This article discusses some solutions to these problems. [Read more →]

→ 4 Comments · Tags: Filemaker · ODBC · R statistical computing

Editing blog posts with TextMate

February 27th, 2007 by Mark · No Comments

If you use TextMate on Mac OS X (and if you don’t I would thoroughly recommend you try it: it is by far the best editor I have ever used) and maintain a blog, then it is possible to create blog entries using TextMate. [Read more →]

→ No Comments · Tags: Software

How to add a Free Medical Spell Checking Dictionary to Word/Firefox

February 26th, 2007 by James · 4 Comments

Spell checking medical documents can be a real pain. My spelling at the best of times is atrocious. Luckily most software, even browsers, now supports spell checking. The problem is getting hold of a good medical dictionary to use with your application. Now a site which specialises in medical spell checking, medical proformas and PDA software has made a 40 000 word dictionary available for free under the GPL licence. Up until now their product for MS Word was only available with a $10 single-user licence. [Read more →]

→ 4 Comments · Tags: Free · Medical · Open Source · Software

Subversion directory organisation

February 26th, 2007 by Mark · 4 Comments

Using Subversion (or any other version-control system) to manage your working laboratory or research files is sensible. All changes can be tracked, and it is straightforward to review old versions of files. I store all work relating to research, including notes, papers, thesis chapters, statistical analyses and even data. If I were to make catastrophic changes (deliberately or accidentally), it is easy to roll them back. It’s like “Track Changes” on steroids. [Read more →]

→ 4 Comments · Tags: Free · Open Source · Research · Software