It is often necessary to batch convert PDF documents and graphics into other formats. I explain how to do this using entirely free software. Searching for PDF software using Google is fraught with difficulty: one ends up with endless links to commercial sites that charge a lot of money and mislead users into paying for software that is similar to, or even built on, free software. Freely available PDF software includes xpdf and ghostscript, and their source code is fully available under a GNU GPL open source license.
I generate all my research graphics, charts, tables etc. using R. My semi-automated system uses complex R scripts to fetch the latest data from PostgreSQL, perform the analysis, and then generate PDF and PostScript files for inclusion in LaTeX documents. PDF is a great format, fully supported on the Mac, but Microsoft Word and Microsoft PowerPoint do not support PDF properly. This makes things very difficult: while I prefer to use Apple’s Keynote program, presenting at scientific meetings tends to require PowerPoint.
Vector graphics are different to bitmap graphics, and tend to be smaller, and scale to both small and large sizes without becoming “jaggy”. Unfortunately, the standard vector file format for Microsoft applications is WMF (Windows Metafile), a proprietary and poorly documented standard, which means it is poorly supported by other operating systems, like Mac OS X and Linux. Even Microsoft’s own software on Mac OS X does not support WMF files properly, and often has difficulty importing documents using this format.
In an ideal world, Microsoft PowerPoint would support PDF graphics easily, but until it does, one needs to convert the files manually.
The best way at present appears to be to convert them to high-quality bitmap images. This involves rasterisation, and so does degrade quality, but I use a high resolution, measured in “dots-per-inch” (DPI), when converting, so that quality is maintained. I would recommend either 300 or 600 DPI.
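To get a feel for what those DPI values mean, the pixel size of the bitmap is simply the page size in inches multiplied by the DPI. A rough sketch of the arithmetic, assuming a plot that is 7 by 5 inches (an illustrative size, not one taken from my own scripts) and using a hypothetical helper called pixels:

```shell
#!/usr/bin/env bash
# Pixel dimension of a rasterised page edge: inches multiplied by DPI.
# "pixels" is a hypothetical helper; the 7x5 inch plot size is an assumption.
pixels() {
  local inches=$1 dpi=$2
  echo $(( inches * dpi ))
}

# Compare the two recommended resolutions.
for dpi in 300 600; do
  echo "7x5 in at ${dpi} DPI: $(pixels 7 "$dpi") x $(pixels 5 "$dpi") pixels"
done
```

Doubling the DPI doubles each dimension, so the image has four times as many pixels and the file grows accordingly.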
I use ImageMagick for the conversion. Installation on Microsoft Windows and Linux is easy: either use your standard package manager, or download a binary from the ImageMagick website. ImageMagick actually uses ghostscript to rasterise the vector graphics, but provides a simpler user interface. For Mac OS X, I use MacPorts:
sudo port install ImageMagick
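Once installed, it can be worth confirming that both programs are on the PATH before converting anything. A minimal sketch, assuming the commands are named convert and gs as on a typical Mac OS X or Linux install (check_tool is a hypothetical helper name, and newer ImageMagick releases also provide a magick command):

```shell
#!/usr/bin/env bash
# Report whether a command is available on the PATH.
# "check_tool" is a hypothetical helper name.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

check_tool convert   # ImageMagick's conversion command
check_tool gs        # ghostscript, which ImageMagick calls to rasterise PDFs
```

If gs is reported missing, PDF conversion is likely to fail even though ImageMagick itself is installed.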
To convert one file:
convert -density 300 file.pdf file.jpg
To convert more than one file, one can use wildcards, such as *.pdf, but I prefer to use a bash loop (bash is the default shell in Mac OS X) to batch convert files, as it preserves the output filenames:
for i in *.pdf; do convert -density 300 "$i" "${i%.pdf}.jpg"; done
And that’s it. You should end up with a directory of converted PDFs, suitable for inclusion into any Microsoft application!
ImageMagick supports many other output file formats, including PNG, so try it out!
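As a sketch of how the batch conversion could be generalised to other formats, the version below takes the output format and DPI as arguments. The function names out_name and batch_convert are hypothetical, and the defaults of PNG and 300 DPI are assumptions rather than anything ImageMagick requires:

```shell
#!/usr/bin/env bash
# Batch-convert the PDFs in the current directory to a chosen bitmap format.
# "out_name" and "batch_convert" are hypothetical helper names.

# Derive the output filename, e.g. plot.pdf -> plot.png
out_name() {
  local src=$1 fmt=$2
  echo "${src%.pdf}.${fmt}"
}

batch_convert() {
  local fmt=${1:-png} dpi=${2:-300} f
  for f in *.pdf; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    convert -density "$dpi" "$f" "$(out_name "$f" "$fmt")"
  done
}

batch_convert png 300
```

Using the glob directly, with every variable quoted, means that filenames containing spaces are handled correctly.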
Update: June 2010
See a more recent post about a better way of converting multiple image files using mogrify.
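The details of that post are not reproduced here, but as a rough sketch of the general idea, mogrify can process many files in a single command. The wrapper function convert_all is a hypothetical name, and the JPEG format and 300 DPI defaults are assumptions carried over from the examples above:

```shell
#!/usr/bin/env bash
# Convert every PDF in the current directory with a single mogrify call.
# With -format, mogrify writes new files (plot.pdf -> plot.jpg) instead of
# overwriting the originals. "convert_all" is a hypothetical wrapper name.
convert_all() {
  local fmt=${1:-jpg} dpi=${2:-300}
  set -- *.pdf                 # replace the arguments with the matching files
  [ -e "$1" ] || { echo "no PDF files found"; return 0; }
  mogrify -density "$dpi" -format "$fmt" "$@"
}

convert_all jpg 300
```

The -density option is given before the filenames so that it applies when the PDFs are read and rasterised.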