Dear Colleagues,

In trying to see some virtue in the PNAS approach, one can imagine that not all deposited data can be characterised well enough for computers to parse automatically. In such circumstances, a deposited PDF may be better than nothing at all. As yet, not all journal publishing platforms can or will serve a variety of file formats, which is probably in part why PDFs are used, since they are easy to generate.

That said, I agree with previous postings today that journals should encourage authors to supply data in well-characterised machine-readable formats, to the extent that this is feasible. For small-molecule crystal structures within IUCr journal articles, and the associated crystal structure data sets, this is straightforward, since variants of the IUCr's CIF standard cover diffraction images, structure factors, and refined coordinates and ADPs. For protein crystal structures, as this CCP4bb well knows, articles are accompanied by RCSB deposition of coordinates and structure factors.

Nevertheless, it would be good to see research scientists increasing pressure on journals to deposit and disseminate supplementary data in machine-readable formats, since that would in the long run greatly increase the value of the deposited material.

An open-access paper I recently published with a colleague from the IUCr office discusses the importance of fully integrating experimental data with the finished research analysis, to complete the scientific record. See:

Helliwell, J. R. & McMahon, B. (2010). The record of experimental science: archiving data with literature. Information Services and Use 30, 31-37; DOI: 10.3233/ISU-2010-0609.

Many of the things we discuss in that article are equally relevant to supplementary information as discussed in this thread.

Yours sincerely,
John
Professor John R Helliwell DSc

On Wed, Nov 17, 2010 at 6:39 AM, James Stroud <xtald...@gmail.com> wrote:
> I was reading the PNAS author guidelines and I came across this gem:
>
> Datasets: Supply Excel (.xls), RTF, or PDF files. This file type will be
> published in raw format and will not be edited or composed.
>
> Did I read those last two file formats correctly? I have actually come
> across a dataset in supplementary information that was several dozen pages
> of PDF. It was effectively impossible to extract the data from this
> document. (I can dig it up if pressed, probably.) I had no idea that the
> authors may have been encouraged to submit their data like that.
> Does a premier scientific journal actually request data to be in PDF
> format?
> I can think of dozens of other formats that would be more fitting. They are
> summarized here:
>
> http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
>
> What is the scholarly equivalent to a torch and pitchfork march and how can
> we organize such a march to encourage journals to require proper
> serialization formats for datasets in supplementary info?
> James
> P.S. I am aware that it is better to submit data to a dedicated repository,
> but let's consider those cases where research produces data for which there
> is not yet a dedicated repository.