On Tuesday, November 16, 2010, James Stroud wrote:
> I was reading the PNAS author guidelines and I came across this gem:
>
> Datasets: Supply Excel (.xls), RTF, or PDF files. This file type will be
> published in raw format and will not be edited or composed.
>
> Did I read those last two file formats correctly? I have actually come across
> a dataset in supplementary information that was several dozen pages of PDF.
> It was effectively impossible to extract the data from this document. (I can
> dig it up if pressed, probably.) I had no idea that the authors may have been
> encouraged to submit their data like that.
>
> Does a premier scientific journal actually request data to be in PDF format?

Why not? I'm pretty sure it is more universally readable than the other
two formats. True, it's designed for human readers rather than as a data
input format, but that seems reasonable for most [though certainly not all]
publication supplements.

> I can think of dozens of other formats that would be more fitting. They are
> summarized here:
>
> http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats

Bleah. Virtually none of those are human-readable, no matter what the
Wikipedia page may choose to put as a heading title.

What kind of data are you dealing with? PDF would indeed be an odd format
for diffraction images, but it would be miles better than most of the
formats on the list you point to.

Ethan

> What is the scholarly equivalent to a torch and pitchfork march and how can
> we organize such a march to encourage journals to require proper
> serialization formats for datasets in supplementary info?
>
> James
>
> P.S. I am aware that it is better to submit data to a dedicated repository,
> but let's consider those cases where research produces data for which there
> is not yet a dedicated repository.