Dear Colleagues,
Trying to see some virtue in the PNAS approach, one can imagine that
not all deposited data can be characterised well enough for computers
to parse automatically. In such circumstances, a deposited PDF may be
better than nothing at all. As yet, not all journal publishing platforms
can or will serve a variety of file formats, which is probably part of
the reason PDFs are used, since they are easy to generate.

That said, I agree with previous postings today that journals should
encourage authors to supply data in well-characterised machine-readable
formats, to the extent that this is feasible.

For small-molecule crystal structures within IUCr Journal articles, and
their associated crystal structure data sets, this is straightforward,
since variants of the IUCr's CIF standard cover diffraction images,
structure factors, and refined coordinates and ADPs. For protein crystal
structures, as this CCP4bb well knows, articles are accompanied by RCSB
deposition of coordinates and structure factors.


Nevertheless, it would be good to see research scientists increasing
pressure on journals to deposit and disseminate supplementary data in
machine-readable formats, since that would in the long run greatly
increase the value of the deposited material.

An open-access paper I recently published with a colleague from the
IUCr office discusses the importance of fully integrating experimental
data with the finished research analysis, to complete the scientific
record.  See:
    Helliwell, J. R. & McMahon, B. (2010) The record of experimental
    science: archiving data with literature. Information Services and Use 30,
    31-37; DOI: 10.3233/ISU-2010-0609.

Many of the things we discuss in that article are equally relevant to
supplementary information as discussed in this thread.

Yours sincerely,
John
Professor John R Helliwell DSc



On Wed, Nov 17, 2010 at 6:39 AM, James Stroud <xtald...@gmail.com> wrote:
> I was reading the PNAS author guidelines and I came across this gem:
>
> Datasets: Supply Excel (.xls), RTF, or PDF files. This file type will be
> published in raw format and will not be edited or composed.
>
> Did I read those last two file formats correctly? I have actually come
> across a dataset in supplementary information that was several dozen pages
> of PDF. It was effectively impossible to extract the data from this
> document. (I can dig it up if pressed, probably.) I had no idea that the
> authors may have been encouraged to submit their data like that.
> Does a premier scientific journal actually request data to be in PDF
> format?
> I can think of dozens of other formats that would be more fitting. They are
> summarized here:
>
> http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
>
> What is the scholarly equivalent to a torch and pitchfork march and how can
> we organize such a march to encourage journals to require proper
> serialization formats for datasets in supplementary info?
> James
> P.S. I am aware that it is better to submit data to a dedicated repository,
> but let's consider those cases where research produces data for which there
> is not yet a dedicated repository.
>
