Hi Eleanor,

So far I have managed to "lurk" on this one - keeping an eye on things
but not getting involved. However, this has prompted me to respond!

> Has anyone raised the point that while archiving is good, it will only be
> generally useful if the image HEADERS are informative and use a
> comprehensible format - and the database is documented...

There are a number of issues here:

 - whether to publish data
 - how to publish data
 - how to make the published data useful
 - whether to centrally archive that data
 - whether to standardise the data, and if so how
 - who should pay
 - moving the data around

etc. The image headers issue is clearly one of these; however, properly
resolving it has thus far proved, if not intractable, then at least
challenging. There is at least one "standard" comprehensible format
which has been out there for a while, but which so far lacks
widespread adoption. However, and this is a huge however, we really
need to ask how much effort it is worth putting into handling each
data set. The more effort is needed to make the data available, the
less likely it is that it will ever become available. I know from
experience that even when people want you to have data, the activation
energy is substantial. Making the process more complex will decrease
the likelihood of it happening at all. If, on the other hand, we lower
this barrier (i.e. you make available the data exactly as it sits on
your hard disk), sharing becomes far more likely. We can make this
"useful" by also including the processing logs - i.e. your mosflm /
denzo / XDS log file - so that anyone who really wants to know can
look and figure it out.
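
To make the "data as-is plus logs" idea concrete, here is a minimal
sketch in Python (the paths, file extensions and dataset name are
made up for illustration - real beamline layouts vary):

    import tarfile
    from pathlib import Path

    # Hypothetical dataset directory: raw images plus whatever
    # processing output happens to be lying next to them.
    dataset = Path("/data/visit123/thaumatin_1")
    logs = list(dataset.glob("*.log")) + list(dataset.glob("*.lp"))

    # Bundle everything exactly as it is - no header fixing,
    # no renaming, just the images and the processing logs.
    with tarfile.open("thaumatin_1_deposit.tar.gz", "w:gz") as tar:
        for image in sorted(dataset.glob("*.img")):
            tar.add(image, arcname=image.name)
        for log in logs:
            tar.add(log, arcname=log.name)

The point is that nothing here requires understanding the headers;
the mosflm / denzo / XDS logs travel with the images, so the
parameters can be recovered later by anyone who cares enough.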

This biases the effort in the right direction. Even if every data set
were perfectly published, it is pretty unlikely that any given data
set would be re-analysed - unless it is really interesting. If it is
really interesting, it is then worth the effort to figure out the
parameters, so it is enough to make this possible, even if
inconvenient. As I see it, a main benefit of this is to allow the
occasional questionable structure to be looked at really hard - and
simply the requirement to upload the original data and processing
results would help to reduce the possibility of depositing anything
"fake".

Another factor I am painfully aware of is that disks die - certainly
mine do. Copying data onward is the obvious remedy - however, the time
it takes to move 4TB of data from one drive to another is surprisingly
long, as even with things like eSATA we have oceans of data connected
by hosepipes. Moving all of your raw data home from a visit to the
synchrotron (which could easily be terabytes) is challenge enough -
subsequently moving it to some central archive could be a real killer.
Equally, making the data public from your own lab is difficult for
many. At least facility sites are equipped to meet some of these
challenges.
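
To put rough numbers on the hosepipe, a back-of-envelope sum (the
100 MB/s sustained rate is my assumption for a single external drive,
and is if anything optimistic):

    # How long does 4 TB take at a sustained transfer rate?
    size_mb = 4 * 10**6          # 4 TB expressed in MB
    rate_mb_per_s = 100          # assumed sustained eSATA rate
    hours = size_mb / rate_mb_per_s / 3600
    print(f"{hours:.1f} hours")  # ~11.1 hours

And that is for a single local copy; pushing the same volume over a
typical institutional network link to a central archive would take
far longer.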

So - as I see it, we have much bigger fish to fry than getting the
headers for historical data standardised! Current and future data -
well, that's a different pot of worms. And that's from someone who
really does care about image headers ;o)

Cheerio,

Graeme
