Hi Eleanor,

So far I have managed to "lurk" on this one - keeping an eye on things but not getting involved. However, this has prompted me to respond!
> Has anyone raised the point that while archiving is good, it will only be
> generally useful if the image HEADERS are informative and use a
> comprehensible format - and the database is documented...

There are a number of issues here:

- whether to publish data
- how to publish data
- how to make the published data useful
- whether to centrally archive that data
- whether to standardise the data, and if so how
- who should pay
- moving the data around, etc.

The image headers issue is clearly one of these; however, properly resolving it has thus far proved to be, if not intractable, at least challenging. There is at least one "standard", comprehensible format out there, and it has been for a while, but it is thus far lacking widespread adoption.

However - and this is a huge however - we really need to ask how much effort it is worth putting into handling each data set. The more effort is needed to make the data available, the less likely it is that it will ever become available. I know from experience that even when people want you to have data, the activation energy is substantial. Making the process more complex will only decrease the likelihood of it happening.

If, on the other hand, you simply make available the data you have on the hard disk exactly as it is, you lower this barrier a good way. We can make this "useful" by also including the processing logs - i.e. your mosflm / denzo / XDS log file - so that anyone who really wants to know can look and figure it out. This biases the effort in the right direction: even if every data set were perfectly published, it is pretty unlikely that any given data set would be re-analysed - unless it is really interesting. If it is really interesting, it is then worth the effort to figure out the parameters, so make this possible even if inconvenient.

As I see it, a main benefit of this is to allow the occasional questionable structure to be looked at really hard - and simply the requirement to upload the original data and processing results would help to reduce the possibility of depositing anything "fake".

Another factor I am painfully aware of is that disks die - certainly mine do - so the data has to be copied somewhere in any case. That is all well and good; however, the time it takes to move 4 TB of data from one drive to another is surprisingly long, as even with things like eSATA we have oceans of data connected by hosepipes (a back-of-envelope sketch is in the P.S. below). Moving all of your raw data home from a visit to the synchrotron (which could easily be TBs) is challenge enough - subsequently moving it to some central archive could be a real killer. Equally, making the data public from your own lab is also difficult for many. At least facility sites are equipped to meet some of these challenges.

So - as I see it we have much bigger fish to fry than getting the headers for historical data standardised! Current and future data, well, that's a different pot of worms.

And that's from someone who really does care about image headers ;o)

Cheerio,

Graeme
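
P.S. To put a rough number on the "oceans of data connected by hosepipes" point, here is a quick back-of-envelope sketch in Python. The ~100 MB/s sustained rate is my assumption for a typical external spinning disk (well below the nominal 3 Gbit/s eSATA link speed), not a measured figure:

    # Rough estimate of the time to copy a data set between drives.
    # The default sustained rate (100 MB/s) is an illustrative assumption.
    def transfer_hours(terabytes, mb_per_s=100.0):
        megabytes = terabytes * 1e6  # 1 TB ~ 10^6 MB (decimal units)
        seconds = megabytes / mb_per_s
        return seconds / 3600.0

    print(transfer_hours(4))  # -> ~11 hours for 4 TB at 100 MB/s

So a single 4 TB copy ties up a machine for the better part of a working day - before you even think about verifying the copy.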