We store raw data for two main reasons:

a) We currently use only a fraction of the information actually contained in raw images, and the extraction of that fraction can be improved. Destroying the data means that:
 - we lose the extra information and make future research in some areas either impossible or more costly;
 - we make it more difficult to improve current data reduction methods.

b) Raw data is the best way to independently validate a published structure and prevent fraud.
The majority of crystallographers already recognize these truths. That is why almost all of them keep backups of their data even after their structures have been published. To those still against making data public I would ask a simple question: would you object to providing the raw data from a published structure if such data were available and you did not have to bear an unreasonable inconvenience in the process? My guess is that most crystallographers are reasonable scientists, and such a "poll" would probably come out at ~100% "Yes" and ~0% "No". Am I wrong?

The real issue, then, is how to make the data available in such a way that the inconvenience (if any) to all the stakeholders is reasonable. Some great ideas have already been advanced. In the short term, we could start by using the fact that synchrotron facilities already store raw data for a period. However, much of the data collected is never published. Given the limited disk space, it would be useful to know exactly which datasets result in a publication and should therefore be kept for an extended period. If a unique ID (such as the DOI suggestion) is assigned to every dataset and required during deposition/publication, then after a given "grace" period synchrotron facilities could discard unpublished datasets and preserve only those that have been published. Combined with a central metadata server similar to TARDIS, such a system could be developed in a relatively short time while longer-term central storage ideas are worked out.

Again, the best solution is going to be the one that requires the least effort from crystallographers. In fact, I can see a system in which the experimental metadata for a PDB entry/dataset comes directly from the synchrotron facility during deposition, so that users simply provide a unique dataset ID and the experimental details are pre-filled for them. Of course, the above completely ignores home sources.
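To make the retention idea concrete, here is a minimal sketch of the per-dataset decision a facility might run, assuming that each dataset carries a unique ID, its collection date, and a publication reference that gets filled in once the ID is cited at deposition. The names (`Dataset`, `retention_decision`, `plan_cleanup`, `GRACE_PERIOD`) and the two-year grace period are hypothetical illustrations, not any facility's actual policy or software.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical grace period; the real length would be a facility policy decision.
GRACE_PERIOD = timedelta(days=2 * 365)


@dataclass
class Dataset:
    dataset_id: str                    # hypothetical unique ID (e.g. a DOI) assigned at collection
    collected_on: date                 # date the images were recorded
    publication: Optional[str] = None  # set once the ID is cited in a deposition/publication


def retention_decision(ds: Dataset, today: date, grace: timedelta = GRACE_PERIOD) -> str:
    """Return 'keep', 'hold', or 'discard' for a single dataset."""
    if ds.publication is not None:
        return "keep"     # cited in a publication: preserve long-term
    if today - ds.collected_on < grace:
        return "hold"     # still inside the grace period: it may yet be published
    return "discard"      # grace period expired with no publication: free the disk space


def plan_cleanup(datasets, today: date, grace: timedelta = GRACE_PERIOD) -> dict:
    """Group dataset IDs by their retention decision."""
    plan = {"keep": [], "hold": [], "discard": []}
    for ds in datasets:
        plan[retention_decision(ds, today, grace)].append(ds.dataset_id)
    return plan


if __name__ == "__main__":
    today = date(2011, 10, 27)
    datasets = [
        Dataset("ds-001", date(2008, 5, 1), publication="hypothetical-citation"),
        Dataset("ds-002", date(2008, 6, 1)),
        Dataset("ds-003", date(2011, 9, 1)),
    ]
    # Prints: {'keep': ['ds-001'], 'hold': ['ds-003'], 'discard': ['ds-002']}
    print(plan_cleanup(datasets, today))
```

The three-way split matters: an unpublished dataset is only discarded after the grace period, so a structure that takes a few years to publish is not lost simply because it has not yet been cited.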
/Michel

> -----Original Message-----
> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of D Bonsor
> Sent: October-27-11 3:10 PM
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] raw data deposition
>
> Why should we store images?
>
> From most of the posts, it seems the aim is to aid software development. If that is the case, there should be a Failed Protein Databank (FPDB) where people could upload datasets which they cannot solve. This would aid software development and allow someone else to have a go at solving the structure.
>
> If it is for historical reasons, how can someone decide whether their structure is historical? I would propose that images should be uploaded for a protein or protein complex that has never been solved before. That way the images are there if that structure does become historical.
>
> The question is not whether or not images should be uploaded, but who would use the images that were uploaded.
>
> For example, people who use crystallography as a tool to aid in the characterization of their protein would probably not look at images for 99.5% of other protein datasets, and they probably would not look at images for a protein that is related to their own protein. They are more interested in the final structure. I too would probably not be interested in reprocessing and solving a structure again when I can easily access the final product already.