We store raw data for two main reasons:
a) We currently use only a fraction of the information actually contained in
raw images, and our extraction of even that fraction can still be improved.
Destroying the data means that:
- we lose the extra information, making future research in some areas either
impossible or more costly
- we make it more difficult to improve current data reduction methods
b) Raw data is the best way to independently validate a published structure
and prevent fraud.

The majority of crystallographers already recognize these truths. That is why 
almost all of them do keep backups of their data even after structures have 
been published.   

To those still against making data public, I would ask a simple question: would
you object to providing the raw data from a published structure if such data
were available and you did not have to bear an unreasonable inconvenience in
the process? My guess is that most crystallographers are reasonable scientists
and such a poll would probably result in ~100% "Yes" and ~0% "No". Am I wrong?

The real issue, then, is how to make the data available in a way that keeps
the inconvenience (if any) to all stakeholders reasonable. Some great ideas
have already been advanced.

In the short term, we could start by taking advantage of the fact that
synchrotron facilities already store raw data for a period of time. However,
much of the data collected is never published. Given limited disk space, it
would be useful to know exactly which datasets result in a publication and
should therefore be kept for an extended period. If every dataset were given a
unique ID (such as the suggested DOI) that had to be quoted at
deposition/publication, synchrotron facilities could, after a given "grace"
period, preserve only those datasets that have been published. Combined with a
central metadata server similar to TARDIS, such a system could be developed
relatively quickly while longer-term central storage ideas are worked out.
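To make the retention rule concrete, here is a minimal sketch in Python. It is
purely illustrative: the Dataset record, its field names, the two-year
GRACE_PERIOD and the helper functions are all hypothetical, not any facility's
actual system. It assumes only that a dataset is flagged once its unique ID is
quoted at deposition/publication.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Dataset:
    dataset_id: str            # hypothetical unique ID (e.g. a DOI-like string)
    collected: date            # date the images were collected at the beamline
    cited_in_deposition: bool  # True once the ID is quoted at deposition/publication

# Assumed value for illustration only; the real grace period would be facility policy.
GRACE_PERIOD = timedelta(days=2 * 365)

def should_retain(ds: Dataset, today: date, grace: timedelta = GRACE_PERIOD) -> bool:
    """Keep published datasets; keep unpublished ones only until the grace period ends."""
    if ds.cited_in_deposition:
        return True
    return today - ds.collected <= grace

def purge_candidates(datasets, today, grace=GRACE_PERIOD):
    """Return the IDs of datasets that could be deleted to free disk space."""
    return [ds.dataset_id for ds in datasets if not should_retain(ds, today, grace)]

# Example: one published dataset, one old unpublished one, one recent unpublished one.
today = date(2011, 10, 28)
data = [
    Dataset("ds-001", date(2008, 5, 1), True),
    Dataset("ds-002", date(2008, 5, 1), False),
    Dataset("ds-003", date(2011, 3, 1), False),
]
print(purge_candidates(data, today))  # ['ds-002']
```

Only the old, never-cited dataset becomes a deletion candidate; the recent
unpublished one is still within its grace period and the cited one is kept
regardless of age.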

Again, the best solution will be the one that requires the least effort from
crystallographers. In fact, I can see a system in which the experimental
metadata for a PDB entry/dataset comes directly from the synchrotron facility
during deposition, so that users simply provide a unique dataset ID and the
experimental details are pre-filled for them.

Of course, the above completely ignores home sources.


/Michel
> -----Original Message-----
> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of D
> Bonsor
> Sent: October-27-11 3:10 PM
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] raw data deposition
> 
> Why should we store images?
> 
> From most of the posts, it seems that storing images would mainly aid
> software development. If that is the case, there should be a Failed Protein
> Databank (FPDB) where people could upload datasets they cannot solve. This
> would aid software development and allow someone else to have a go at
> solving the structure.
> 
> If it is for historical reasons, how can someone decide whether their
> structure is historical? I would propose that images should be uploaded for
> any protein or protein complex that has never been solved before. That way
> the images are there if that structure does become historical.
> 
> The question is not whether or not images should be uploaded but who
> would use the images that were uploaded.
> 
> For example, people who use crystallography as a tool to aid in
> characterization of their protein, would probably not look at images for 99.5%
> of other protein datasets, and they probably would not look at images for a
> protein that is related to their own protein. They are more interested in the
> final structure. I too would probably not be interested in reprocessing and
> solving a structure again when I can easily access the final product already.
