Given that:
- storage is becoming exponentially cheaper,
- computing power is increasing exponentially,
I think there is no reason not to store all images used for solving a structure, linked to the PDB entry and properly annotated with beam centre, lambda, pixel order and all other necessary processing parameters.

Possible uses:
- programmers of data processing (and experimental phasing) software can routinely test their programs on many sets of images instead of just a few, to better judge the effect of changes they have made to the code.
- the PDB (or someone else) can periodically reprocess all the data with the then state-of-the-art software, remodel where necessary and re-refine all the structures, and thus periodically improve all the structures in the PDB. This does not mean the original authors did a bad job, just that technical improvements will inevitably allow doing a better one in the future.

Sure, this does not lead directly to new biomedical insights, but indirectly it will. The improved data processing (and experimental phasing) programs will allow solving at least a few new structures that otherwise would not be solvable. The improved old structures in the PDB may allow a few new structures to be solved by MR that otherwise could not have been solved. The dataset of better structures will lead to better performance of structure prediction programs (better energy calculations etc.), allowing better modelling. (I was, like Enrico, very sceptical of modelling, until the first modelled structure was successfully used in MR to solve a new structure. I now believe modelling will play an increasing role.)

I agree with Enrico that storing and annotating "failed" data collections will be difficult to enforce. I for one would rather concentrate on the data collections that did work on that trip (if any), or spend my time drowning sorrows, resting and then trying to get better crystals for the next trip, than fill in metadata for a data storage facility, even if it is only a two-minute job.

Then there are the in-between cases: datasets that were collected and processed, and for which structure solution was tried but abandoned because the ratio of difficulty to interest was too high (difficulties being irreproducibility of crystals, some pathology in the data, etc.). These data I would gladly submit for someone else to have a go. Currently, probably no-one would, but in the future, due to better software, better understanding and possibly a new structure in the PDB that could be used in MR, the difficulty may go down, and someone might do it if the potential structure is still of interest. This someone could be myself, and having the images stored safely and centrally would also make it easier for me to recover them.

Mark J van Raaij
Laboratorio M-4
Dpto de Estructura de Macromoleculas
Centro Nacional de Biotecnologia - CSIC
c/Darwin 3
E-28049 Madrid, Spain
tel. (+34) 91 585 4616
http://www.cnb.csic.es/content/research/macromolecular/mvraaij

On 18 Oct 2011, at 18:19, Enrico Stura wrote:

> Dear Peter,
>
> How many crystallographers does it take to transform bad data into good data?
> None, you need a modeller. Only a modeller can give you a structure with perfect
> geometry. Data just introduces experimental errors into what would otherwise
> be a perfect structure.
>
> If you have good data do you need crystallographers?
> ...
>
> Of course there are all the cases in between. That ... you are right, is the
> other half of the story.
>
> From a biological point of view, only borderline cases make "cents" ($+€) to store.
> The experimenter in consultation with a beamline scientist at an SR facility
> is the best small committee suitable to evaluate what is worth keeping. I am sure
> that the images that are worth storing for a long long time would fit on a few Tb
> at a reasonable cost.
> Storing everything would make it harder to find something worth improving in
> the future.
>
> Enrico.
>
>
> On Tue, 18 Oct 2011 17:12:42 +0200, Peter Keller <pkel...@globalphasing.com> wrote:
>
>> Dear Enrico,
>>
>> Please don't get me wrong: what you are saying is not incorrect, but it
>> is only half the story.
>>
>> On Tue, 2011-10-18 at 15:13 +0200, Enrico Stura wrote:
>>> With improving techniques, we should always be making progress!
>>
>> Yes, of course!
>>
>>> If we are trying to answer a biological question that is really important,
>>> we would be better off
>>> improving the purification, the crystallization, the cryo-conditions
>>
>> You have left X-ray crystallography out of this list. It is a technique
>> like the others, and can also be improved :-)
>>
>> It may be true that the number of crystallographers that are working on
>> improving instrumental methodology and software is small compared to the
>> number working on improving wet-lab techniques, but that number is not
>> zero, and the contribution is significant. The rest of you benefit from
>> that work!
>>
>>> instead of having to rely on
>>> processing old images with new software.
>>>
>>> I have 10 years worth of images. I have reprocessed very few of them and
>>> never made any
>>> sensational progress using the new software. Poor diffraction is poor
>>> diffraction.
>>
>> Maybe so, but certain types of datasets are useful for methods and
>> software development, even if no new biological insights could be gained
>> by reprocessing them. These datasets are often hard to get hold of in
>> practice, especially when they are in someone's lab on a tape that
>> no-one has a reader for any more.
>>
>> Obtaining protein, growing crystals and collecting new data in such a
>> way that the interesting features of those datasets are reproduced can
>> be much much harder than curating the images would be. This is
>> especially true for software-oriented people like us who don't have
>> regular access to wet-lab facilities.
>>
>>> Money can be better spent buying a wine cellar, storage works for wine.
>>
>> Images have already been lost that ought to have been kept. The
>> questions are: how to select the datasets that are potentially of value,
>> and how to make sure that they don't disappear.
>>
>> Regards,
>> Peter.
>>
>
>
> --
> Enrico A. Stura D.Phil. (Oxon), Tel: 33 (0)1 69 08 4302 Office
> Room 19, Bat.152, Tel: 33 (0)1 69 08 9449 Lab
> LTMB, SIMOPRO, IBiTec-S, CE Saclay, 91191 Gif-sur-Yvette, FRANCE
> http://www-dsv.cea.fr/en/institutes/institute-of-biology-and-technology-saclay-ibitec-s/unites-de-recherche/department-of-molecular-engineering-of-proteins-simopro/molecular-toxinology-and-biotechnology-laboratory-ltmb/crystallogenesis-e.-stura
> http://www.chem.gla.ac.uk/protein/mirror/stura/index2.html
> e-mail: est...@cea.fr Fax: 33 (0)1 69 08 90 71