Phoebe,

Just automate the archiving and come up with a reasonable naming scheme for it. 
In ours, data sets are called:

userid_yearmonth_projectid_#

Userid is derived from the login into CrystalClear (oops, free advertising), 
projectid is set by the PI (so she can remember 10 years from now what in the 
world these data are all about) and the users are asked (threatened) to call 
their data sets "projectid_#" (and not the ubiquitous "test"). We have a script 
that automatically archives everything away from our data collection computer 
into an archive - activated by an icon on the desktop - and it adds the userid 
and date to the filename. This has the nice added advantage that the data 
collection disk stays clean. This only breaks when we collect synchrotron data 
(which is all the time) because our synchrotron remote scientist who collects 
the data cannot (should not) be threatened. :-) I then rename all data sets for 
archiving so the naming is consistent and you can actually make (say in pdf) an 
index of all the data you have, organized by user, date, or project. 
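
For what it is worth, here is a minimal sketch of the renaming step in Python. 
It is not our actual script: the function names, the YYYYMM form of "yearmonth" 
and the rule that a userid contains no underscore are all assumptions made for 
illustration.

```python
import re
import shutil
from datetime import date
from pathlib import Path

# Assumption: users name their data sets "projectid_#", e.g. "lysozyme_3".
DATASET_RE = re.compile(r"^(?P<project>[A-Za-z0-9_-]+)_(?P<run>\d+)$")


def archive_name(userid: str, dataset: str, collected: date) -> str:
    """Build userid_yearmonth_projectid_# from a 'projectid_#' data set name."""
    match = DATASET_RE.match(dataset)
    if match is None:
        # Catches the ubiquitous "test" and anything else off-scheme.
        raise ValueError(f"data set {dataset!r} is not named projectid_#")
    yearmonth = collected.strftime("%Y%m")  # assumed format, e.g. 201110
    return f"{userid}_{yearmonth}_{match['project']}_{match['run']}"


def archive_dataset(src: Path, archive_root: Path, userid: str,
                    collected: date) -> Path:
    """Move one data set directory into the archive under its new name."""
    dest = archive_root / archive_name(userid, src.name, collected)
    if dest.exists():
        raise FileExistsError(dest)  # never overwrite archived data
    archive_root.mkdir(parents=True, exist_ok=True)
    # shutil.move copies and then deletes when src and dest are on
    # different file systems, so the collection disk ends up clean.
    shutil.move(str(src), str(dest))
    return dest
```

With these assumptions, archive_name("jdoe", "lysozyme_3", date(2011, 10, 18)) 
gives "jdoe_201110_lysozyme_3", and a data set called "test" is refused.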

Our policy is that the PI decides if data should be maintained or if it really 
can go (no diffraction, really a test crystal to see that the crystal is in the 
beam etc). In practice this doesn't happen so someone else makes the decision. 
We tend to err on the side of caution. We tend to think that all results should 
be saved, unless it is blatantly obvious that there is no point. Storage is 
cheap (and cheaper every time you think of it).

Once you have automated the archiving under the agreed-upon scheme, it is 
somewhat easier to find things again: if you can remember who collected it, or 
approximately when it was done, or what the project was, you can find it. The 
pain was up front: to come up with a scheme, to enforce a rigorous naming 
convention and to implement it (the data collection computer and the archive 
are not physically the same machine, etc.). 
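
To illustrate the lookup, here is a small Python sketch that filters archived 
names by user, approximate date or project. The names parse_name and find are 
hypothetical, and it assumes the userid_yearmonth_projectid_# form with a 
YYYYMM date and no underscore in the userid.

```python
from typing import Iterable, List, NamedTuple, Optional


class ArchivedSet(NamedTuple):
    userid: str
    yearmonth: str
    projectid: str
    run: int


def parse_name(name: str) -> ArchivedSet:
    """Split 'userid_yearmonth_projectid_#' into its four fields."""
    userid, yearmonth, rest = name.split("_", 2)
    projectid, run = rest.rsplit("_", 1)  # projectid may contain underscores
    return ArchivedSet(userid, yearmonth, projectid, int(run))


def find(names: Iterable[str], userid: Optional[str] = None,
         yearmonth: Optional[str] = None,
         projectid: Optional[str] = None) -> List[str]:
    """Return the names matching every criterion that was given."""
    hits = []
    for name in names:
        entry = parse_name(name)
        if userid is not None and entry.userid != userid:
            continue
        # Prefix match, so "2011" finds everything collected that year.
        if yearmonth is not None and not entry.yearmonth.startswith(yearmonth):
            continue
        if projectid is not None and entry.projectid != projectid:
            continue
        hits.append(name)
    return hits
```

The prefix match on the date is what covers "approximately when it was done": 
passing yearmonth="2011" returns every data set from that year.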

Maybe the Committee is also thinking about that issue: how are you going to 
keep all the data manageable and searchable? Presumably by something like a PDB 
id (this seems to make sense for published/deposited structures) but for 
"things that did not make it to PDB" one would have to come up with another 
plan.

Mark

 

 

-----Original Message-----
From: Phoebe Rice <pr...@uchicago.edu>
To: CCP4BB <CCP4BB@JISCMAIL.AC.UK>
Sent: Tue, Oct 18, 2011 12:01 pm
Subject: Re: [ccp4bb] IUCr committees, depositing images


One more consideration:
Since organization is not one of my greatest talents, I would be absolutely 
delighted if a databank took over the burden of archiving my raw data for me.  
  Phoebe

=====================================
Phoebe A. Rice
Dept. of Biochemistry & Molecular Biology
The University of Chicago
phone 773 834 1723
http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123
http://www.rsc.org/shop/books/2008/9780854042722.asp


---- Original message ----
>Date: Tue, 18 Oct 2011 18:17:14 +0100
>From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> (on behalf of Gerard Bricogne
><g...@globalphasing.com>)
>Subject: Re: [ccp4bb] IUCr committees, depositing images  
>To: CCP4BB@JISCMAIL.AC.UK
>
>Dear Enrico, Frank and colleagues,
>
>     I am glad to have suggested that everyone's views on this issue should
>be aired out on this BB rather than sent off-list to an IUCr committee
>member: this is much more interactive and thought-provoking. 
>
>     There would seem to be clear biases in some of the positions - for
>instance, the statement that we overvalue individual structures and that
>there is value only in their ensemble has to be seen to be coming from
>someone in a structural genomics centre ;-) . However, as Wladek pointed
>out, when an investigator's project is crucially dependent on a result
>embodied in a deposited structure, it would be of the greatest value to that
>investigator to be able to double-check how reliable some features of that
>structure (especially its ligands) actually are.
>
>     On the other hand Enrico, as a specialist of crystallisation and
>modelling, sees value only in improving those contributors to the task of
>structure determination. This is forgetting (1) an essential capability of
>crystallography: that, through experimental phasing, it can show you what a
>protein looks like even if you have never seen nor modelled one before,
>through the wondrous process of producing model-free electron-density maps;
>and (2) an essential aspect of the task of structure determination: that it
>doesn't aim at producing a model with perfect geometry, but one that best
>explains the measured data and neither under- nor over-interprets them (I
>realise, though, that Enrico's statement "Data just introduces experimental
>errors into what would otherwise be a perfect structure" is likely to be
>tongue-in-cheek ...). 
>
>     When it comes to making explicit the advantages of archiving at least
>the raw images that yielded the data against which a deposited PDB entry was
>refined, many good reasons have been given, but I feel that 
>
>     (1) there is an over-emphasis on the preservation of diffuse scattering
>that has a tendency to give this archiving a nuance of "blue-skies" research
>and thus to detract from its practical urgency; time will come for diffuse
>scattering to be fully appreciated, but at the moment its mention acts as a
>bit of a distraction, if not a turn-off in this context for people who do
>not love it already;
>
>     (2) as far as I see it, the highest future benefit of having archived
>raw images will result from being able to reprocess datasets from samples
>containing multiple lattices ("non-merohedral twinning"). Numerous
>structures are determined and refined against data obtained by integrating
>only the spots from the major lattice, without rejecting those that are
>corrupted by overlap by a spot from a minor lattice. This leads to
>systematic errors in these data that may only be incompletely taken out by
>outlier rejection at the merging stage, and will create noise or confusing
>residual features in difference maps, if not false features in the main map
>and therefore its interpretation by the model. In my opinion it will be the
>development of methods for dealing with overlapped lattices and for the
>proper treatment of such data in scaling and refinement (as is already
>possible with small molecules) that will bring about the major possibility
>of substantially improving deposited results by reprocessing the raw images
>co-deposited with them;
>
>     (3) there is also the more immediate possibility of better removing ice
>rings, or ligand powder rings, from images, than by having to throw away
>certain thin shells of merged data in the structure factor file.
>
>     I see the case for raw image deposition as absolutely compelling,
>especially in view of the auto-catalytic process through which their
>availability will speed up the development of precisely the new methods and
>software to extract better data from them and better refine models against
>them. The impact of structure factor deposition on the development of better
>refinement programs is there to prove that this paradigm of a chain reaction
>makes total sense.
>
>     Various arguments tend to be fired off as decoys - "get better
>crystals", why not "get a better post-doc"? - but they are unhelpful in the
>way they prolong procrastination when what we need is to bite the bullet.
>The IUCr Forum that John Helliwell pointed at already contains draft plans
>for a pilot run of a reasonable scheme.
>
>
>     With best wishes,
>     
>          Gerard.
>
>--
>On Tue, Oct 18, 2011 at 06:19:27PM +0200, Enrico Stura wrote:
>> Dear Peter,
>>
>> How many crystallographers does it take to transform bad data into good data?
>> None, you need a modeller. Only a modeller can give you a structure with
>> perfect geometry. Data just introduces experimental errors into what would
>> otherwise be a perfect structure.
>>
>> If you have good data do you need crystallographers?
>> ...
>>
>> Of course there are all the cases in between. That ... you are right, is the
>> other half of the story.
>>
>> From a biological point of view, only borderline cases make "cents" ($+€)
>> to store. The experimenter in consultation with a beamline scientist at an
>> SR facility is the best small committee to evaluate what is worth keeping.
>> I am sure that the images that are worth storing for a long long time would
>> fit on a few TB at a reasonable cost. Storing everything would make it
>> harder to find something worth improving in the future.
>>
>> Enrico.
>>
>>
>> On Tue, 18 Oct 2011 17:12:42 +0200, Peter Keller 
>> <pkel...@globalphasing.com> wrote:
>>
>>> Dear Enrico,
>>>
>>> Please don't get me wrong: what you are saying is not incorrect, but it
>>> is only half the story.
>>>
>>> On Tue, 2011-10-18 at 15:13 +0200, Enrico Stura wrote:
>>>> With improving techniques, we should always be making progress!
>>>
>>> Yes, of course!
>>>
>>>> If we are trying to answer a biological question that is really important,
>>>> we would be better off improving the purification, the crystallization,
>>>> the cryo-conditions
>>>
>>> You have left X-ray crystallography out of this list. It is a technique
>>> like the others, and can also be improved :-)
>>>
>>> It may be true that the number of crystallographers that are working on
>>> improving instrumental methodology and software is small compared to the
>>> number working on improving wet-lab techniques, but that number is not
>>> zero, and the contribution is significant. The rest of you benefit from
>>> that work!
>>>
>>>> instead of having to rely on
>>>> processing old images with new software.
>>>>
>>>> I have 10 years' worth of images. I have reprocessed very few of them and
>>>> never made any sensational progress using the new software. Poor
>>>> diffraction is poor diffraction.
>>>
>>> Maybe so, but certain types of datasets are useful for methods and
>>> software development, even if no new biological insights could be gained
>>> by reprocessing them. These datasets are often hard to get hold of in
>>> practice, especially when they are in someone's lab on a tape that
>>> no-one has a reader for any more.
>>>
>>> Obtaining protein, growing crystals and collecting new data in such a
>>> way that the interesting features of those datasets are reproduced can
>>> be much much harder than curating the images would be. This is
>>> especially true for software-oriented people like us who don't have
>>> regular access to wet-lab facilities.
>>>
>>>> Money can be better spent buying a wine cellar, storage works for wine.
>>>
>>> Images have already been lost that ought to have been kept. The
>>> questions are: how to select the datasets that are potentially of value,
>>> and how to make sure that they don't disappear.
>>>
>>> Regards,
>>> Peter.
>>>
>>
>>
>> -- 
>> Enrico A. Stura D.Phil. (Oxon) ,    Tel: 33 (0)1 69 08 4302 Office
>> Room 19, Bat.152,                   Tel: 33 (0)1 69 08 9449    Lab
>> LTMB, SIMOPRO, IBiTec-S, CE Saclay, 91191 Gif-sur-Yvette,   FRANCE
>> http://www-dsv.cea.fr/en/institutes/institute-of-biology-and-technology-saclay-ibitec-s/unites-de-recherche/department-of-molecular-engineering-of-proteins-simopro/molecular-toxinology-and-biotechnology-laboratory-ltmb/crystallogenesis-e.-stura
>> http://www.chem.gla.ac.uk/protein/mirror/stura/index2.html
>> e-mail: est...@cea.fr                             Fax: 33 (0)1 69 08 90 71
>
>-- 
>
>     ===============================================================
>     *                                                             *
>     * Gerard Bricogne                     g...@globalphasing.com  *
>     *                                                             *
>     * Global Phasing Ltd.                                         *
>     * Sheraton House, Castle Park         Tel: +44-(0)1223-353033 *
>     * Cambridge CB3 0AX, UK               Fax: +44-(0)1223-366889 *
>     *                                                             *
>     ===============================================================

 
