For nice crystals, data processing is straightforward. For crystals with
large unit cells, high mosaicity, and diffuse scattering, processing
can be critical. It may be that future advances in integration
software will allow one to extract far better data from such a
diffraction dataset than can be obtained now. Even short of that,
systematic optimization of things like assumed mosaicity, integration
box parameters, and which parameters to fix, refine once for the
crystal, or refine for each frame can make a big difference. With
high-profile structures, the rush to publish as soon as possible does
not often permit this kind of refinement.

Obviously beamline personnel should strive to get the correct
values in the header, but if they are even close enough to
allow indexing, beam center and distance can be refined
together with crystal parameters.

And if we consider fabrication, obviously it would be trivial
to take Fcalcs, add random noise and call them Fobs, whereas
generating a convincing diffraction pattern that will give
the stated Fobs when integrated would take a lot more work.
(I think James Holton has such a program, but presumably
it leaves obvious tracks that would allow one to detect
its misuse?)

So I think it is each PI's responsibility to keep the images
from any structure that was published, maybe not publicly
accessible on a web site but at least on DVDs in a drawer
somewhere, so it could be pulled out for reprocessing if an
improved algorithm appears, or for presentation to a funding
agency if a question of scientific misconduct is being investigated.
Saving the crystal in liquid N2 might be good also, if it is not
already burned up by radiation damage.

Ed

Santarsiero, Bernard D. wrote:
Sorry, I think it's a waste of resources to store the raw images. I think
we should trust people to be able to at least process their own data set.
Besides, you would need to include beamline parameters, beam position,
detector distances, etc. that may or may not be correct in the image
headers. I'm all for storage and retrieval of a primary intensity data
file (I or F^2 with esds).

Bernie Santarsiero


On Thu, August 16, 2007 9:46 am, Mischa Machius wrote:
Hmm - I think I miscalculated, by a factor of 100 even!... need more
coffee. In any case, I still think it would be doable. Best - MM


On Aug 16, 2007, at 9:30 AM, Mischa Machius wrote:

I don't think archiving images would be that expensive. For one, I
have found that most formats can be compressed quite substantially
using simple, standard procedures like bzip2. If optimized, raw
images won't take up that much space. Also, initially, only those
images that have been used to obtain phases and to refine finally
deposited structures could be archived. If the average structure
takes up 20GB of space, 5,000 structures would be 1TB, which fits
on a single hard drive for less than $400. If the community thinks
this is a worthwhile endeavor, money should be available from
granting agencies to establish a central repository (e.g., at the
RCSB). Imagine what could be done with as little as $50,000. For
large detectors, binning could be used, but given current hard
drive prices and future developments, that won't be necessary. Best
- MM
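
As a quick check of the storage estimate above, here is a minimal
sketch using only the figures quoted in the message (20 GB per
structure, 5,000 structures, a 1 TB drive for under $400). Decimal
units (1 TB = 1000 GB) and the variable names are assumptions made
for illustration. It gives a total of 100 TB rather than 1 TB,
consistent with the factor-of-100 correction in the follow-up.

```python
import math

# Figures quoted in the message above.
GB_PER_STRUCTURE = 20        # average raw-image size per structure
N_STRUCTURES = 5_000         # number of structures to archive
DRIVE_CAPACITY_TB = 1        # one hard drive
DRIVE_PRICE_USD = 400        # quoted upper bound per drive

# Assumption: decimal units, 1 TB = 1000 GB.
GB_PER_TB = 1_000

total_gb = GB_PER_STRUCTURE * N_STRUCTURES
total_tb = total_gb / GB_PER_TB

# Number of drives needed, and an upper bound on drive cost alone
# (no redundancy, servers, or staff included).
n_drives = math.ceil(total_tb / DRIVE_CAPACITY_TB)
drive_cost_usd = n_drives * DRIVE_PRICE_USD

print(f"total: {total_tb:.0f} TB on {n_drives} drives, "
      f"under ${drive_cost_usd:,} in drives")
```

Even with the corrected total, the bare drive cost under these
assumptions stays below the $50,000 figure mentioned above.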


On Aug 16, 2007, at 9:13 AM, Phil Evans wrote:

What do you count as raw data? Rawest are the images - everything
beyond that is modelling - but archiving images is _expensive_!
Unmerged intensities are probably more manageable.

Phil


On 16 Aug 2007, at 15:05, Ashley Buckle wrote:

Dear Randy

These are very valid points, and I'm so glad you've taken the
important step of initiating this. For now I'd like to respond to
one of them, as it concerns something I and colleagues in
Australia are doing:
The more information that is available, the easier it will be to
detect fabrication (because it is harder to make up more
information convincingly). For instance, if the diffraction data
are deposited, we can check for consistency with the known
properties of real macromolecular crystals, e.g. that they
contain disordered solvent and not vacuum. As Tassos Perrakis
has discovered, there are characteristic ways in which the
standard deviations depend on the intensities and the
resolution. If unmerged data are deposited, there will probably
be evidence of radiation damage, weak effects from intrinsic
anomalous scatterers, etc. Raw images are probably even harder
to simulate convincingly.
After the recent Science retractions we realised that it's about
time raw data was made available. So, we have set about creating
the necessary IT and software to do this for our diffraction
data, and are encouraging Australian colleagues to do the same.
We are about a week away from launching a web-accessible
repository for our recently published (e.g. deposited in PDB) data,
and this should coincide with an upcoming publication describing
a new structure from our labs. The aim is that publication occurs
simultaneously with release in PDB as well as raw diffraction
data on our website. We hope to house as much of our data as
possible, as well as data from other Australian labs, but
obviously the potential dataset will be huge, so we are trying to
develop, and make available freely to the community, software
tools that allow others to easily set up their own repositories.
After brief discussion with the PDB, the plan is that the PDB include
links from coordinates/SFs to the raw data using a simple handle
that can be incorporated into a URL.  We would hope that we can
convince the journals that raw data must be made available at the
time of publication, in the same way as coordinates and structure
factors.  Of course, we realise that there will be many hurdles
along the way but we are convinced that simply making the raw
data available ASAP is a 'good thing'.

We are happy to share more details of our IT plans with the
CCP4BB, such that they can be improved, and look forward to
hearing feedback.

cheers

------------------------------------------------------------------------
Mischa Machius, PhD
Associate Professor
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.; ND10.214A
Dallas, TX 75390-8816; U.S.A.
Tel: +1 214 645 6381
Fax: +1 214 645 6353
