Hi Clemens, yes, I was not suggesting we keep the 'junk'; I gave the raw
figures more to give people an idea of how much data is collected.
But you are right, I should have qualified them: for those 266,997 images I
estimated 1163 data collections (where I classified a data collection as a
set of images with 10 or more frames), giving an average number of
images/data collection at BM14 for 2006 of 230 - amazing how well that
agrees with your projected figure of 240! We knew you were SHARP but maybe
you should now be known as C# ;-)))

As you rightly point out, something in the range of 4-5 Gbyte/structure is a
good estimate.
So, again taking BioSync statistics, 25853 structures have been deposited with
the PDB that claim some sort of synchrotron was used in the process of
structure solution, so that gives an idea of the space required to date!
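
As a rough check of the numbers above, here is a minimal sketch that
recomputes the average images per data collection and the total space
implied by 4-5 Gbyte/structure. All figures are the ones quoted in this
message; the decimal conversion (1000 Gbyte per Tbyte) and the variable
names are assumptions made for illustration.

```python
# Figures quoted in this message.
IMAGES_2006 = 266_997            # images/frames collected at BM14 in 2006
DATA_COLLECTIONS = 1_163         # estimated sets of 10 or more frames
SYNCHROTRON_STRUCTURES = 25_853  # PDB entries reporting synchrotron use
GB_PER_STRUCTURE = (4, 5)        # estimated range, Gbyte per structure

# Assumption: decimal units, 1 Tbyte = 1000 Gbyte.
GB_PER_TB = 1000

# Average number of images per data collection.
avg_images = IMAGES_2006 / DATA_COLLECTIONS

# Total space for all synchrotron structures, low and high estimate.
low_tb, high_tb = (
    SYNCHROTRON_STRUCTURES * gb / GB_PER_TB for gb in GB_PER_STRUCTURE
)

print(f"average images per data collection: {avg_images:.0f}")
print(f"space required to date: {low_tb:.0f}-{high_tb:.0f} Tbyte")
```

Under these assumptions the space required to date comes out at roughly
100-130 Tbyte.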

M

-----Original Message-----
From: Clemens Vonrhein [mailto:[EMAIL PROTECTED] 
Sent: Friday, August 17, 2007 12:00 PM
To: Martin A. Walsh
Cc: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] The importance of USING our validation tools

Hi Martin,

On Fri, Aug 17, 2007 at 11:09:28AM +0200, Martin Walsh wrote:
> For 2006 at BM14 we and our users generated 266997 images/frames from our
> MAR225 CCD (18mb files) or in other words ~4.8Tbyte (if you have patience to
> do so then bzip2 will reduce these raw images to between 5.5 and 7Mb
> -depending on how many diffraction spots /image)

Looking at

  http://www.esrf.eu/exp_facilities/BM14/publications/publications-new.html

it seems that 56 papers have been published in 2006 using BM14 data
(directly). Let's say (for argument's sake) that each paper deposited 2
structures (and structure factors) into the PDB: this would mean about
2400 images/frames per structure (and about 40 Gb of data per
structure). There must be a large amount of junk in there not
directly related to the deposited structure factors (images from
screening or test crystals, basically useless crystals etc).

I don't think anyone would want all images from every beamline
deposited in a public database. I think if only the images related to
the deposited structure factors are deposited, the data from BM14
would be at least a factor of 10 smaller (4Gb or 240 images per
dataset). So this would mean 480 Gb of BM14 data for 2006 - or 54Tb
for all 115 PX beamlines ... if they all would be as productive as
BM14! Anyway, compared to astronomy and other fields it is fairly
small (as Peter Keller mentioned in his post).
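
The back-of-envelope estimate above can be reproduced with a short
script. This is only a sketch: the inputs are the figures quoted in this
thread, while the variable names and the choice of unit conversion are
assumptions. The 54 Tb figure is consistent with dividing by 1024 rather
than 1000, which is a reading of the numbers and is not stated in the
message.

```python
# Inputs quoted in the thread.
IMAGES_2006 = 266_997      # BM14 images/frames in 2006
MB_PER_IMAGE = 18          # MAR225 CCD file size
PAPERS_2006 = 56           # papers using BM14 data
STRUCTURES_PER_PAPER = 2   # "for argument's sake"
REDUCTION_FACTOR = 10      # share of images not tied to a deposition
PX_BEAMLINES = 115

structures = PAPERS_2006 * STRUCTURES_PER_PAPER
images_per_structure = IMAGES_2006 / structures
gb_per_structure = images_per_structure * MB_PER_IMAGE / 1000

# Keep only the images related to deposited structure factors.
images_per_dataset = images_per_structure / REDUCTION_FACTOR
gb_per_dataset = gb_per_structure / REDUCTION_FACTOR
bm14_gb = IMAGES_2006 * MB_PER_IMAGE / 1000 / REDUCTION_FACTOR

# Scale to all PX beamlines, assuming each is as productive as BM14.
all_gb = bm14_gb * PX_BEAMLINES
all_tb_decimal = all_gb / 1000  # 1 Tb = 1000 Gb
all_tb_binary = all_gb / 1024   # 1 Tb = 1024 Gb (assumed reading)

print(f"{images_per_structure:.0f} images, {gb_per_structure:.0f} Gb per structure")
print(f"{images_per_dataset:.0f} images, {gb_per_dataset:.1f} Gb per dataset")
print(f"BM14 2006: {bm14_gb:.0f} Gb")
print(f"all beamlines: {all_tb_decimal:.0f} Tb (or {all_tb_binary:.0f} Tb)")
```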

If we think it is necessary (and I think we should) it will need to be
done. It doesn't need to be perfect - but compared to e.g. the
currently deposited structure factors, at least diffraction images
have headers with useful information in them (even if the beam-centre,
distance or wavelength etc are often wrong: but there are ways of
getting at the correct values ... even if it is by trial and
error).

Cheers

Clemens

-- 

***************************************************************
* Clemens Vonrhein, Ph.D.     vonrhein AT GlobalPhasing DOT com
*
*  Global Phasing Ltd.
*  Sheraton House, Castle Park 
*  Cambridge CB3 0AX, UK
*--------------------------------------------------------------
* BUSTER Development Group      (http://www.globalphasing.com)
***************************************************************
