Hi Clemens, yes, I was not suggesting we keep the 'junk'; I gave the raw figures more to give people an idea of how much data is collected. But you are right, I should have qualified that: for those 266,997 images I estimated 1163 data collections (where I classified a data collection as a set of images with 10 or more frames), giving an average number of images/data collection at BM14 for 2006 of 230 - amazing how well that agrees with your projected figure of 240! We knew you were SHARP but maybe you should now be known as C# ;-)))
As you rightly point out, something in the range of 4-5 Gbyte/structure is a good estimate. So, again taking BioSync statistics: 25853 structures have been deposited with the PDB for which some sort of synchrotron was claimed to have been used in the process of structure solution, so that gives an idea of the space required to date!

M

-----Original Message-----
From: Clemens Vonrhein [mailto:[EMAIL PROTECTED]
Sent: Friday, August 17, 2007 12:00 PM
To: Martin A. Walsh
Cc: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] The importance of USING our validation tools

Hi Martin,

On Fri, Aug 17, 2007 at 11:09:28AM +0200, Martin Walsh wrote:
> For 2006 at BM14 we and our users generated 266997 images/frames from our
> MAR225 CCD (18mb files) or in other words ~4.8Tbyte (if you have patience to
> do so then bzip2 will reduce these raw images to between 5.5 and 7Mb
> -depending on how many diffraction spots /image)

Looking at

  http://www.esrf.eu/exp_facilities/BM14/publications/publications-new.html

it seems that 56 papers were published in 2006 using BM14 data (directly). Let's say (for argument's sake) that each paper deposited 2 structures (and structure factors) into the PDB: this would mean about 2400 images/frames per structure (and about 40 GB of data per structure). There must be a large amount of junk in there not directly related to the deposited structure factors (images from screening or test crystals, basically useless crystals etc).

I don't think anyone would want all images from every beamline deposited in a public database. I think if only the images related to the deposited structure factors are deposited, the data from BM14 would be at least a factor of 10 smaller (4 GB or 240 images per dataset). So this would mean 480 GB of BM14 data for 2006 - or 54 TB for all 115 PX beamlines ... if they all were as productive as BM14!

Anyway, compared to astronomy and other fields it is fairly small (as Peter Keller mentioned in his post).
If we think it is necessary (and I think we should) it will need to be done. It doesn't need to be perfect - but compared to e.g. the currently deposited structure factors, at least diffraction images have headers with useful information in them (even if the beam-centre, distance or wavelength etc are often wrong: but there are ways of getting at the correct values ... even if it is by trial and error).

Cheers

Clemens

--
***************************************************************
* Clemens Vonrhein, Ph.D.     vonrhein AT GlobalPhasing DOT com
*
*  Global Phasing Ltd.
*  Sheraton House, Castle Park
*  Cambridge CB3 0AX, UK
*--------------------------------------------------------------
* BUSTER Development Group      (http://www.globalphasing.com)
***************************************************************
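
For anyone who wants to replay the back-of-envelope numbers in this thread, here is a rough Python sketch. The input figures are the ones quoted above. The function name, the dictionary keys and the use of decimal units (1 TB = 10^6 MB) are assumptions made for illustration. The archive range at the end is simply 25853 structures multiplied by 4-5 GB each, an extrapolation that is not stated anywhere in the thread.

```python
# Rough check of the storage figures quoted in the thread.
# Assumption: decimal units throughout (1 GB = 1000 MB, 1 TB = 1000 GB).

IMAGES_2006 = 266_997            # BM14 frames collected in 2006
IMAGE_SIZE_MB = 18               # uncompressed MAR225 CCD frame
DATA_COLLECTIONS = 1163          # sets of 10 or more frames
PAPERS_2006 = 56                 # papers using BM14 data in 2006
STRUCTURES_PER_PAPER = 2         # "for argument's sake"
SYNCHROTRON_STRUCTURES = 25_853  # BioSync count of PDB depositions
GB_PER_STRUCTURE = (4, 5)        # estimated useful data per structure


def storage_estimates():
    """Return the derived figures as a dict (hypothetical helper)."""
    total_tb = IMAGES_2006 * IMAGE_SIZE_MB / 1e6
    structures = PAPERS_2006 * STRUCTURES_PER_PAPER
    return {
        "total_tb": total_tb,
        "images_per_collection": IMAGES_2006 / DATA_COLLECTIONS,
        "structures": structures,
        "images_per_structure": IMAGES_2006 / structures,
        "gb_per_structure_raw": total_tb * 1000 / structures,
        # Extrapolation, not a figure from the thread:
        "archive_tb": tuple(
            SYNCHROTRON_STRUCTURES * gb / 1000 for gb in GB_PER_STRUCTURE
        ),
    }


if __name__ == "__main__":
    for name, value in storage_estimates().items():
        print(name, value)
```

Under these assumptions the sketch gives about 4.8 TB of raw frames, about 230 images per data collection and roughly 2400 images (a little over 40 GB) per structure, in line with the figures above. The 54 TB quoted for all 115 beamlines matches 480 GB x 115 only if 1 TB is taken as 1024 GB; in decimal units it comes to about 55 TB.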