Hi, for data generation rates I can give you an idea of what is generated at a bending magnet beamline at the ESRF.
For 2006 at BM14, we and our users generated 266,997 images/frames from our MAR225 CCD (18 MB files), or in other words ~4.8 TB. (If you have the patience, bzip2 will reduce these raw images to between 5.5 and 7 MB each, depending on how many diffraction spots there are per image.) Take this as a lower limit for data generation at beamlines around the world, as of course you may collect many more frames at an ID line and use a bigger detector etc. (I am not discussing data sets here but actual frames, whether they are useful or not.)

Then you could do a silly calculation like this: the BioSync web page (http://biosync.rcsb.org/) currently lists 115 PX beamlines, so together they would generate ~0.5 petabytes. Multiplying this by 10, which judging from the data deposition rates (again reported on the BioSync web pages) is a very generous upper limit given current throughput, this crude metric gives ~5 petabytes of data/year across all PX synchrotrons. (This is just for illustrative purposes, so hopefully people won't get all shirty with me for making such an assumption.)

A final note, which I think was touched on already, regards the availability of data from publicly funded research. I am not sure of the current situation worldwide, or how it will apply in the long term to diffraction data collected on publicly funded beamlines, but I believe all publicly funded research in the UK is now obliged to make experimental data freely available/accessible to the general public. Don't quote me on that (Alan Ashton / Bill Pulford could qualify that point, I hope!). As Wilde said... "There are many things that we would throw away if we were not afraid that others might pick them up", so I'm sure most (all) people have raw data put away somewhere, but whether they can still read it is another problem. So it would be great to have data accessible from a web-based resource, as Ashley is doing!
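As a quick check, the estimate above can be reproduced with a few lines of Python. This is only a sketch that reuses the figures quoted in this post (266,997 frames of 18 MB, bzip2 sizes of 5.5-7 MB per frame, 115 beamlines, a factor of 10); decimal units (1 TB = 10^6 MB) and the assumption that every listed beamline produces as much as BM14 did in 2006 are simplifying assumptions, not measurements.

```python
# Back-of-envelope PX data-volume estimate, using the figures quoted in the post.
# Assumption: decimal units (1 TB = 1e6 MB, 1 PB = 1e3 TB).
MB_PER_TB = 1e6
TB_PER_PB = 1e3

FRAMES_2006 = 266_997        # BM14 frames collected in 2006
FRAME_MB = 18.0              # uncompressed MAR225 frame size
BZIP2_MB = (5.5, 7.0)        # quoted bzip2-compressed size range per frame
PX_BEAMLINES = 115           # PX beamlines listed on BioSync
UPPER_FACTOR = 10            # "very generous" upper-limit multiplier


def annual_volume_tb(frames: int, mb_per_frame: float) -> float:
    """Total volume in TB for a number of frames of a given size."""
    return frames * mb_per_frame / MB_PER_TB


def main() -> None:
    raw_tb = annual_volume_tb(FRAMES_2006, FRAME_MB)
    packed_tb = [annual_volume_tb(FRAMES_2006, mb) for mb in BZIP2_MB]
    # Assumption: every listed beamline produces as much as BM14 (a lower limit).
    all_pb = raw_tb * PX_BEAMLINES / TB_PER_PB
    upper_pb = all_pb * UPPER_FACTOR
    print(f"BM14 raw:       {raw_tb:.2f} TB")
    print(f"BM14 bzip2:     {packed_tb[0]:.2f}-{packed_tb[1]:.2f} TB")
    print(f"All beamlines:  {all_pb:.2f} PB")
    print(f"Upper estimate: {upper_pb:.1f} PB")


if __name__ == "__main__":
    main()
```

Run as a script, it prints about 4.81 TB raw (1.47-1.87 TB after bzip2) for BM14, ~0.55 PB for all 115 beamlines and ~5.5 PB with the factor of 10, in line with the rounded figures above.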
M

-----Original Message-----
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED]] On Behalf Of Clemens Vonrhein
Sent: Thursday, August 16, 2007 4:47 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] The importance of USING our validation tools

On Thu, Aug 16, 2007 at 03:13:29PM +0100, Phil Evans wrote:
> What do you count as raw data? Rawest are the images - everything
> beyond that is modelling - but archiving images is _expensive_!

Hmm - not sure: let's say that a typical dataset requires about 180
images of 10 MB each. With the current number of roughly 40000 X-ray
structures in the PDB this is:

  40000 * 180 * 10 MB = ~70 TB of data

With simple 1 TB external disks at about GBP 200 each, we get a price
of GBP 14000, i.e. 35 pence per dataset.

OK, this is not a proper calculation (more data collected, fine-phi
slicing, MAD datasets etc.), so let's apply a 'safety factor' of 10:
even then I think this is easily doable.

As Tassos remarked as well: if we could store/deposit and manage PDB
files in the 70s, we should be able to do the same now (30 years
later!) with images ... easily.

Cheers

Clemens

> Unmerged intensities are probably more manageable
>
> Phil
>
>
> On 16 Aug 2007, at 15:05, Ashley Buckle wrote:
>
> >Dear Randy
> >
> >These are very valid points, and I'm so glad you've taken the
> >important step of initiating this. For now I'd like to respond to
> >one of them, as it concerns something I and colleagues in Australia
> >are doing:
> >>
> >>The more information that is available, the easier it will be to
> >>detect fabrication (because it is harder to make up more
> >>information convincingly). For instance, if the diffraction data
> >>are deposited, we can check for consistency with the known
> >>properties of real macromolecular crystals, e.g. that they contain
> >>disordered solvent and not vacuum.
> >>As Tassos Perrakis has discovered, there are characteristic ways in
> >>which the standard deviations depend on the intensities and the
> >>resolution. If unmerged data are deposited, there will probably be
> >>evidence of radiation damage, weak effects from intrinsic anomalous
> >>scatterers, etc. Raw images are probably even harder to simulate
> >>convincingly.
> >
> >After the recent Science retractions we realised that it's about
> >time raw data was made available. So, we have set about creating
> >the necessary IT and software to do this for our diffraction data,
> >and are encouraging Australian colleagues to do the same. We are
> >about a week away from launching a web-accessible repository for
> >our recently published (e.g. deposited in the PDB) data, and this
> >should coincide with an upcoming publication describing a new
> >structure from our labs. The aim is that publication occurs
> >simultaneously with release in the PDB, as well as release of the
> >raw diffraction data on our website. We hope to house as much of
> >our data as possible, as well as data from other Australian labs,
> >but obviously the potential dataset will be huge, so we are trying
> >to develop, and make available freely to the community, software
> >tools that allow others to easily set up their own repositories.
> >After brief discussion with the PDB, the plan is that the PDB
> >include links from coordinates/SFs to the raw data using a simple
> >handle that can be incorporated into a URL. We would hope that we
> >can convince the journals that raw data must be made available at
> >the time of publication, in the same way as coordinates and
> >structure factors. Of course, we realise that there will be many
> >hurdles along the way, but we are convinced that simply making the
> >raw data available ASAP is a 'good thing'.
> >
> >We are happy to share more details of our IT plans with the CCP4BB,
> >such that they can be improved, and look forward to hearing feedback.
> >
> >cheers
>
-- 
***************************************************************
* Clemens Vonrhein, Ph.D.     vonrhein AT GlobalPhasing DOT com
*
*  Global Phasing Ltd.
*  Sheraton House, Castle Park
*  Cambridge CB3 0AX, UK
*--------------------------------------------------------------
* BUSTER Development Group      (http://www.globalphasing.com)
***************************************************************
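Clemens's storage estimate in the quoted message can be reproduced the same way. This sketch only restates his assumptions (180 frames of 10 MB per dataset, ~40,000 X-ray entries, 1 TB disks at about GBP 200, and his 'safety factor' of 10); decimal units and a single copy on plain disks with no backups are simplifying assumptions, not part of his message.

```python
# Archive-cost estimate from the quoted message, restated with its assumptions.
# Assumptions: decimal units (1 TB = 1e6 MB); one copy on plain disks, no backups.
MB_PER_TB = 1e6

ENTRIES = 40_000          # X-ray structures in the PDB (2007)
FRAMES_PER_SET = 180      # typical dataset
FRAME_MB = 10.0           # size of one image
GBP_PER_TB = 200.0        # price of a 1 TB external disk
SAFETY_FACTOR = 10        # allowance for fine slicing, MAD datasets, etc.


def archive_tb(entries: int, frames: int, frame_mb: float) -> float:
    """Total archive size in TB for all entries."""
    return entries * frames * frame_mb / MB_PER_TB


def main() -> None:
    size_tb = archive_tb(ENTRIES, FRAMES_PER_SET, FRAME_MB)
    cost_gbp = size_tb * GBP_PER_TB
    pence_per_set = cost_gbp * 100 / ENTRIES
    print(f"Archive size:   {size_tb:.0f} TB")
    print(f"Disk cost:      GBP {cost_gbp:,.0f}")
    print(f"Per dataset:    {pence_per_set:.0f} pence")
    print(f"Safety x{SAFETY_FACTOR}:   GBP {cost_gbp * SAFETY_FACTOR:,.0f}")


if __name__ == "__main__":
    main()
```

Without rounding 72 TB down to ~70 TB, this gives GBP 14,400, or about 36 pence per dataset (GBP 144,000 with the safety factor), in line with the 35 pence quoted.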