On Thu, Aug 16, 2007 at 03:13:29PM +0100, Phil Evans wrote:
> What do you count as raw data? Rawest are the images - everything
> beyond that is modelling - but archiving images is _expensive_!

Hmmm - not sure: let's say that a typical dataset requires about 180
images at 10 MB per image. With the current total of roughly 40000
X-ray structures in the PDB this is:

  40000 * 180 * 10 MB = ~ 70 TB of data

With a simple 1 TB external disk at about GBP 200 we get a price of GBP
14000, i.e. 35 pence per dataset.

OK, this is not a proper calculation (more data collected, fine-phi
slicing, MAD datasets etc. etc.), so let's apply a 'safety factor' of
10: even then I think this is easily doable.

As Tassos remarked as well: if we could store/deposit and manage PDB
files in the 70s, we should be able to do the same now (30 years
later!) with images ... easily.

Cheers

Clemens

> Unmerged intensities are probably more manageable
>
> Phil
>
>
> On 16 Aug 2007, at 15:05, Ashley Buckle wrote:
>
> >Dear Randy
> >
> >These are very valid points, and I'm so glad you've taken the
> >important step of initiating this. For now I'd like to respond to
> >one of them, as it concerns something I and colleagues in Australia
> >are doing:
> >>
> >>The more information that is available, the easier it will be to
> >>detect fabrication (because it is harder to make up more
> >>information convincingly). For instance, if the diffraction data
> >>are deposited, we can check for consistency with the known
> >>properties of real macromolecular crystals, e.g. that they contain
> >>disordered solvent and not vacuum. As Tassos Perrakis has
> >>discovered, there are characteristic ways in which the standard
> >>deviations depend on the intensities and the resolution. If
> >>unmerged data are deposited, there will probably be evidence of
> >>radiation damage, weak effects from intrinsic anomalous
> >>scatterers, etc. Raw images are probably even harder to simulate
> >>convincingly.
> >
> >After the recent Science retractions we realised that it's about
> >time raw data was made available.
> >So, we have set about creating
> >the necessary IT and software to do this for our diffraction data,
> >and are encouraging Australian colleagues to do the same. We are
> >about a week away from launching a web-accessible repository for
> >our recently published (e.g. deposited in PDB) data, and this should
> >coincide with an upcoming publication describing a new structure
> >from our labs. The aim is that publication occurs simultaneously
> >with release in PDB as well as raw diffraction data on our website.
> >We hope to house as much of our data as possible, as well as data
> >from other Australian labs, but obviously the potential dataset
> >will be huge, so we are trying to develop, and make available
> >freely to the community, software tools that allow others to easily
> >set up their own repositories. After brief discussion with PDB the
> >plan is that PDB include links from coordinates/SF's to the raw
> >data using a simple handle that can be incorporated into a URL. We
> >would hope that we can convince the journals that raw data must be
> >made available at the time of publication, in the same way as
> >coordinates and structure factors. Of course, we realise that
> >there will be many hurdles along the way but we are convinced that
> >simply making the raw data available ASAP is a 'good thing'.
> >
> >We are happy to share more details of our IT plans with the CCP4BB,
> >such that they can be improved, and look forward to hearing feedback
> >
> >cheers
>

--
***************************************************************
* Clemens Vonrhein, Ph.D.     vonrhein AT GlobalPhasing DOT com
*
*  Global Phasing Ltd.
*  Sheraton House, Castle Park
*  Cambridge CB3 0AX, UK
*--------------------------------------------------------------
* BUSTER Development Group      (http://www.globalphasing.com)
***************************************************************
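
The storage estimate in the reply above can be reproduced with a few lines of Python. This is only a rough sketch using the figures quoted in the message (40000 structures, 180 images per dataset, 10 MB per image, GBP 200 per 1 TB disk, safety factor 10). The function name, the constant names and the decimal conversion of 1 TB = 1,000,000 MB are assumptions made for illustration, not part of the original post.

```python
# Back-of-envelope archive cost, using the figures quoted in the message.
N_STRUCTURES = 40_000        # X-ray structures in the PDB (2007 figure from the post)
IMAGES_PER_DATASET = 180     # typical dataset size assumed in the post
MB_PER_IMAGE = 10            # image size assumed in the post
GBP_PER_TB_DISK = 200        # price of a 1 TB external disk quoted in the post
MB_PER_TB = 1_000_000        # assumption: decimal units (1 TB = 10^6 MB)


def storage_estimate(safety_factor=1):
    """Return (total TB, total cost in GBP, cost per dataset in pence)."""
    total_mb = N_STRUCTURES * IMAGES_PER_DATASET * MB_PER_IMAGE * safety_factor
    total_tb = total_mb / MB_PER_TB
    total_gbp = total_tb * GBP_PER_TB_DISK
    pence_per_dataset = total_gbp * 100 / N_STRUCTURES
    return total_tb, total_gbp, pence_per_dataset


if __name__ == "__main__":
    for factor in (1, 10):
        tb, gbp, pence = storage_estimate(factor)
        print(f"safety factor {factor:>2}: {tb:.0f} TB, GBP {gbp:.0f}, "
              f"{pence:.0f} pence per dataset")
```

Without rounding, the unscaled figures come out at 72 TB, GBP 14400 and 36 pence per dataset, which the message rounds to about 70 TB, GBP 14000 and 35 pence. With the safety factor of 10 everything scales linearly.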