By raw data I mean images. We think this is only manageable using a
distributed data grid model (e.g. universities/institutions set up their
own repositories using open standards, and the PDB aggregates the links to
them. URL persistence will be a hurdle, I admit). You are right
that a single-repository solution would be impractical. We would
hope that the PDB could store the unmerged intensities.
cheers
ashley
On 17/08/2007, at 12:13 AM, Phil Evans wrote:
What do you count as raw data? Rawest are the images - everything
beyond that is modelling - but archiving images is _expensive_!
Unmerged intensities are probably more manageable.
Phil
On 16 Aug 2007, at 15:05, Ashley Buckle wrote:
Dear Randy
These are very valid points, and I'm so glad you've taken the
important step of initiating this. For now I'd like to respond to
one of them, as it concerns something I and colleagues in
Australia are doing:
The more information that is available, the easier it will be to
detect fabrication (because it is harder to make up more
information convincingly). For instance, if the diffraction data
are deposited, we can check for consistency with the known
properties of real macromolecular crystals, e.g. that they
contain disordered solvent and not vacuum. As Tassos Perrakis has
discovered, there are characteristic ways in which the standard
deviations depend on the intensities and the resolution. If
unmerged data are deposited, there will probably be evidence of
radiation damage, weak effects from intrinsic anomalous
scatterers, etc. Raw images are probably even harder to simulate
convincingly.
After the recent Science retractions we realised that it's about
time raw data was made available. So, we have set about creating
the necessary IT and software to do this for our diffraction data,
and are encouraging Australian colleagues to do the same. We are
about a week away from launching a web-accessible repository for
our recently published (i.e. deposited in the PDB) data, and this should
coincide with an upcoming publication describing a new structure
from our labs. The aim is that publication occurs simultaneously
with release in the PDB, as well as release of the raw diffraction data on our
website. We hope to house as much of our data as possible, as well
as data from other Australian labs, but obviously the potential
dataset will be huge, so we are trying to develop, and make
available freely to the community, software tools that allow
others to easily set up their own repositories. After brief
discussion with the PDB, the plan is for the PDB to include links from
coordinates/SFs to the raw data using a simple handle that can be
incorporated into a URL. We would hope that we can convince the
journals that raw data must be made available at the time of
publication, in the same way as coordinates and structure
factors. Of course, we realise that there will be many hurdles
along the way but we are convinced that simply making the raw data
available ASAP is a 'good thing'.
We are happy to share more details of our IT plans with the
CCP4BB, such that they can be improved, and look forward to
hearing feedback.
cheers
*NOTE* My new tel. no: (03) 9902 0269
Ashley Buckle Ph.D
NHMRC Senior Research Fellow
The Department of Biochemistry and Molecular Biology
School of Biomedical Sciences, Faculty of Medicine &
Victorian Bioinformatics Consortium (VBC)
Monash University, Clayton, Vic 3800
Australia
http://www.med.monash.edu.au/biochem/staff/abuckle.html
iChat/AIM: blindcaptaincat
skype: ashley.buckle
Tel: (613) 9902 0269 (office)
Tel: (613) 9905 1653 (lab)
Fax : (613) 9905 4699