Re: [ccp4bb] The importance of USING our validation tools

2007-08-17 Thread Shekhar C. Mande
While the topic of "fabrication" is still hot, I thought I too could add 
a few thoughts. 

Our mathematician friends always make fun of us (biologists/biochemists/
crystallographers!) because our papers are accepted within 4-8 weeks of
submission - to say nothing of Science/Nature/Cell, where even more rapid
reviews are the norm. In the mathematics world it is customary to have a
one-year review of manuscripts, with prior announcement of the work on the
authors' web sites. The one-year review and the prior announcements together
allow others to review the results independently. That perhaps brings the
required rigour to the results. Consequently, there are not as many
retractions in mathematics as we see in our area. It is perhaps not possible
in our (crystallographic) world to have every structure checked
independently by others. Yet a longer review, along with access to raw data,
might allow reviewers to check the finer details of the structures. I would
strongly suggest that raw data be made available to reviewers, and that
reviewers check the structures before the papers are accepted. For any error
in the final published structure, part of the blame should then also lie
with the reviewer. These back-to-back controversies are bound to hurt the
crystallographic community as a whole, and the IUCr should ponder better
checks for the future. 

Shekhar Mande
Hyderabad, INDIA
-REPLY TO-
Date: Thu Aug 16 21:22:20 GMT+08:00 2007
FROM: Randy J. Read  <[EMAIL PROTECTED]>
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] The importance of USING our validation tools
On Aug 16 2007, Eleanor Dodson wrote:

>The weighting in REFMAC is a function of SigmaA (plotted in the log file). 
>For this example it will be nearly 1 for all resolution ranges, so the 
>weights are pretty constant. There is also a contribution from the 
>"experimental" sigma, which in this case seems to be proportional to |F| 

Originally I expected that the publication of our Brief Communication in 
Nature would stimulate a lot of discussion on the bulletin board, but 
clearly it hasn't. One reason is probably that we couldn't be as forthright 
as we wished to be. For its own good reasons, Nature did not allow us to 
use the word "fabricated". Nor were we allowed to discuss other structures 
from the same group, if they weren't published in Nature.

Another reason is an understandable reluctance to make allegations in 
public, and the CCP4 bulletin board probably isn't the best place to do 
that.

But I think the case raises essential topics for the community to discuss, 
and this is a good forum for those discussions. We need to consider how to 
ensure the integrity of the structural databases and the associated 
publications.

So here are some questions to start a discussion, with some suggestions of 
partial answers.

1. How many structures in the PDB are fabricated?

I don't know, but I think (or at least hope) that the number is very small. 

2. How easy is it to fabricate a structure?

It's very easy, if no-one will be examining it with a suspicious mind, but 
it's extremely difficult to do well. No matter how well a structure is 
fabricated, it will violate something that is known now or learned later 
about the properties of real macromolecules and their diffraction data. If 
you're clever enough to do this really well, then you should be clever 
enough to determine the real structure of an interesting protein.

3. How can we tell whether structures in the PDB are fabricated, or just 
poorly refined?

The current standard validation tools are aimed at detecting errors in 
structure determination or the effects of poor refinement practice. None of 
them is aimed at detecting specific signs of fabrication, because we assume 
(almost always correctly) that others are acting in good faith.

The more information that is available, the easier it will be to detect 
fabrication (because it is harder to make up more information 
convincingly). For instance, if the diffraction data are deposited, we can 
check for consistency with the known properties of real macromolecular 
crystals, e.g. that they contain disordered solvent and not vacuum. As 
Tassos Perrakis has discovered, there are characteristic ways in which the 
standard deviations depend on the intensities and the resolution. If 
unmerged data are deposited, there will probably be evidence of radiation 
damage, weak effects from intrinsic anomalous scatterers, etc. Raw images 
are probably even harder to simulate convincingly.

If a structure is fabricated by making up a new crystal form, perhaps a 
complex of previously known components, then the crystal packing 
interactions should look like the interactions seen in real crystals. If 
it's fabricated by homology modelling, then the internal packing is likely 
to be suboptimal. I'm told by David Baker (who knows a thing or two about 
this) that it is extremely difficult to make a homology model that both 

Re: [ccp4bb] The importance of USING our validation tools

2007-08-17 Thread Winter, G (Graeme)
Storing all the images *is* expensive but it can be done - the JCSG do
this and make available a good chunk of their raw diffraction data. The
cost is, however, in preparing this to make the data useful for the
person who downloads it.

If we are going to store and publish the raw experimental measurements
(e.g. the images) which I think would be spectacular, we will also need
to define a minimum amount of metadata which should be supplied with
this to allow a reasonable chance of reproduction of the results. This
is clearly not trivial, but there is probably enough information in the
harvest and log files from e.g. CCP4, HKL2000, Phenix to allow this.
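
For illustration, a minimal per-dataset metadata record might look 
something like the sketch below; every field name and value here is 
invented for the example, not a proposed standard:

minimal_metadata = {
    "beamline": "BM14",                  # where the data were collected
    "detector": "MAR225 CCD",
    "wavelength_A": 0.9793,              # hypothetical values throughout
    "distance_mm": 180.0,
    "beam_centre_mm": (105.2, 102.7),
    "oscillation_deg": 1.0,
    "image_template": "xtal1_1_###.img",
    "image_range": (1, 180),
    "spacegroup": "P212121",
    "cell": (50.0, 60.0, 70.0, 90.0, 90.0, 90.0),
    "pdb_id": "XXXX",                    # link back to the deposition
}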

The real problem will be in getting people to dig out that tape / dvd
with the images on, prepare the required metadata and "deposit" this
information somewhere. Actually storing it is a smaller challenge,
though this is a long way from being trivial.

On an aside - firewire disks are indeed a very cheap way of storing the
data. There is a good reason why they are much cheaper than the
equivalent RAID array. They fail. Ever lost 500GB of data in one go?
Ouch. ;o)

Just MHO.

Cheers,

Graeme 

-Original Message-
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of
Phil Evans
Sent: 16 August 2007 15:13
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] The importance of USING our validation tools

What do you count as raw data? Rawest are the images - everything beyond
that is modelling - but archiving images is _expensive_!  
Unmerged intensities are probably more manageable

Phil


On 16 Aug 2007, at 15:05, Ashley Buckle wrote:

> Dear Randy
>
> These are very valid points, and I'm so glad you've taken the 
> important step of initiating this. For now I'd like to respond to one 
> of them, as it concerns something I and colleagues in Australia are 
> doing:
>>
>> The more information that is available, the easier it will be to 
>> detect fabrication (because it is harder to make up more information 
>> convincingly). For instance, if the diffraction data are deposited, 
>> we can check for consistency with the known properties of real 
>> macromolecular crystals, e.g. that they contain disordered solvent 
>> and not vacuum. As Tassos Perrakis has discovered, there are 
>> characteristic ways in which the standard deviations depend on the 
>> intensities and the resolution. If unmerged data are deposited, there 
>> will probably be evidence of radiation damage, weak effects from 
>> intrinsic anomalous scatterers, etc. Raw images are probably even 
>> harder to simulate convincingly.
>
> After the recent Science retractions we realised that it's about time 
> raw data was made available. So, we have set about creating the 
> necessary IT and software to do this for our diffraction data, and are 
> encouraging Australian colleagues to do the same. We are about a week 
> away from launching a web-accessible repository for our recently 
> published (e.g. deposited in the PDB) data, and this should coincide 
> with an upcoming publication describing a new structure from our labs. 
> The aim is that publication occurs simultaneously with release in the 
> PDB as well as of the raw diffraction data on our website.
> We hope to house as much of our data as possible, as well as data from 
> other Australian labs, but obviously the potential dataset will be 
> huge, so we are trying to develop, and make available freely to the 
> community, software tools that allow others to easily set up their own 
> repositories. After brief discussion with the PDB, the plan is that the 
> PDB include links from coordinates/SFs to the raw data using a simple 
> handle that can be incorporated into a URL. We would hope that we can 
> convince the journals that raw data must be made available at the time 
> of publication, in the same way as coordinates and structure factors.
> Of course, we realise that there will be many hurdles along the way, 
> but we are convinced that simply making the raw data available ASAP is 
> a 'good thing'.
>
> We are happy to share more details of our IT plans with the CCP4BB, 
> such that they can be improved, and look forward to hearing feedback.
>
> cheers


Re: [ccp4bb] possibility of other fabricated structures

2007-08-17 Thread Tim Fenn
On Thu, Aug 16, 2007 at 11:31:00PM -0400, Petr Leiman wrote:
> A small, but very important excerpt from Randy Read's original message
> 
> "... Nature did not allow us to use the word "fabricated". Nor were we 
> allowed to discuss other structures from the same group, if they weren't 
> published in Nature."
> 
> So, are there OTHER SUSPECT STRUCTURES from the same group or same authors 
> published elsewhere???
> 

Yes.  I expect a similar letter, albeit to a different journal, soon.

Regards,
Tim

-- 
-

Tim Fenn
[EMAIL PROTECTED]
Stanford University, School of Medicine
James H. Clark Center
318 Campus Drive, Room E300
Stanford, CA  94305-5432
Phone:  (650) 736-1714
FAX:  (650) 736-1961

-


Re: [ccp4bb] The importance of USING our validation tools

2007-08-17 Thread Manuel Than

Dear colleagues,

the recent discussion on the necessity and feasibility of storing raw 
data for all our structures raises a second point, I think. For the current 
discussion it is only a matter of finding storage space so that Fobs, 
unmerged data, or raw images can be made available to everybody who wants 
to download them - but there are other scientific fields out there as well. 
Do we also want to collect gels, plots, plasmids, bacterial strains, mice, 
dollies, ... at some central place? Or should scientific ethics rather bind 
all of us to practise good science and to be objective reviewers when 
asked?


The usefulness of our data for software developers and for future 
experiments is a completely different issue, of course.


Just wanting to raise this point.

Manuel Than

--
**
Dr. Manuel E. Than

Protein Crystallography Group
Leibniz Institute for Age Research -
Fritz Lipmann Institute (FLI)
Beutenbergstraße 11
D-07745 Jena
Germany

Tel.: ++49 3641 65 6170
Fax.: ++49 3641 65 6335

e-mail: [EMAIL PROTECTED]
http://www.fli-leibniz.de/groups/than.php


Re: [ccp4bb] The importance of USING our validation tools

2007-08-17 Thread Martin A. Walsh
Hi,
for data generation rates I can give you an idea of what is generated at a
bending-magnet beamline at the ESRF.

For 2006 at BM14, we and our users generated 266,997 images/frames from our
MAR225 CCD (18 MB files), or in other words ~4.8 TB (if you have the
patience, bzip2 will reduce these raw images to between 5.5 and 7 MB each,
depending on how many diffraction spots per image).

Take this as a lower limit for data generation at beamlines around the
world, since of course you may collect many more frames (I am not
discussing data sets but actual frames, whether they be useful or not) at
an ID line, use a bigger detector, etc. Then you could do some silly
calculation like this:

The Biosync (http://biosync.rcsb.org/) webpage currently lists 115 PX
beamlines, so that would generate 0.5 petabyte.

Multiplying this by 10, based on data deposition rates (again reported on
the Biosync webpages), gives a very generous upper limit, as far as one can
assess from this crude metric, of ~5 petabytes of data/year for all PX
synchrotrons. (This is just for illustrative purposes, so hopefully people
won't get all shirty with me for making such an assumption.)
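
Written out as a sketch, the back-of-envelope version of the above (the
numbers come straight from this post; the result is order-of-magnitude
only):

frames_bm14_2006 = 266_997        # frames collected at BM14 in 2006
frame_size_mb = 18                # MAR225 CCD image size
tb_bm14 = frames_bm14_2006 * frame_size_mb / 1e6      # ~4.8 TB
n_beamlines = 115                 # PX beamlines listed on Biosync
pb_all = tb_bm14 * n_beamlines / 1000                 # ~0.55 PB/year
pb_upper = pb_all * 10            # generous deposition-rate factor
print(f"BM14: {tb_bm14:.1f} TB; all PX: {pb_all:.2f} PB; "
      f"upper limit ~{pb_upper:.1f} PB/year")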
  

A final note on something that I think was touched on already: the
availability of data from publicly funded research. I am not sure of the
current situation worldwide, or how it will in the long term apply to
diffraction data collected on publicly funded beamlines, but I believe all
publicly funded research in the UK is now obliged to make experimental data
freely available/accessible to the general public - don't quote me on that
(Alan Ashton / Bill Pulford could qualify that point, I hope!).

As Wilde said,
"There are many things that we would throw away if we were not afraid that
others might pick them up."
So I'm sure most (all) people have raw data put away somewhere, but whether
they can still read it is another problem - so it would be great to have
data accessible from a web-based resource, as Ashley is doing!

M


 -Original Message-
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of
Clemens Vonrhein
Sent: Thursday, August 16, 2007 4:47 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] The importance of USING our validation tools

On Thu, Aug 16, 2007 at 03:13:29PM +0100, Phil Evans wrote:
> What do you count as raw data? Rawest are the images - everything  
> beyond that is modellling - but archiving images is _expensive_!  

Hmmm - not sure: let's say that a typical dataset requires about 180
images of 10 MB each. With the current count of roughly 40,000 X-ray
structures in the PDB this is:

  40,000 * 180 * 10 MB = ~70 TB of data

With simple 1 TB external disks at about GBP 200 we get a price of GBP
14,000, i.e. 35 pence per dataset.

OK, this is not a proper calculation (more data collected, fine-phi
slicing, MAD datasets etc. etc.), so let's apply a 'safety factor' of 10:
but even then I think this is easily doable.
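
The same arithmetic as a sketch (all figures are the circa-2007 ones from
this post and purely illustrative):

n_structures = 40_000       # rough number of X-ray entries in the PDB
images_per_set = 180
mb_per_image = 10
total_tb = n_structures * images_per_set * mb_per_image / 1e6   # ~70 TB
cost_gbp = (total_tb / 1.0) * 200      # 1 TB disks at ~GBP 200 each
pence_per_dataset = 100 * cost_gbp / n_structures
print(f"{total_tb:.0f} TB -> GBP {cost_gbp:.0f} "
      f"({pence_per_dataset:.0f}p per dataset)")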

As Tassos remarked as well: if we could store/deposit and manage PDB
files in the 70s we should be able to do the same now (30 years
later!) with images ... easily.

Cheers

Clemens


Re: [ccp4bb] The importance of USING our validation tools

2007-08-17 Thread Clemens Vonrhein
Hi Martin,

On Fri, Aug 17, 2007 at 11:09:28AM +0200, Martin Walsh wrote:
> For 2006 at BM14 we and our users generated 266997 images/frames from our
> MAR225 CCD (18mb files) or in other words ~4.8Tbyte (if you have patience to
> do so then bzip2 will reduce these raw images to between 5.5 and 7Mb
> -depending on how many diffraction spots /image)

Looking at

  http://www.esrf.eu/exp_facilities/BM14/publications/publications-new.html

it seems that 56 papers were published in 2006 using BM14 data (directly).
Let's say (for argument's sake) that each paper deposited 2 structures (and
structure factors) in the PDB: this would mean about 2,400 images/frames
per structure (and about 40 GB of data per structure). There must be a
large amount of junk in there not directly related to the deposited
structure factors (images from screening or test crystals, basically
useless crystals, etc.).

I don't think anyone would want all images from every beamline deposited
in a public database. If only the images related to the deposited structure
factors were deposited, the data from BM14 would be at least a factor of 10
smaller (4 GB or 240 images per dataset). So this would mean 480 GB of BM14
data for 2006 - or 54 TB for all 115 PX beamlines ... if they were all as
productive as BM14! Anyway, compared to astronomy and other fields this is
fairly small (as Peter Keller mentioned in his post).
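
Or, as a sketch with the numbers from the two posts (assumptions as
stated above):

papers_2006 = 56                  # BM14 papers published in 2006
structures = papers_2006 * 2      # assume 2 depositions per paper
frames_each = 266_997 / structures            # ~2400 frames, junk included
gb_each = 4.8e3 / structures                  # ~43 GB, junk included
keep = 0.1                        # keep only deposition-related images
print(f"{frames_each:.0f} frames (~{gb_each:.0f} GB) per structure -> "
      f"~{frames_each * keep:.0f} frames (~{gb_each * keep:.0f} GB) kept")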

If we think it is necessary (and I think we should), it will need to be
done. It doesn't need to be perfect - but compared to, e.g., the currently
deposited structure factors, at least diffraction images have headers with
useful information in them (even if the beam centre, distance, wavelength
etc. are often wrong: there are ways of getting at the correct values ...
even if it is by trial and error).

Cheers

Clemens

-- 

***
* Clemens Vonrhein, Ph.D. vonrhein AT GlobalPhasing DOT com
*
*  Global Phasing Ltd.
*  Sheraton House, Castle Park 
*  Cambridge CB3 0AX, UK
*--
* BUSTER Development Group  (http://www.globalphasing.com)
***


[ccp4bb] The importance of USING our validation tools

2007-08-17 Thread vanraaij

How about this scenario: the structure really is as published, or very like
it, but upon refinement the R-factors were high, other indicators were
dodgy, etc., so the authors were afraid to publish it as-is and made up a
dataset to support their structure - this would be a bit less bad.

Let's see how the authors, Nature (and the other journals) respond.
As some have remarked, this is important, because if we let it pass it
could damage the prestige of protein crystallography. I think that, as the
crystallographic community, we should act quickly to put proper controls
in place, such as submission of raw images.

Mark


[ccp4bb] Public forums

2007-08-17 Thread Hurley, Thomas D.
I'm with Bill on this one. 
 
Despite the overwhelming evidence, one must still keep in mind that the 
accused, including all authors on the manuscript for this structure, have 
yet to be 'convicted' in any legal sense. I hope the investigation is 
swift, thorough, and leads to clear conclusions and satisfactory actions.
 
It is beyond me why and how the situation developed, how the review 
process failed, and how the journal failed in its editorial responsibility 
- but it did.
 
Consequently, there are many ethical and scientific-credibility issues to 
be sorted out. Public excoriation of all those involved, while seemingly 
justified because our sensibilities have been assaulted, begins to approach 
mob justice.
 
The discussion threads have led to some good ideas concerning raw-data 
archiving; let's keep pursuing these issues within the community and 
improve the current state of the art.
 
Tom Hurley
Indiana University School of Medicine



From: CCP4 bulletin board on behalf of William Scott
Sent: Thu 8/16/2007 8:13 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] The importance of USING our validation tools



> No one knows definitively if this was fabricated.

Well, at least one person does. 

But I agree, it is important to keep in mind that the proper venue for
determining guilt or innocence in the case of fraud is the court system.

Until fairly recently, the idea of presumed innocence and the right to
cross-examine accusers and witnesses have been considered fundamental to
civil society.

The case certainly sounds compelling, but this is all the more reason to
adhere to these ideals.

Bill Scott


Re: [ccp4bb] The importance of USING our validation tools

2007-08-17 Thread Martin A. Walsh
Hi Clemens, yes, I was not suggesting we keep the 'junk'; I gave the brute
figures more to give people an idea of how much data is collected. But you
are right, I should have qualified: for those 266,997 images I estimated
1,163 data collections (where I classified a data collection as a set of 10
or more frames), giving an average of 230 images per data collection at
BM14 for 2006 - amazing how well that agrees with your projected figure of
240! We knew you were SHARP but maybe you should now be known as C# ;-)))
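
A hypothetical sketch of that classification - group frames by filename
template and call any template with 10 or more images a data collection
(the filename convention here is assumed, not BM14's actual one):

import re
from collections import defaultdict

def count_collections(filenames, min_frames=10):
    """Count filename templates that have at least min_frames images."""
    runs = defaultdict(int)
    for name in filenames:
        m = re.match(r"(.+)_(\d+)\.img$", name)   # e.g. lysozyme_0042.img
        if m:
            runs[m.group(1)] += 1
    return sum(1 for n in runs.values() if n >= min_frames)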

As you rightly point out, something in the range of 4-5 GB/structure is a
good estimate.
So, again taking Biosync statistics: 25,853 structures deposited in the PDB
claimed some sort of synchrotron was used in the process of structure
solution, which gives an idea of the space required to date!

M



[ccp4bb] C3b, validation, deposition diffraction images

2007-08-17 Thread Luca Jovine

Dear fellow crystallographers,

I'd like to add yet another piece to the ongoing discussion about C3b  
& Co., and the importance of submitting diffraction images in  
addition to structure factors.


Partly in response to some letters that had just appeared in the same  
journal, on 16 July, together with some colleagues, I sent a one-page  
comment to the editor of Science on exactly this matter. You can read it  
here:


http://www.biosci.ki.se/groups/ljo/tmp/diffraction_images.pdf

As you will see, what we wrote very much mirrors the general opinions  
that have been expressed on ccp4bb during the last couple of days.  
However, Science declined to publish our letter; on 2 August we  
therefore sent it to Nature Structural and Molecular Biology, which  
after a few days also deemed it "unsuitable" for the journal. Uhm!


I will not speculate as to why such an issue should not concern these  
journals, considering how many crystallographic structures have been  
debated recently. But I felt compelled to bring this up again on this  
occasion, since it is quite clear that we (almost) all agree that the  
availability of raw images would improve things. And indeed, this was the  
take-home message of the communication in Nature by  
Janssen/Read/Brünger/Gros. So I have a simple suggestion: if we are  
convinced that this is an idea worth pursuing, why don't we come up with a  
common statement on this issue, undersigned by all who agree on the need  
to make raw diffraction images available in addition to structure factors?  
This could be sent to the RCSB and to the editors of scientific journals  
publishing structure papers, as well as - perhaps - to an open-access  
journal, so that it would also be formally published and made available  
for free to the whole community. Hopefully, the latter might help bring  
the issue to the attention of funding bodies (after all, someone will  
eventually have to pay for the required storage media).


One last (but important) point: as should hopefully be obvious from the  
tone of our original letter, our main reason for suggesting image  
submission was not at all to start checking everyone else's data in search  
of faults! In fact, it was exactly the opposite: we wanted to help save  
information, not destroy it. Should we decide to go ahead and make our  
voice heard as a community, it would be nice if we kept this disposition...


Awaiting comments!

With best regards,

Luca

PS: Since we are talking about fabricated - oops, suspect -  
structures, I thought I should provide some evidence that our  
submissions actually took place:


http://www.biosci.ki.se/groups/ljo/tmp/science.pdf
http://www.biosci.ki.se/groups/ljo/tmp/nsmb.pdf

See, they actually happened :-)


PS2: My colleagues E. Morgunova and R. Ladenstein are currently away,  
so I am the only person to be blamed for the above considerations.



Luca Jovine, Ph.D.
Karolinska Institutet
Department of Biosciences and Nutrition
Hälsovägen 7, S-141 57 Huddinge, Sweden
Voice: +46.(0)8.6083-301  FAX: +46.(0)8.6089-290
E-mail: [EMAIL PROTECTED]
W3: http://www.ki.se/



[ccp4bb] LivePDB (related to: The importance of ... )

2007-08-17 Thread Gerard Bricogne
Dear all,

 It has been quite fascinating to see this thread develop in the past
couple of days, as I was a hair's breadth away from initiating a similar
thread upon returning from the ACA meeting at the end of July.

 An impromptu working dinner was held on the Tuesday evening of the
meeting to discuss various aspects of The Future of the PDB. Most of the
topics that were touched upon were technical, bordering (hardly) on the
clerical. I took advantage of a brief window of opportunity that opened
around the topic of "What should be the PDB's mission" to make a plea for 
precisely the shift of emphasis that has been advocated collectively under
the "Importance of ..." thread: 

 (1) that people should be asked to deposit, and the PDB should archive,
raw images as well as all the information enabling the whole structure
determination and refinement process giving rise to a publication to be
reproduced by any interested third party; this would address the questions
of the reproducibility of results in a fairly radical (and beneficial)
manner; 

 (2) that the existence of such an archive would be enormously
beneficial to the software developers' community, as new developments could
be benchmarked against what was the "state of the art" at the time each 
structure was solved, without the huge effort this involves at the moment;

 (3) that the improvements in methods that such a working practice would
facilitate would themselves contribute to making it possible, in time, to
produce even better results from those annotated raw data than those
originally deposited; in this way, even the contents of the PDB would be
alive and constantly evolving, rather than frozen in their original state;

 I was "surprised and disappointed" (standard euphemism) that the
obvious advantages of such an extension of the PDB's mission were met
mostly with reasons not to do it, with the expected arguments about the
volume of data etc. The fact that the PDB is giving its assent to the kind
of initiative that Ashley is talking about is mildly encouraging, but I
concur with others in thinking that this is too important to be left to
volunteer initiatives of this kind in the long run.


 The side issue of verification and of spotting possible falsification
seems (as others have also mentioned) to be part of a bigger picture, which
is the risk of misbehaviour on the part of anybody who is put under
excessive pressure. Whatever the outcome of this particular incident may
eventually turn out to be, recent hiccups with structures published in
high-impact journals are a sign of a sickness in the system by which the
productivity of scientists is evaluated. We need to find ways of backing off
from this Hollywood-like fascination with (even, addiction to) these
journals, and from the pressure to "publish in Nature or Science, or
perish". I can remember Robert Huber telling me 20 years ago that we should
only publish in real journals, not in "magazines" (as he called Nature) -
and clearly, he had a point. A few years ago, Nature even started organising
conferences on the areas of science it considered as the hottest - a blatant
interference of mercantile media in the internal freedom of judgement of the
scientific community. 


 The two issues (a LivePDB, and the dictatorship of the high-impact
media) are clearly related, in the sense that a LivePDB would be a very
strong basis for calling to account the reviewers and editorial mechanisms
of these journals: this would occur "by default", instead of having to be
triggered by the creation of such traumatic "causes célèbres" as the one
that emerged last week.


 With best wishes,
 
  Gerard.


-- 

 ===
 * *
 * Gerard Bricogne [EMAIL PROTECTED]  *
 * *
 * Global Phasing Ltd. *
 * Sheraton House, Castle Park Tel: +44-(0)1223-353033 *
 * Cambridge CB3 0AX, UK   Fax: +44-(0)1223-366889 *
 * *
 ===


Re: [ccp4bb] The importance of USING our validation tools

2007-08-17 Thread Anastassis Perrakis

On Aug 17, 2007, at 8:36, George M. Sheldrick wrote:


Dominika is entirely correct, the F and (especially) sigma(F) values
are clearly inconsistent with my naive suggestion that columns could
have been swapped accidentally in an mtz file.


Since the sigma(F) issue has been raised, let me elaborate on that.

Faking observations is difficult. Faking the experimental uncertainties is  
even more difficult. If one were to fake a dataset, there would almost  
always be an implicit imprint of the procedure.

I am told, for example, that some journals now use a company that claims  
it can spot gels and pictures that were 'photoshopped'. That is - I am  
told by friends - the reason that some journals ask for 400 dpi pictures,  
while the Nature printers can do about 120 dpi in real life.


Thus, I analyzed the distribution of the experimental sigmas in three  
structures:

1E3M and two structures of mine at the same resolution (1CTN, 1E3M)

The results are in:

http://xtal.nki.nl/nature-debate/
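
For anyone who wants to run a similar check on their own data, here is a
minimal sketch using the gemmi Python library. This is an illustration,
not the script behind the page above; the file name and the F/SIGF column
labels are assumptions:

import gemmi
import numpy as np

mtz = gemmi.read_mtz_file("data.mtz")             # hypothetical file
f = np.array(mtz.column_with_label("F"), copy=False)
sigf = np.array(mtz.column_with_label("SIGF"), copy=False)
d = mtz.make_d_array()                            # resolution per reflection

# In real data sigma(F) depends on both F and resolution in characteristic
# ways; a suspiciously tight relation (e.g. sigf proportional to f) is the
# kind of imprint a fabrication procedure can leave.
for lo, hi in [(999.0, 4.0), (4.0, 3.0), (3.0, 2.5), (2.5, 0.0)]:
    sel = (d < lo) & (d >= hi) & (f > 0)
    if sel.any():
        r = sigf[sel] / f[sel]
        print(f"{lo:5.1f}-{hi:3.1f} A: <sigF/F> = {r.mean():.3f} "
              f"+/- {r.std():.3f} ({sel.sum()} refl.)")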


That's also a response to Tom Hurley's email ... I think we are obliged  
to look at this case and show all the crystallographers who read the board  
what the evidence is. This has no legal consequences. I think the debate  
is healthy, and I have not seen anyone asking to lynch or crucify anybody.  
As long as the discussion is about evidence, and not about passing ethical  
or other judgement, I think it is good to go on. Also, it is a good lesson  
for everybody:

===
*** "Keep your images, your gels, your logbooks. It's your obligation.  
Make sure all your colleagues do so."
===

(Especially if you are the PI, you carry the primary responsibility for  
making all the primary data that support your publication available on  
request.) If you do not keep to that principle, some mean mob might lynch  
you, even if you are right. So, be correct in your approach.


I am making the web site with my analysis public so that people can see  
one more piece of evidence that there are doubts, and that Murthy et al.  
should provide primary data, as many others have said. Statements of  
certain innocence or certain guilt should indeed not be public.

So, I will now wait for the data - as simple as that.

Tassos


Re: [ccp4bb] public forums

2007-08-17 Thread Hurley, Thomas D.
Just to be clear on my previous email, I encourage all evidence to be
gathered, posted and accumulated for proper evaluation of this case.  It
will prove incredibly useful for all involved as this case moves
forward. 

 

It is also proving useful for course development - frankly, I am
accumulating files, commentaries, responses and arguments (with
appropriate acknowledgments for their efforts at data analysis in this
situation) for use in our research ethics course - hope no one minds.
If anyone does, please let me know and I'll remove that content, and any
reference to that content in this case study.

 

My email was just a caution to be wary of what one says in a publicly
posted forum.

Tom Hurley



[ccp4bb] Application Specialist - X-Ray Crystallography for Life-Science

2007-08-17 Thread Adam, Martin
Dear group members,
 
I'd like to bring following opening to your attention:
 
'Application Specialist - X-Ray Crystallography for Life-Science 
 
Bruker is seeking, for its Singapore Expertise Centre, an individual to
support our regional sales force in the sales and marketing of our X-ray
crystallography solutions to the macromolecular community in Asia. 
 Responsibilities:
*   Extensive travel in the Asia region 
*   Advise prospective customers on the most appropriate analytical
solutions 
*   Prepare and conduct demonstrations at Bruker facilities 
*   Support our existing customers 
*   Prepare and conduct presentations for commercial and scientific
events 
*   Co-ordination of activities with our Application Laboratories in
Europe and the US
 
 Qualifications:
*   Advanced degree in macromolecular sciences 
*   Strong background in the use of X-ray structural biology
instrumentation 
*   Experience in an interdisciplinary environment, including
international scientific collaborations 
*   Excellent communication skills 
*   Affinity for a commercially oriented position
 
Bruker AXS offers a competitive salary and comprehensive benefits
package.
 
Applicants are invited to submit their application with resume in
confidence to:
 
Mr. George Tang 
BRUKER AXS Pte Ltd 
77 Science Park Drive #01-01/02 CINTECH III 
Singapore Science Park 
118256 Singapore, SG 
Phone +65 (6777) 5883 
Fax +65 (6774) 7703 
 
[EMAIL PROTECTED]
'
 
 
Martin 
 
PS: Applications for the position  'Technical Sales Manager Life-Science
Systems' based in Delft are still possible.





Re: [ccp4bb] public forums

2007-08-17 Thread Andreas Forster
Before going into an ethics class, I think this material needs to go into a
crystallography class.  Every crystallographer (and maybe even every
structural biologist) should know why the structure is fishy, how fishiness
can be detected, how one can make sure one's own structure is legit, etc.
Analysis of others' structures and validation of your own, basically going
back to how Eleanor started this thread.  These past two days have proven
infinitely insightful and inspiring to me.


Andreas




[ccp4bb] Richard Reid and the PDB

2007-08-17 Thread Kim Henrick
After Richard Reid, more than 100 million people each year have to have
their shoes examined, and one effect is that older buildings like Heathrow
Terminal 3 are among the most painful places on earth; the cost of one
person trying to light his shoelaces has affected us all.


The discussion on archiving image data sets -
 I guess that less than 1% of the image sets for PDB entries
   are useful to software development (and these can be got privately)
 I guess that maybe 1 in 10,000 entries has a serious problem that
   may require referees to look at the images (and these can be
   accessed on demand)


The cost of disks for your PC - kitchen-table disks from a supermarket -
may be $1 per GB over USB, but an archive centre required to maintain the
data will probably need RAID 0/1 - RAID 10, which has high performance and
the highest data protection (i.e. it can tolerate multiple drive failures)
but a high redundancy cost overhead; if you haven't noticed, a large
collection of disks has failures. Look up the problems that the series of
Landsat satellites have had from 1980 onwards, arising from the volume of
data and the short life of computer-compatible tapes and optical discs.
Archiving data lacks glamour; it's the boring day-to-day rectification and
storage of information, and very little money gets spent on this task. For
remote sensing the most significant cost is transmission/correction and
archiving of the data - three semi-trailer loads of Landsat tapes were
found (literally) mouldering in a damp basement in Baltimore after people
and funding agencies lost interest. Oh yes, and detectors change every 5
years and processing software gets lost.

At the EBI, before we even get a single disk, we pay £100,000 for a
cabinet - disks cost around £500 for 300 GB (and those are not the best
disks, which cost around the same for 146 GB). Disk technology changes
every 5 years - part of an archive's cost is recovering the data every 5
years onto the next generation of hardware. Molecular biology and structure
research is carried out by thousands of groups, not centrally under a
single international treaty like a telescope that is run and financed
centrally to do the data archiving. Molecular biology uses some in-house
data collection, but most data are collected at synchrotrons - and despite
the fact that there are many beamlines, most data again come from fewer
than 10 sites. These major synchrotron sites are committed to data storage
through various methods of storage hierarchy, and a better solution than a
central archive is to issue a DOI, or a set of DOIs, for the data
associated with a PDB entry and to associate the DOI with that entry. Many
countries have, over the last 5-7 years, spent billions of dollars on GRID
and distributed data storage - use this technology to leave the data where
it is and pick it up on demand. Google's solution for large datasets, such
as single-file tomograms, is to ship disks - there is no simple, cheap
FTP/WWW solution to large datasets.
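
The "leave the data where it is" idea is easy to sketch: a DOI resolves
through the doi.org handle proxy to wherever the owning site currently
keeps the files. The DOI below is invented for illustration:

import urllib.request

def resolve_doi(doi):
    # doi.org answers with an HTTP redirect to the registered location;
    # urllib follows redirects, so the final URL is where the data live now.
    req = urllib.request.Request(f"https://doi.org/{doi}", method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return resp.url

print(resolve_doi("10.99999/pdb.1abc.rawdata"))    # hypothetical DOI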

The cost of a central archive is several million dollars per year to set
up and run long term - and who will pay? 40% of the PDB comes from the USA
(the biggest single contributor), but with the difficulties in funding from
the EU and national funding priorities, is the USA to carry this cost? Is
the cost to be shared as in the table below? So far only the USA, Japan and
Europe (through the UK, the EU and EMBL) pay for the PDB. The USA also pays
for UniProt, and other large-scale data gathering is carried out by
nationally funded centres, not by the large number of individuals and
countries that the PDB comes from.

The administration needed to get all the datasets costs far more than the
$1/GB for a USB disk that is next to useless for an archive. The costs of
storage are rapidly decreasing, but there has not been a great change in
latencies and bandwidth - and if everything gets faster & cheaper at the
same rate then nothing really changes, i.e. more structures are done.

Why inspect the shoes of every PDB entry and every structural biologist
when we can detect the very rare suspect problem and agree a course of
action?

kim

PDB Depositions (1 January 1999 to 26 June 2007)
Country          1999 2000 2001 2002 2003 2004 2005 2006 2007  Total
ARGENTINA           0    0    0    0    0    2    1    6    7     16
AUSTRALIA          52   46   45   59   59   75   94   91   51    572
AUSTRIA             1    3    2    7   12   22   26   20    5     98
BELGIUM            29   28   41   24   38   27   36   50   29    302
BRAZIL              7    2   12   16   34   24   34   78   30    237
CANADA            109  117  131  115  157  185  280  334  183   1611
CHILE               0    1    0    0    0    1    2    0    0      4
CHINA              22   28   32   29   50   66  132  121   61    541
CROATIA             0    1    0    0    1    0    0    5    0      7
CZECH_REPUBLIC      2    1    4    6    5    4   12    3    4     41
CUBA                0    0    0    0    0    1    0    0    0      1
DENMARK            19   34   26   31   44   45   37   58    9    303
FINLAND            14    1

[ccp4bb] Depositing Raw Data

2007-08-17 Thread Mischa Machius
Since there are several sub-plots in that mammoth thread, I thought  
branching out would be a good idea.


I think working out the technicalities of how to publicly archive raw  
data is fairly simple compared to the bigger picture.


1. Indeed, all the required metadata will need to be captured, just as  
for refined coordinates. This will be an additional burden for the  
depositor, but it's clearly necessary, and I do consider it trivial -  
trivial in the sense of "straightforward", i.e., there is no fundamental  
problem blocking progress. As mentioned, current data-processing software  
captures most of the pertinent information already, although that could be  
improved. I am sure that the beamlines, diffraction-system manufacturers  
and authors of data-processing software can be convinced to cooperate  
appropriately if the community needs these features.


2. Trickier is the issue of a unified format for the images, which would  
be very helpful. There have been attempts at creating unified image  
formats, but - to my knowledge - they haven't gotten anywhere. However, I  
am convinced that such formats can be designed, and that detector  
manufacturers will have no problem implementing them, considering that  
their detectors may not be purchased if they don't comply with  
requirements defined by the community.


3. The hardware required to store all those data, even in a highly  
redundant way, is clearly trivial.


4. The biggest problem I can see in the short run is the burden on  
the databank when thousands of investigators start transferring  
gigabytes of images, all at the same time.


5. I think the NSA might go bonkers over that traffic, although it  
certainly has enough storage space. Imagine, they let their decoders  
go wild on all those images. They might actually find interesting  
things in them...


So, what's the hold-up?

Best - MM




[ccp4bb] Michele Cianci is out of the office.

2007-08-17 Thread Michele Cianci
I will be out of the office starting  08/01/2007 and will not return until
08/31/2007.

I will respond to your message when I return.


Re: [ccp4bb] The importance of USING our validation tools

2007-08-17 Thread cdekker
I have read the comments in Nature and on the CCP4BB with excitement, 
almost the same level of excitement with which I followed this year's 
Tour de France. Speaking of which, I can see many analogies.


Political correctness aside, what is the meaning of an (internationally 
acclaimed) journal if people cannot point out flaws in publicly available 
scientific models, whether published in Nature or any other journal? We 
can go to great lengths in making all crystallographic data freely and 
publicly available, but that doesn't change the problem with commercial 
journals mentioned by Rupp. Mind you, I have never been in the luxurious 
position of contemplating submission to Nature, so maybe I should stick 
with Robert Huber's credo. It is not the credibility of crystallography 
that is at stake, but the credibility of the big journals, as phrased by 
Jones and Kleywegt.


Also, I am not interested in 'public excoriation' at all. What I do care 
about is a clean database (the PDB) and letters in journals that truly 
reflect the opinion of scientists all over the world. It is embarrassing 
to see that accepting the comment by Janssen et al. took 6 months, while 
accepting the original papers, which contained far more experimental data 
to judge, took just over a month.


There was a recent debate about anonymous refereeing. Here is something 
else to consider: whenever a paper contains a crystal structure, it should 
be refereed by at least one crystallographer (Bernard Rupp's pairing 
scheme). If this were a crystallographer working on proteins/subjects 
unrelated to those in the paper, then there would be less need for the 
crystallographer to remain anonymous, and this might help a great deal in 
persuading authors to send off their raw data in confidence.


Carien


Dr. Carien Dekker
Dept. Cell & Molecular Biology
Institute of Cancer Research
Chester Beatty Laboratories
237 Fulham Road
London, SW3 6JB

Email: [EMAIL PROTECTED]
Tel. +44 (0) 207 153 5185
Fax. +44 (0) 207 351 3325



Re: [ccp4bb] Richard Reid and the PDB

2007-08-17 Thread Clemens Vonrhein
Hi Kim,

On Fri, Aug 17, 2007 at 03:04:27PM +0100, Kim Henrick wrote:
> The discussion on archiving image data sets -
>  I guess that less than 1% of the image sets for PDB entries
>are useful to software development (and can be got privately)

I would think much, much more would be useful for software development:
the 1% will maybe be the really difficult ones or the extremely
high-resolution ones - which are interesting for development as well.

But a large part of software development goes into methods for the
'everyday' kind of structure ... the 99%. It is fairly easy for us to get
hold of very difficult datasets and extremely good ones: but try to find a
boring, standard dataset. However, these are the ones that new methods
need to improve upon as well. Otherwise we end up with methods and
software that work for the 5 special cases a year and not for all the
others.

We routinely use ALL the PDB entries (where structure factors are
available) for running various tests and analysis (and I know other
software development groups do this on a regular basis as well).

If there were a mechanism for depositing raw images, it might start in
the same way as the deposition of structure factors: fairly small scale
and only done by a few. It might not even have to start as a requirement
(structure factors weren't a requirement either at the beginning, AFAIK).
But just having the possibility might open up new insights: not only into
the particular structures or software development, but also into how to
handle this data.

And yes, I agree: a distributed system (a la DOI) would be much, much
better than a central system. At least the existing synchrotron or
SG-centre infrastructure could then be re-used.

Cheers

Clemens

-- 

***
* Clemens Vonrhein, Ph.D. vonrhein AT GlobalPhasing DOT com
*
*  Global Phasing Ltd.
*  Sheraton House, Castle Park 
*  Cambridge CB3 0AX, UK
*--
* BUSTER Development Group  (http://www.globalphasing.com)
***


Re: [ccp4bb] Depositing Raw Data

2007-08-17 Thread Winter, G (Graeme)
Hi,

On the question of a "uniform format" for this data, I believe that
imgCIF has been working towards this end for a number of years. As a very
vocal supporter of it, I would say that it is an ideal archival format for
the following reasons:

 - the terms are clearly defined (or are currently in the process of
being so)
 - the images are compressed, typically by a factor of 2-3
 - some data reduction packages (Mosflm, XDS) can read them in this
compressed form 

Now, I would be telling lies if I said that this was all finished, but I
think it is fair to say that it is already a long way down the path. As
soon as I am convinced that you can go losslessly to and from imgCIF, and
that the data reduction programs will give precisely the same results, I
will convert, thus freeing up 5 firewire disks.
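
The lossless round-trip itself is easy to test mechanically; a minimal
sketch, using plain bzip2 as a stand-in for the imgCIF conversion (the
file name is hypothetical):

import bz2
import hashlib

def roundtrip_ok(path):
    """Compress, decompress, and confirm the bytes are unchanged."""
    raw = open(path, "rb").read()
    back = bz2.decompress(bz2.compress(raw))
    return hashlib.sha256(raw).digest() == hashlib.sha256(back).digest()

print(roundtrip_ok("frame_0001.img"))    # hypothetical image file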

For more information on this take a look at medsbio.org.

Cheers,

Graeme

-Original Message-
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of
Mischa Machius
Sent: 17 August 2007 15:07
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] Depositing Raw Data

Since there are several sub-plots in that mammoth thread, I thought
branching out would be a good idea.

I think working out the technicalities of how to publicly archive raw
data is fairly simple compared to the bigger picture.

1. Indeed, all the required meta-data will need to be captured just like
for refined coordinates. This will be an additional burden for the
depositor, but it's clearly necessary, and I do consider it trivial.
Trivial, as in the sense of "straightforward", i.e., there is no
fundamental problem blocking progress. As mentioned, current data
processing software captures most of the pertinent information already,
although that could be improved. I am sure that the beamlines,
diffraction-system manufacturers and authors of data- processing
software can be convinced to cooperate appropriately, if the community
needs these features.

2. More tricky is the issue of a unified format for the images, which
would be very helpful. There have been attempts at creating unified
image formats, but - to my knowledge - they haven't gotten anywhere.  
However, I am also convinced that such formats can be designed, and that
detector manufacturers will have no problems implementing them,
considering that their detectors may not be purchased if they don't
comply with requirements defined by the community.

3. The hardware required to store all those data, even in a highly
redundant way, is clearly trivial.

4. The biggest problem I can see in the short run is the burden on the
databank when thousands of investigators start transferring gigabytes of
images, all at the same time.

5. I think the NSA might go bonkers over that traffic, although it
certainly has enough storage space. Imagine, they let their decoders go
wild on all those images. They might actually find interesting things in
them...

So, what's the hold-up?

Best - MM



On Aug 17, 2007, at 3:23 AM, Winter, G (Graeme) wrote:

> Storing all the images *is* expensive but it can be done - the JCSG do

> this and make available a good chunk of their raw diffraction data.
> The
> cost is, however, in preparing this to make the data useful for the 
> person who downloads it.
>
> If we are going to store and publish the raw experimental measurements

> (e.g. the images) which I think would be spectacular, we will also 
> need to define a minimum amount of metadata which should be supplied 
> with this to allow a reasonable chance of reproduction of the results.

> This is clearly not trivial, but there is probably enough information 
> in the harvest and log files from e.g. CCP4, HKL2000, Phenix to allow 
> this.
>
> The real problem will be in getting people to dig out that tape / dvd 
> with the images on, prepare the required metadata and "deposit" this 
> information somewhere. Actually storing it is a smaller challenge, 
> though this is a long way from being trivial.
>
> On an aside - firewire disks are indeed a very cheap way of storing 
> the data. There is a good reason why they are much cheaper than the 
> equivalent RAID array. They fail. Ever lost 500GB of data in one go?
> Ouch. ;o)
>
> Just MHO.
>
> Cheers,
>
> Graeme
>
> -Original Message-
> From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of 
> Phil Evans
> Sent: 16 August 2007 15:13
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] The importance of USING our validation tools
>
> What do you count as raw data? Rawest are the images - everything 
> beyond that is modellling - but archiving images is _expensive_!
> Unmerged intensities are probably more manageable
>
> Phil
>
>
> On  16 Aug 2007, at 15:05, Ashley Buckle wrote:
>
>> Dear Randy
>>
>> These are very valid points, and I'm so glad you've taken the 
>> important step of initiating this. For now I'd like to respond to one
>> of them, as it concerns something I and colleagues in Australia are
>> doing:
>>>
>>> The more information that i

Re: [ccp4bb] nature cb3 response - USING validation criteria

2007-08-17 Thread Eleanor Dodson
I am trying to keep track of, and keep notes on, these emails, many of which
raise very important questions for our community, much more so than any
problems with a particular structure.


But it would be a lot easier if you would all stick to the same Subject 
header!!!


Eleanor


Re: [ccp4bb] nature cb3 response

2007-08-17 Thread Oganesyan, Vaheh
Phoebe,

Any and every reviewer has the right to request the coordinate file as
well as the structure-factor file.
The other question is: why are most of them not exercising that right?



Vaheh Oganesyan

-Original Message-
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] Behalf Of
[EMAIL PROTECTED]
Sent: Thursday, August 16, 2007 8:10 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] nature cb3 response

A comment from my collaborator's student suggests a partial
answer.  This afternoon he happened to say "but of course the
reviewers will look at the model, I just deposited it!".  He was
shocked to find that "hold for pub" means that even reviewers can't
access the data.  Can that be changed?  It would take a bit of
coordination between journals and the PDB, but I think the student is
right - it is rather shocking that the data is sitting there nicely
deposited but the reviewers can't review it.
 Phoebe Rice

At 05:33 PM 8/16/2007, Bernhard Rupp wrote:
>Ok, enough political (in)correctness. Irrespective of fabricated or not,
>I think this points to a general problem of commercial journals and
>their review process, as it seems that selling (.com) hot stuff
>induces an extraordinary capability of denial.
>
>The comment, as someone noted, does not address the allegations
>at all. This is reminiscent of my dealings with Nature in two
>related cases: they ignore or stonewall until the dispute is ended
>with an irrelevant comment. In one case, Axel B later proved
>with the correct structure that what we had commented on earlier
>was entirely correct.
>In the second case, the comment (by some of the leading experts,
>not just by me, a nobody) was rejected with no recourse, based on another
>non-fact-addressing author comment, and not published at all.
>
>Compare this to a similar case, when the JACS editor (.org <--) contacted
>me of his own accord to check for a related problem, leading to retraction
>of the paper after the editor (a scientist himself) evaluated
>facts and response.
>
>It also seems to depend on the handling Nature editor. I have made maps of
>several structures from data unhesitatingly provided by the editor when I
>had reason to ask for them during review. Those editors were also
>responsive to a mini-table-1 comment I sent on cb3, but I did not hear
>from the editor assigned to cb3.
>
>This time again, the review completely failed (table 1 and comment
>issues), and the editorial process failed as well, because the response
>is not adequate.
>If someone - as tentatively and tactfully as it may have been phrased -
>accused me of faking data, they'd eat shit until hell freezes over.
>
>It is as simple as that: an extraordinary claim (super structure, bizarre
>stats and properties) requires extraordinary proof. This rule has not been
>followed, which reflects poorly on the scientific process in this case.
>
>I also note that in no case known to me have persons involved in
>irregularities ever appeared as frequent (or at all) communicators on
>the ccp4bb.
>
>As long as grant review and tenure committees rely on automated
>bibliometrics and impact factors (and who knows who) to decide academic
>careers and funding, the big journals will remain the winners. The
>system has become self-perpetuating.
>
>Back to grant writing now.
>Need to get that paper out to Nature...
>
>Cheers, br
>
>PS: it is pointless flaming me. I am the messenger only.
>
>-Original Message-
>From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of
>Bernhard Rupp
>Sent: Thursday, August 16, 2007 2:03 PM
>To: CCP4BB@JISCMAIL.AC.UK
>Subject: Re: [ccp4bb] nature cb3 comment pdf
>
>thxthxthx to all the day and night owls for the many copies
>The winners have been selected, no more entries needed.
>thx again br
>
>-Original Message-
>From: Miriam Hirshberg [mailto:[EMAIL PROTECTED] On Behalf Of Miriam
>Hirshberg
>Sent: Thursday, August 16, 2007 1:58 PM
>To: Bernhard Rupp
>Subject: Re: [ccp4bb] nature cb3 comment pdf
>
>
>attached, Miri
>
>
>On Thu, 16 Aug 2007, Bernhard Rupp wrote:
>
> > my nature web connection just died for good (probably a preventive
> > measure..)
> > Could someone kindly email me the pdfs of the comment and response?
> > Thx br
> > -
> > Bernhard Rupp
> > 001 (925) 209-7429
> > +43 (676) 571-0536
> > [EMAIL PROTECTED]
> > [EMAIL PROTECTED]
> > http://www.ruppweb.org/
> > -
> > People can be divided in three classes:
> > The few who make things happen
> > The many who watch things happen
> > And the overwhelming majority
> > who have no idea what is happening.
> > -
> >


---
Phoebe A. Rice
Assoc. Prof., Dept. of Biochemistry & Molecular Biology
The University of Chicago
phone 773 834 

Re: [ccp4bb] nature cb3 response

2007-08-17 Thread price
I don't think all journals have that policy, and even so, making the 
reviewers specifically request the data implies to the reviewees that 
somebody out there doesn't trust them.


You shouldn't have to insult the authors in order to do a proper 
reviewing job - you should just be able to download the data and 
coordinates right along with the pdf.


Phoebe

At 09:29 AM 8/17/2007, you wrote:

Phoebe,

Any and every reviewer has the right to request the coordinate file as
well as the structure-factor file.
The other question is: why are most of them not exercising that right?



Vaheh Oganesyan

-Original Message-
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] Behalf Of
[EMAIL PROTECTED]
Sent: Thursday, August 16, 2007 8:10 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] nature cb3 response

A comment from my collaborator's student suggests a partial
answer.  This afternoon he happened to say "but of course the
reviewers will look at the model, I just deposited it!".  He was
shocked to find that "hold for pub" means that even reviewers can't
access the data.  Can that be changed?  It would take a bit of
coordination between journals and the PDB, but I think the student is
right - it is rather shocking that the data is sitting there nicely
deposited but the reviewers can't review it.
 Phoebe Rice


Re: [ccp4bb] High Rfac/Rfree for a 1.6A reso structure

2007-08-17 Thread Peter Zwart
Hi Elisabetta

Your unit cell is pseudo F222, with twin law (-h,-k,h+k+l):

iotbx.explore_metric_symmetry --unit_cell="47.75 33.61 91.04 90.000 103.635 90.000" --space_group=C2

(output below)

Did you rule out twinning?
Try running phenix.xtriage for some intensity stats and/or
phenix.refine for least-squares twin refinement, and see where the twin
fraction goes.
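
For concreteness, a tiny, dependency-free sketch of what that twin law
does to the data (the indices and the mixing equation below are the
standard textbook picture, not output from this particular data set):

# The twin law (-h,-k,h+k+l) maps each Miller index onto its
# twin-related mate in the pseudo-F222 lattice.
def twin_mate(hkl):
    h, k, l = hkl
    return (-h, -k, h + k + l)

# In a twinned crystal the observed intensity is a mixture of
# twin-related intensities,
#   I_obs(hkl) = (1 - alpha) * I(hkl) + alpha * I(twin_mate(hkl)),
# where alpha is the twin fraction (0 = untwinned, 0.5 = perfect twin).
for hkl in [(1, 2, 3), (4, 0, -2)]:
    print(hkl, "->", twin_mate(hkl))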

Contact me personally if you need help or if you are not sure how to run things.

HTH

Peter


A summary of the constructed point group graph object is given below


--
Input crystal symmetry
--
Unit cell:  (47.75, 33.609, 91.046, 90.0, 103.635001, 90.0)
Unit cell volume:  141990.310662
Space group:  C 1 2 1


--
Lattice symmetry deduction
--
Niggli cell:  (29.196295141678505, 29.196295141678505, 90.088423129339745, 94.902521424695664, 97.427868096943868, 109.71853700070298)
Niggli cell volume:  70995.1553311
Niggli transformed input symmetry:  C 1 2 1 (x+y,-x+y+z,z)
Symmetry of Niggli cell:  F 2 2 2 (x-y+z,x+y+z,2*z)


All pointgroups that are both a subgroup of the lattice symmetry and
a supergroup of the Niggli transformed input symmetry will now be listed,
as well as their minimal supergroups/maximal subgroups and symmetry
operators that generate them.
For each pointgroup, a list of compatible spacegroups will be listed.
Care is taken that there are no systematic absence violations with the
provided input spacegroup.


Vertices and their edges


Point group   F 2 2 2 (x-y+z,x+y+z,2*z)   is a maximal subgroup of :
  * None

Point group   C 1 2 1 (x+y,-x+y+z,z)   is a maximal subgroup of :
  * F 2 2 2 (x-y+z,x+y+z,2*z)



-
Transforming point groups
-

From C 1 2 1 (x+y,-x+y+z,z)   to  F 2 2 2 (x-y+z,x+y+z,2*z)  using :
  *  -h,-k,h+k+l



--
Compatible spacegroups
--

Spacegroups compatible with a specified point group
**and** with the systematic absences specified by the
input space group, are listed below.

Spacegroup candidates in point group F 2 2 2 (x-y+z,x+y+z,2*z):
  * F 2 2 2  33.61 47.75 177.01 90.00 90.00 90.00

Spacegroup candidates in point group C 1 2 1 (x+y,-x+y+z,z):
  * C 1 2 1  47.75 33.61 92.30 90.00 106.55 90.00


2007/8/17, Sabini, Elisabetta <[EMAIL PROTECTED]>:
> Dear all,
>
> I have a structure at 1.6A. At the end of refinement the Rfactor and Rfree
> are quite high (23/29.6%). Here is some information:
>
> Morphology of crystals: large but thin plates (150 x 400 x 20 microns)
> Number of residues: 160 (2 x 80); 103 water molecules
> Space group: C2
> Unit Cell: 47.75 33.61 91.04  90.000 103.635  90.000
>
> Data is a merge of a low resolution and high resolution sweep. The
> correlation between the two data sets is:
>
>  DATA SETS   NUMBER OF COMMON   CORRELATION   RATIO OF COMMON    B-FACTOR
>   #i   #j     REFLECTIONS       BETWEEN i,j   INTENSITIES (i/j)  BETWEEN i,j
>
>    1    2        4049              0.973           1.0002          0.0001
>
> I include the data collection and refinement statistics in the attachment.
>
> Can you please suggest to me possible reasons for the high Rfact and Rfree?
>
> Thanks,
>
> Elisabetta.
>
> --
> Elisabetta Sabini, Ph.D.
> Research Assistant Professor
> University of Illinois at Chicago
> Department of Biochemistry and Molecular Genetics
> Molecular Biology Research Building, Rm. 1108
> 900 South Ashland Avenue
> Chicago, IL 60607
> U.S.A.
>
> Tel: (312) 996-6299
> Fax: (312) 355-4535
> E-mail: [EMAIL PROTECTED]
>
>


Re: [ccp4bb] High Rfac/Rfree for a 1.6A reso structure

2007-08-17 Thread Eleanor Dodson
There are obvious ones like an incomplete structure etc., but have you
tried TLS? Sometimes this can dramatically improve the R factors.


You seem to have lost a lot of the low resolution data - could this
mean you had overloads which maybe could be rescued? That can degrade
the maps a good deal.

Eleanor


Sabini, Elisabetta wrote:

Dear all,

I have a structure at 1.6A. At the end of refinement the Rfactor and Rfree
are quite high (23/29.6%). Here is some information:

Morphology of crystals: large but thin plates (150 x 400 x 20 microns)
Number of residues: 160 (2 x 80); 103 water molecules
Space group: C2
Unit Cell: 47.75 33.61 91.04  90.000 103.635  90.000

Data is a merge of a low resolution and high resolution sweep. The
correlation between the two data sets is:

 DATA SETS   NUMBER OF COMMON   CORRELATION   RATIO OF COMMON    B-FACTOR
  #i   #j     REFLECTIONS       BETWEEN i,j   INTENSITIES (i/j)  BETWEEN i,j

   1    2        4049              0.973           1.0002          0.0001

I include the data collection and refinement statistics in the attachment.

Can you please suggest to me possible reasons for the high Rfact and Rfree?

Thanks,

Elisabetta.

  


Re: [ccp4bb] Depositing Raw Data

2007-08-17 Thread Winter, G (Graeme)
On the question of what is "trivial" I would argue that deposition of
the raw diffraction images is not - for a few simple reasons:

 - if I deposit structure factors with my model it is very
straightforward to run an analysis tool which will confirm that the
model and the F's/I's belong together. Without a pretty respectable
amount of data reduction it will be impossible to demonstrate that the
images belong to the structure being deposited.

 - I am fortunate in that the only data I need to curate is that kindly
provided to me by friendly crystallographers. The amount is such that it
resides comfortably on 8 firewire disks as the primary resource. I still
however need to keep a spreadsheet as to *where* to find a certain data
set. This is not insurmountable but is a measurable amount of work.

"The hardware required to store all those data, even in a highly
redundant way, is clearly trivial."

No I think I have to agree with Kim on this one - it is not trivial.
Setting up even a modest RAID array costs real money and takes real
time. Setting one up with a guaranteed quality of service (uptime,
bandwidth, disaster recovery) capable of storing the images which
directly contributed to every deposition would be very expensive.

Now, if the people who collected that data could find a "place on the
web" to store the compressed images, and deposit a link to where they
can be found, that would be ace. People who are interested in the
results can go fetch the images - since probably only ~ 5 people would
actually download them this would not be too bandwidth intensive. If
that place on the web dies - well hopefully they still have them on
firewire or on DVD ...

Now the main difference with this would be to move the images from being
something inconveniently large which usually we don't share to something
inconveniently large we usually make available to those who are
interested. From the replies to the list in this discussion you could
probably figure out who *is* interested, and it is not world+dog. This
shift of burden would turn it from something which would require a huge
grant proposal which would almost certainly not get funded to a 1%
increase in the cost of the structure solution for the lab in question
and peace of mind for the community at large.

I for one would be happy to write a few scripts which will compress
batches of images, write the index pages, compute the md5sums so we know
that the data are ok and generally put together a toolbox for curating
images on the web.
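
To make that concrete, here is a minimal sketch of such a script (bzip2
compression, an MD5 manifest and a bare-bones index page; the directory
layout and file names are made up for illustration):

import bz2
import hashlib
import os
import sys

def curate(image_dir, out_dir):
    """Compress every frame in image_dir into out_dir and write an
    MD5SUMS manifest plus a simple index.html alongside them."""
    os.makedirs(out_dir, exist_ok=True)
    manifest = []
    for name in sorted(os.listdir(image_dir)):
        with open(os.path.join(image_dir, name), "rb") as fin:
            data = fin.read()
        compressed = name + ".bz2"
        with open(os.path.join(out_dir, compressed), "wb") as fout:
            fout.write(bz2.compress(data))
        # checksum the compressed file - this is what a downloader verifies
        with open(os.path.join(out_dir, compressed), "rb") as fin:
            manifest.append((compressed, hashlib.md5(fin.read()).hexdigest()))
    with open(os.path.join(out_dir, "MD5SUMS"), "w") as f:
        for fname, digest in manifest:
            f.write("%s  %s\n" % (digest, fname))
    # a one-page index so the data set is browsable from the web
    with open(os.path.join(out_dir, "index.html"), "w") as f:
        f.write("<html><body><ul>\n")
        for fname, _ in manifest:
            f.write('<li><a href="%s">%s</a></li>\n' % (fname, fname))
        f.write("</ul></body></html>\n")

if __name__ == "__main__":
    curate(sys.argv[1], sys.argv[2])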

So we end up with...

You want to argue with my structure - well, here are the frames, *you*
solve it.

Can't really argue with that.

Again, just MHO.

Cheers,

Graeme






[ccp4bb] High Rfac/Rfree for a 1.6A reso structure

2007-08-17 Thread Sabini, Elisabetta
Dear all,

I have a structure at 1.6A. At the end of refinement the Rfactor and Rfree
are quite high (23/29.6%). Here is some information:

Morphology of crystals: large but thin plates (150 x 400 x 20 microns)
Number of residues: 160 (2 x 80); 103 water molecules
Space group: C2
Unit Cell: 47.75 33.61 91.04  90.000 103.635  90.000

Data is a merge of a low resolution and high resolution sweep. The
correlation between the two data sets is:

 DATA SETS   NUMBER OF COMMON   CORRELATION   RATIO OF COMMON    B-FACTOR
  #i   #j     REFLECTIONS       BETWEEN i,j   INTENSITIES (i/j)  BETWEEN i,j

   1    2        4049              0.973           1.0002          0.0001

I include the data collection and refinement statistics in the attachment.

Can you please suggest to me possible reasons for the high Rfact and Rfree?

Thanks,

Elisabetta.

-- 
Elisabetta Sabini, Ph.D.
Research Assistant Professor
University of Illinois at Chicago
Department of Biochemistry and Molecular Genetics
Molecular Biology Research Building, Rm. 1108
900 South Ashland Avenue
Chicago, IL 60607
U.S.A.

Tel: (312) 996-6299
Fax: (312) 355-4535
E-mail: [EMAIL PROTECTED]


Arnonccp4.doc
Description: MS-Word document


[ccp4bb] Diffraction data - a modest proposal

2007-08-17 Thread james . phillips
I suggest that it is not necessary to submit all images for a reasonable
review of a crystallography paper.

Instead, submission and storage of a "few" images from the start of a data
collection and a "few" from the end should suffice, along with (1) unmerged
F's or I's, including all reflections that are systematic absences for the
claimed space group, (2) data collection parameters, and (3) data integration
and scaling parameters.

This should allow for visual inspection of the data frames, which can tell a
lot! It would also allow integration and scaling of enough reflections to
compare with the submitted total list, a good check of the space group, and
a check on radiation damage, crystal slippage and cell parameter changes
during data collection.
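
That space group check, for instance, is nearly a one-liner once the
unmerged list is to hand; here is a minimal sketch using cctbx (assuming
it is installed; the reflections and intensities below are made up):

from cctbx import sgtbx

# C2 is C-centred, so reflections with h+k odd are systematically absent
space_group = sgtbx.space_group_info("C 1 2 1").group()

# (hkl, intensity) pairs standing in for an unmerged reflection list
reflections = [((1, 0, 0), 250.0), ((2, 0, 0), 1833.0), ((1, 1, 2), 912.0)]

for hkl, i_obs in reflections:
    if space_group.is_sys_absent(hkl):
        print("absent in C2 but measured:", hkl, "I =", i_obs)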

I suggest this compromise between no images and all images as good for the
review process, without the need to have everything online.

James Phillips
Duke University Medical Center


Re: [ccp4bb] The importance of USING our validation tools

2007-08-17 Thread Manfred S. Weiss
> how about this scenario:
> the structure really is as published, or very like it, but upon
> refinement, R-factors were high, other indicators were dodgy etc. so
> the authors were afraid to publish as is and made up a dataset to
> support their structure - this would be a bit less bad.
>

Pardon me, you can't seriously mean what you just wrote. Maybe
somebody ought to be checking your structures ...

;-)  Manfred.




*  *
*Dr. Manfred S. Weiss  *
*  *
* Team Leader  *
*  *
* EMBL Hamburg OutstationFon: +49-40-89902-170 *
* c/o DESY, Notkestr. 85 Fax: +49-40-89902-149 *
* D-22603 Hamburg   Email: [EMAIL PROTECTED] *
* GERMANY   Web: www.embl-hamburg.de/~msweiss/ *
*  *



[ccp4bb] Depositing raw data

2007-08-17 Thread Diana Tomchick
Providing a centralized archive of raw data for crystallographic  
images would be a great asset to everyone, but most especially to the  
original investigator.


I'm reminded of a conversation held with one of my postdoctoral  
mentors back in the mid 1990's. She was gazing at the rows of
reel-to-reel tapes on the top shelf in our office, and lamenting that the
data on most of these tapes was now unrecoverable due to the limited  
life span of magnetic tapes. This prompted her to encourage her  
students to deposit structure factors along with coordinates to the  
PDB (at a time when this was not yet required by any journal or  
granting agency), "so that our group will always have access to them,  
in case our backup schemes fail." I think she would have asked us to  
deposit our original images for the same reason, if it had been  
possible.


Most archivists will tell you that the best way to back up data is to  
make several copies and to store those copies in several different  
locations. This holds true for scientific data, printed material,  
photos, bank records, etc.


Diana

* * * * * * * * * * * * * * * * * * * * * * * * * * * *
Diana R. Tomchick
Associate Professor
University of Texas Southwestern Medical Center
Department of Biochemistry
5323 Harry Hines Blvd.
Rm. ND10.214B   
Dallas, TX 75390-8816, U.S.A.   
Email: [EMAIL PROTECTED]
214-645-6383 (phone)
214-645-6353 (fax)


Re: [ccp4bb] Depositing Raw Data

2007-08-17 Thread Mischa Machius
I think having each lab deal with archiving their own data and making  
them available to the public is much less practical than having a  
centralized repository for the following reasons:


1. The overhead would be many times that of a centralized repository,  
because of multiplication of efforts.


2. Every lab would need a dedicated and trained person to maintain  
the archive. It is very likely that this person will be some graduate  
student or post-doc. When these people leave, another person needs to  
be identified. Looking at how software, chemical inventories, etc.  
are maintained in places that operate like this, I have little  
confidence that a reliable repository would ever be established and  
properly maintained.


3. Cost. Due to economies of scale, it would be much more expensive to
distribute a repository of this size over hundreds of labs, each one  
of them needing to provide the hardware for its portion.


With respect to funding, if the community identifies a central  
repository as an indispensable, must-have item, dedicated to  
maintaining the highest standards in a scientific field, and aimed at  
avoiding false interpretation of data as well as at reducing the  
occurrence of fabricated data, I can't imagine funding agencies would  
object too much.


Best - MM




[ccp4bb] Data mining using consensus sequences

2007-08-17 Thread P Hubbard

Hi all,

I'm certain this can be done, but I figure it would be much easier asking
the news group than spending hours trawling through random web pages:


Does anyone know how to search protein sequence databases using a consensus
sequence which allows varying gap sizes and homologous amino acids? For
example, say I have a consensus sequence DXXDN - can I scan databases which
will pull out sequences containing, for example, EDQ?


A quick comment on the structure validation thread... Are bioscientists as 
critical of mistakes in protein and genome databases as the crystallography 
community seems to be of the PDB? Are mistakes in these databases more or
less common than in the PDB? Also, do they require you to submit your raw data
(i.e., sequencing chromatograms, etc)? I feel that a simple improvement in 
the current validation tools which would flag unusual data submissions based 
on various parameters, and require another expert to go over just these 
structures to confirm them, would be sufficient.


Thanks,

AGS



Re: [ccp4bb] nature cb3 response

2007-08-17 Thread Arun Malhotra
The proper choice of reviewers is important, but perhaps some of the 
burden for fact checking should be shifted to the journal.  Some 
journals are already doing image analysis to check gels/microscopy 
images, and there is no reason why this cannot be extended for structures.


In practical terms, when you submit a paper, apart from uploading the 
text and image files, coordinate file(s) and structure factors will also 
have to be submitted.  The journal would then run some scripts 
(developed by CCP4?) on the coordinate/SF data and make a basic analysis 
file available to the reviewers.  This could be an extended version of 
the table seen in crystallography papers, but with outlying values 
highlighted, some fact-checking, and perhaps a summary for 
non-crystallographer reviewers.
The journal could even make a more sophisticated "EDS"-type server 
(perhaps contracted out to EDS?), where the electron density for any 
region could be checked easily online by the reviewers, without having 
to reveal the full structure factors and coordinates.  This would keep 
the burden for keeping the coordinates/structure factors confidential on 
the journal rather than an anonymous reviewer.


The archiving/submission of raw data is important, but it is difficult
to see how even competent reviewers can be convinced to do detailed
analysis - even for something as easy to check as gels, I have never
gone beyond just zooming/squinting when reviewing papers.


Arun Malhotra




Bernhard Rupp wrote:

Nature DOES require availability of structure factors and coordinates as
a matter of policy, and also that they be made available for review on demand.
If the reviewer does not want them, the editor can't do anything about it.

One also cannot demand of a biologist reviewer to reconstruct
maps, but others long ago and I recently have suggested in Nature making
at least the RSCC mandatory reading for reviewers - a picture
says more than words...

One way would be to carefully pair reviewers for crystallographic papers -
a competent biologist and a competent crystallographer.
Not being a famous biologist, I am generally unimpressed by the
story, and unemotional about the crystallography. The biology reviewer
on the other hand could make the point of how relevant and exciting
the structure and its biological implications are. The
proper pairing is something where I would lay the responsibility
heavily on the journal editors. That is just a matter of due diligence.
  
br


-Original Message-
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Thursday, August 16, 2007 5:10 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] nature cb3 response

A comment from my collaborator's student suggests a partial answer.  This
afternoon he happened to say "but of course the reviewers will look at the
model, I just deposited it!".  He was shocked to find that "hold for pub"
means that even reviewers can't access the data.  Can that be changed?  It
would take a bit of coordination between journals and the PDB, but I think
the student is right - it is rather shocking that the data is sitting there
nicely deposited but the reviewers can't review it.
 Phoebe Rice

  



--
Arun Malhotra  Phone: (305) 243-2826
Associate ProfessorLab:   (305) 243-2890
Dept. of Biochemistry & Molecular Biology  Fax:   (305) 243-3955
University of Miami School of Medicine
PO Box 016129 E-Mail: [EMAIL PROTECTED]
Miami, FL 33101  Web: http://structure.med.miami.edu


[ccp4bb] High Rfac/Rfree for a 1.6A reso structure

2007-08-17 Thread Santarsiero, Bernard D.
On Fri, August 17, 2007 10:56 am, Eleanor Dodson wrote:
> There are obvious ones like an incomplete structure etc., but have you
> tried TLS? Sometimes this can dramatically improve the R factors.
>



When people run TLS, are they trying to improve the R-factors, or actually
generate a reasonable model for the structure? I really think people
should be extremely cautious in just turning on the TLS option with dozens
of arbitrary groups during refinement without thinking about the resultant
model.

Bernie Santarsiero




Re: [ccp4bb] Data mining using consensus sequences

2007-08-17 Thread Miller, Mitchell D.
Hi Paul,

Look at the syntax for PROSITE patterns. It will
do what you want. There are probably
better syntax rule descriptions/tutorials, but
here is a quick link:
http://ca.expasy.org/tools/scnpsit3.html#patsyntax

E.g. you can search with a pattern like
'D-x(2,4)-D-[N,Q]'
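
For quick local searches, the same pattern also translates almost
directly into a regular expression; a minimal Python sketch (the test
sequence is made up, and [DE] is one way to admit the homologous
D->E substitution the question asked about):

import re

# 'D-x(2,4)-D-[N,Q]' rendered as a regex, widened to [DE] so that
# glutamate can stand in for aspartate
pattern = re.compile(r"[DE].{2,4}[DE][NQ]")

sequence = "MKTDAAEDNWELLDKNDVRG"  # made-up test sequence
for match in pattern.finditer(sequence):
    print(match.start(), match.group())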

Regards,
Mitch

 



Re: [ccp4bb] High Rfac/Rfree for a 1.6A reso structure

2007-08-17 Thread Juergen Bosch

Santarsiero, Bernard D. wrote:

> On Fri, August 17, 2007 10:56 am, Eleanor Dodson wrote:
>> There are obvious ones like an incomplete structure etc., but have you
>> tried TLS? Sometimes this can dramatically improve the R factors.
>
> When people run TLS, are they trying to improve the R-factors, or actually
> generate a reasonable model for the structure? I really think people
> should be extremely cautious in just turning on the TLS option with dozens
> of arbitrary groups during refinement without thinking about the resultant
> model.
>
> Bernie Santarsiero


Yes, that is the reason why you should submit your files e.g. to 
http://skuld.bmsc.washington.edu/~tlsmd/


Juergen

--
Jürgen Bosch
University of Washington
Dept. of Biochemistry, K-426
1705 NE Pacific Street
Seattle, WA 98195
Box 357742
Phone:   +1-206-616-4510
FAX: +1-206-685-7002


Re: [ccp4bb] Depositing raw data

2007-08-17 Thread Das, Debanu
How about the Petabox:
 http://www.capricorn-tech.com/ 
"Capricorn Technologies was founded in 2004 and provides petabyte-class storage 
solutions for organizations worldwide. Capricorn's PetaBox technology grew out 
of a search for high density, low cost, low power storage systems for the 
world's largest data collections"

http://www.archive.org/web/petabox.php 
"Internet Archives spins off PetaBox production to newly-formed Capricorn 
Technologies."
"Capricorn replicates the Internet Archive's successful deployment of the 
PetaBox for major academic institutions, digital preservationists, government 
agencies, HPC and major research sites, medical imaging providers, digital 
image repositories, storage outsourcing sites, and other enterprises around the 
globe."

Most importantly, in terms of funding for it: 
http://www.archive.org/iathreads/post-view.php?id=121377
"Well, the Internet Archive is now officially a library according to the State 
of California! It turns out that to receive a particular kind of federal 
funding, you have to have your state sign off that you are a library. With a 
minimum amount of back and forth (including their saying "we have not evaluated 
something like the Internet Archive before") we were given the approval" 


My two cents addition to the suggestions made by others include something like 
a fee-based service for structural biology labs who solve protein structures to 
send their data/information to a company for a confidential review (of course 
such a company would need to be certified, like the ISO? and accepted by the 
user community). This would be similar to stories that I have heard of labs in 
some countries contracting out their paper writing (don't know how true that 
is!). A fee-based service would eliminate the problem of reviewers having to 
review all the experimental information presented in the structure and will 
also relieve the journals from additional pressure to check each and every 
structure before deposition. Also, with the numerous journals now publishing 
protein crystal structures, it may not be practical for all the journals to 
implement standardised policies for validation by the journal or the reviewer.
Maybe this could be an arm of the PDB. As long as it is a fee-based
verification service maybe it can be implemented without placing additional 
burden on existing staff (including graduate students/postdocs in individual 
labs having to supervise the maintenance of data archives). Considering the 
amount of money labs are spending on paying journals for publishing papers, 
colorful artwork, etc., it might be well worth spending say 
$1000-$2000/structure to get it validated instead of just spending that on 
printing fancy pictures. With the potential cost savings that labs are getting 
with the advent on high-throughput methods for expression, purification, 
crystallization, data collection, and structure determination, it shouldn't be 
too much to ask.

Regards,
Debanu.
--
Debanu Das,
Scientist, JCSG.
SSRL, Menlo Park, CA.

-Original Message-
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of Diana Tomchick
Sent: Friday, August 17, 2007 9:15 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] Depositing raw data

Providing a centralized archive of raw data for crystallographic images would 
be a great asset to everyone, but most especially to the original investigator.

I'm reminded of a conversation held with one of my postdoctoral mentors back in 
the mid 1990's. She was gazing at the rows of reel-to- reel tapes on the top 
shelf in our office, and lamenting that the data on most of these tapes was now 
unrecoverable due to the limited life span of magnetic tapes. This prompted her 
to encourage her students to deposit structure factors along with coordinates 
to the PDB (at a time when this was not yet required by any journal or granting 
agency), "so that our group will always have access to them, in case our backup 
schemes fail." I think she would have asked us to deposit our original images 
for the same reason, if it had been possible.

Most archivists will tell you that the best way to back up data is to make 
several copies and to store those copies in several different locations. This 
holds true for scientific data, printed material, photos, bank records, etc.

Diana

* * * * * * * * * * * * * * * * * * * * * * * * * * * *
Diana R. Tomchick
Associate Professor
University of Texas Southwestern Medical Center
Department of Biochemistry
5323 Harry Hines Blvd.
Rm. ND10.214B   
Dallas, TX 75390-8816, U.S.A.   
Email: [EMAIL PROTECTED]
214-645-6383 (phone)
214-645-6353 (fax)


Re: [ccp4bb] The importance of USING our validation tools

2007-08-17 Thread Ronald E Stenkamp
While all of the comments on this situation have been entertaining, I've been 
most impressed by comments from Bill Scott, Gerard Bricogne and Kim Hendricks.

I think due process is called for in considering problem structures that may or
may not be fabricated.  Public discussion of technical or craftsmanship issues
is fine, but questions of intent, etc are best discussed in private or in more 
formal settings.  We owe that to all involved.

Gerard's comments concerning publishing in journals/magazines like Nature and
Science are correct.  The pressure to publish there is not consistent with
careful, well-documented science.  For many years, we've been teaching our 
graduate students about some of the problems with short papers in those types 
of journals.  The space limitations and the need for "relevance" force omission 
of important details, so it's very hard to judge the merit of those papers. 
But, don't assume that other "real" journals do much better with this.  There's 
a lot of non-reproducible science in the journals.  Much of it comes from not 
recognizing or reporting important experimental or computational details, but 
some of it is probably simply false.

Kim's comments about the technical aspects of archiving data make a lot of
sense to me.  The costs of making safe and secure archives are not
insignificant.  And we need to ask if the added value of such archives is worth
the added costs.  I'm not yet convinced of this.

The comments about Richard Reid, shoes, and air-travel are absolutely true.  We
should be very careful about requiring yet more information for submitted
manuscripts.  Publishing a paper is becoming more and more like trying to get
through a crowded air-terminal.  Every time you turn around, there's another
requirement for some additional detail about your work.  In the vast majority
of cases, those details won't matter at all.  In a few cases, a very careful
and conscious referee might figure out something significant based on that
little detail.  But is the inconvenience for most us worth that little benefit?

Clearly, enough information was available to Read, et al. for making the case 
that the original structure has problems.  What evidence is there that 
additional data, like raw data images, would have made any difference to the 
original referees and reviewers?  Refereeing is a human endeavor of great 
importance, but it is not going to be error-free.  And nothing can make it 
error-free.  You simply need to trust that people will be honest and do the 
best job possible in reviewing things.  And that errors that make it through 
the process and are deemed important enough will be corrected by the next layer 
of reviewers.

I believe this current episode, just like those in the past, is a terrific
indicator that our science is strong and functioning well.  If other fields
aren't reporting and correcting problems like these, maybe it's because they 
simply haven't found them yet.  That statement might be a sign of my 
crystallographic arrogance, but it might also be true.

Ron Stenkamp


Re: [ccp4bb] The importance of USING our validation tools

2007-08-17 Thread William Scott

Hi Ed:


On Aug 17, 2007, at 10:43 AM, Edwin Pozharski wrote:


Bill,

presumption of innocence applies when person's life or freedom is  
at stake, other words - when you are accused of a crime.  It does  
not apply in many other circumstances (for instance, when applying  
for short-term visa to enter the US, it is an applicant's  
responsibility to prove to the consular officer that he is not  
planning to stay in the US illegally - so one has to prove the  
absence of intent to commit a crime in the future).


If someone is accused of having fabricated data, which is to have  
committed fraud, quite a bit is in fact at stake.  One can at least
expect to lose one's job, and if convicted of criminal fraud, this  
could involve a fine or even imprisonment (defrauding a government  
funding agency I would expect is a pretty major felony in both the US  
and in Europe).


To follow the legal analogy, does the letter of Gros et al. prove
beyond reasonable doubt that the structure in question is indeed a  
fabrication? As you said, it is a compelling case.


No, I said it "appears" (to me) to be "a compelling case", but I have  
only heard one side of it. Have you heard the other?


Now, if these defendants are indeed innocent, they should have  
exculpatory evidence in their possession - the diffraction images.   
If they produce such evidence - the case is dismissed.  If not,  
ladies and gentlemen of the jury may and will find them guilty.


What jury?  Us?  Who is the judge?  Do you not see why this is  
problematic?




I agree with you 100% that anyone is innocent until proven guilty.   
In this case, however, the trial is over and the jury is deliberating.
What's the verdict?


What trial?  All I see is the functional equivalent of an  
indictment.  Actually, less than that, because in the US at least, an  
indictment is handed down by a Grand Jury after examining evidence  
and subpoenaing witnesses.  We (if you are to suggest the analogy  
that we are the Grand Jury) have done nothing of this sort.


Sorry, the analogy is flawed, and dangerously so.

We are qualified to judge science, but that does not endow us with  
the ability to establish criminal intent, which is paramount in a  
fraud case.


Bill




Regards,

Ed.

William Scott wrote:

 No one knows definitively if this was fabricated.

Well, at least one person does.
But I agree, it is important to keep in mind that the proper venue  
for determining guilt or innocence in the case of fraud is the  
court system.


Until fairly recently, the idea of presumed innocence and the  
right to cross-examine accusers and witnesses has been considered  
fundamental to civil society.


The case certainly sounds compelling, but this is all the more  
reason to adhere to these ideals.


Bill Scott



--
Edwin Pozharski, PhD, Assistant Professor
University of Maryland, Baltimore
--
When the Way is forgotten duty and justice appear;
Then knowledge and wisdom are born along with hypocrisy.
When harmonious relationships dissolve then respect and devotion  
arise;

When a nation falls to chaos then loyalty and patriotism are born.
--   / Lao Tse /



Re: [ccp4bb] nature cb3 response

2007-08-17 Thread Edward Snell
I'm trying to keep up with all the email on this discussion so someone
may have posted this already. If so I apologise (or apologize as
appropriate).

With quite appropriate timing Brown and Ramaswamy have just published a
very nice paper in Acta D. on the quality of Protein Structures -
http://journals.iucr.org/d/issues/2007/09/00/dz5114/dz5114.pdf. In the
paper, Table 5 shows an interesting relationship between the quality of
the structure and the journal of primary citation. A good deal of the
authors' discussion is reflected in some of the recent postings on this
subject.

Cheers,

Eddie.

Edward Snell Ph.D.
Assistant Prof. Department of Structural Biology, SUNY Buffalo,
Hauptman-Woodward Medical Research Institute
700 Ellicott Street, Buffalo, NY 14203-1102
Phone: (716) 898 8631 Fax: (716) 898 8660
Email: [EMAIL PROTECTED]  Telepathy: 42.2 GHz
 
Heisenberg was probably here!
 


Re: [ccp4bb] The importance of USING our validation tools

2007-08-17 Thread Matthew . Franklin
CCP4 bulletin board  wrote on 08/17/2007 07:38:00
AM:

>
> Thus, I analyzed the distribution of the experimental sigmas in three
> structures:
> 1E3M and two structures of mine at the same resolution (1CTN, 1E3M)
>
> The results are in:
>
> http://xtal.nki.nl/nature-debate/
>
>

To add my two cents to Tassos' very enlightening graphs, I'll just say that
the reflections with enormous F/sigF are probably generated by lines like
this in the cif file:

1 1 1 -57   31   21 o  63.4 **

I believe the asterisks are supposed to represent a sigma(F) > 10,000,
since there are sigmas all the way up to that limit in the file.  Granted,
this is supposed to be a high-resolution reflection, but I'm not aware of
any data processing program which will output reflections with F/sigF less
than 0.006!  Indeed, I just looked through one of my old datasets processed
with HKL2000, then Truncate, and confirmed that my F/sigF drops smoothly
and asymptotically to 1, and not below.  The paper in question says they
used HKL2000 followed by CCP4 programs for MR and refinement, so you'd
expect something similar, no?

There are lots of odd F/sigF in the 2hr0 cif file - even excluding the
reflections with 0.0 or *** for a sigF, there are still 14 reflections with
an F/sigF less than 0.01, 1028 reflections with F/sigF < 0.1, and 12626
reflections (6.5% of the total) with F/sigF < 1.

I think this is even more unusual than the sharp upper limit of the F/sigF
distribution.
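
For anyone who wants to repeat that count on a deposited cif, here is a
rough sketch of the scan involved; the column positions are an assumption
read off the record quoted above, and a proper mmCIF parser would of
course be more robust:

import sys

def count_small_f_over_sig(path, f_col=7, sig_col=8):
    """Count reflections with F/sigF < 1 in a whitespace-delimited
    refln loop; f_col/sig_col are assumed column indices."""
    small = total = 0
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) <= max(f_col, sig_col):
                continue
            try:
                f_obs = float(fields[f_col])
                sig_f = float(fields[sig_col])
            except ValueError:
                continue  # skips header lines and '**' placeholders
            if sig_f <= 0.0:
                continue
            total += 1
            if f_obs / sig_f < 1.0:
                small += 1
    return small, total

if __name__ == "__main__":
    small, total = count_small_f_over_sig(sys.argv[1])
    print("%d of %d reflections have F/sigF < 1" % (small, total))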

- Matt

PS.  The above are my views, not those of ImClone Systems, CCP4, the
companies which send this email, the Boy Scouts, or any other organization
which I may be associated with now, in the past, or in the future :P

--
Matthew Franklin , Ph.D.
Senior Scientist, ImClone Systems
180 Varick Street, 6th floor
New York, NY 10014
phone:(917)606-4116   fax:(212)645-2054





Re: [ccp4bb] nature cb3 response

2007-08-17 Thread Arun Malhotra
This will have to be automated, not subjectively flagged by the 
editors - something just like EDS ("review-EDS"?), where the submitted 
data gets sent automatically by the journal to an EDS-type server, 
and the results are available only to the reviewers. 

The reviewers should be able to log in as needed and look at the EDS 
summary as well as the maps/structure interactively (in the Java 
AstexViewer), but not download the coordinates/SFs.  Once the review is 
done, the data gets deleted (or, better still, archived).


EDS, PDBsum, MSD and other similar sites are invaluable, but this is all 
retrospective - the time that we really need such analysis is during the 
review phase of a paper/structure.
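
(To make this concrete, here is a toy sketch of the kind of automated
pre-screen such a server could run the moment a coordinate file is uploaded.
Everything in it is illustrative: it merely scrapes the reported resolution
and R values from a PDB-format header and flags combinations a reviewer
might want a closer look at; the thresholds are placeholders, not community
standards, and a real "review-EDS" would of course recompute everything
from the deposited structure factors.)

import re
import sys

# Patterns for the standard REMARK 2/3 records of a PDB-format file.
# Header layouts vary by refinement program, so a real service would
# need a proper parser (or the mmCIF equivalents).
PATTERNS = {
    "resolution": r"REMARK   2 RESOLUTION\.\s+([\d.]+)",
    "r_work":     r"REMARK   3\s+R VALUE\s+\(WORKING SET\)\s*:\s*([\d.]+)",
    "r_free":     r"REMARK   3\s+FREE R VALUE\s*:\s*([\d.]+)",
}

def header_stats(pdb_path):
    text = open(pdb_path).read()
    stats = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            stats[key] = float(match.group(1))
    return stats

def flag_outliers(stats):
    """Return human-readable warnings; thresholds are illustrative only."""
    flags = []
    r_work, r_free = stats.get("r_work"), stats.get("r_free")
    resolution = stats.get("resolution")
    if r_work is not None and r_free is not None:
        gap = r_free - r_work
        if gap < 0.01:
            flags.append("R-free - R-work gap of %.3f is suspiciously small" % gap)
        elif gap > 0.10:
            flags.append("R-free - R-work gap of %.3f is unusually large" % gap)
    if None not in (r_work, resolution) and r_work < 0.15 and resolution > 2.5:
        flags.append("very low R-work for a modest-resolution structure")
    return flags

if __name__ == "__main__":
    stats = header_stats(sys.argv[1])
    print(stats)
    for warning in flag_outliers(stats):
        print("FLAG:", warning)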


Arun Malhotra


[EMAIL PROTECTED] wrote:

Hi Arun,
I think you have a higher opinion of journal editors than many of us do!
Unless you run the same refinement software exactly the same way (i.e. 
same bulk solvent & anisotropy corrections, TLS, etc) you won't get 
exactly the same R factors.  A crystallographer would know what is and 
isn't a significant difference, hopefully, but an editor would be 
likely to flag all the wrong things.

Phoebe

At 10:52 AM 8/17/2007, you wrote:
The proper choice of reviewers is important, but perhaps some of the 
burden of fact-checking should be shifted to the journal.  Some 
journals are already doing image analysis to check gels/microscopy 
images, and there is no reason why this cannot be extended to 
structures.


In practical terms, when you submit a paper, apart from uploading the 
text and image files, coordinate file(s) and structure factors will 
also have to be submitted.  The journal would then run some scripts 
(developed by CCP4?) on the coordinate/SF data and make a basic 
analysis file available to the reviewers.  This could be an extended 
version of the table seen in crystallography papers, but with 
outlying values highlighted, some fact-checking, and perhaps a 
summary for non-crystallographer reviewers.
The journal could even make a more sophisticated "EDS"-type server 
(perhaps contracted out to EDS?), where the electron density for any 
region could be checked easily online by the reviewers, without 
having to reveal the full structure factors and coordinates.  This 
would keep the burden for keeping the coordinates/structure factors 
confidential on the journal rather than an anonymous reviewer.


The archiving/submission of raw data are important, but it is 
difficult to see how even competent reviewers can be convinced to do 
detailed analysis - even for something as easy to check as gels, I 
have never gone beyond just zooming/squinting when reviewing papers.


Arun Malhotra




Bernhard Rupp wrote:
Nature DOES require availability of structure factors and 
coordinates as a matter of policy, and also that they be made 
available for review on demand.

If the reviewer does not want them, the editor can't do anything about it.

One also cannot demand that a biologist reviewer reconstruct
maps, but others long ago, and I recently, have suggested in Nature 
making at least the RSCC mandatory reading for reviewers - a picture

says more than words...
One way would be to carefully pair reviewers for crystallographic 
papers - a competent biologist and a competent crystallographer. 
Not being a famous biologist, I am generally unimpressed by the 
story and unemotional about the crystallography. The biology 
reviewer, on the other hand, could make the point of how relevant and 
exciting the structure and its biological implications are. The 
proper pairing is something for which I would lay the responsibility 
heavily on the journal editors. That is just a matter of due diligence.

br

-Original Message-
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Thursday, August 16, 2007 5:10 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] nature cb3 response

A comment from my collaborator's student suggests a partial answer.  This
afternoon he happened to say "but of course the reviewers will look at the
model, I just deposited it!".  He was shocked to find that "hold for pub"
means that even reviewers can't access the data.  Can that be changed?  It
would take a bit of coordination between journals and the PDB, but I think
the student is right - it is rather shocking that the data is sitting there
nicely deposited but the reviewers can't review it.

Phoebe Rice





--
Arun Malhotra                              Phone:  (305) 243-2826
Associate Professor                        Lab:    (305) 243-2890
Dept. of Biochemistry & Molecular Biology  Fax:    (305) 243-3955
University of Miami School of Medicine
PO Box 016129                              E-Mail: [EMAIL PROTECTED]
Miami, FL 33101                            Web:    http://structure.med.miami.edu


--- 


Phoebe A. Rice
Assoc. Prof., Dept. of Biochemistry

Re: [ccp4bb] The importance of USING our validation tools

2007-08-17 Thread James Stroud
It seems that a public discussion with points and counterpoints 
presented openly and fairly is in complete adherence to the ideals of 
due process. Since this discussion is not deciding the criminal fate of 
any individual, it does not seem necessary to defer it to any political 
government. Also, were any criminal charges ever brought forth, one 
might think an innocent defendant would appreciate the benefit of the 
world's experts pondering the facts in an open forum.


James


William Scott wrote:
But I agree, it is important to keep in mind that the proper venue for 
determining guilt or innocence in the case of fraud is the court system.


Until fairly recently, the idea of presumed innocence and the right to 
cross-examine accusers and witnesses have been considered fundamental to 
civil society.


The case certainly sounds compelling, but this is all the more reason to 
adhere to these ideals.




--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/


Re: [ccp4bb] The importance of USING our validation tools

2007-08-17 Thread Anthony Addlagatta
Before we all come up with a solution for archiving all the images to  
bury them in a safe place, we should find an immediate cure.


Attention, Nature and Science:

I have a solution for the continued publication of wrong structures  
in big journals. Every structural manuscript submitted to a high- 
impact journal (or non-crystallographic journal) should be accompanied  
by a CD/DVD with all the images and clear, reproducible protocols,  
from the raw images to the maps shown in the journal. It does not matter  
who the referee is. An independent crystallographer will review the  
structure and give his/her recommendations independent of the actual  
contents of the manuscript. It could be another independent referee  
or a hired crystallographer (one more job!!).


Anthony


Anthony Addlagatta, PhD
Institute of Molecular Biology
University of Oregon
Eugene, OR-97403
Phone: (541) 346-5867
Fax: (541)346-5870
Web: http://uoregon.edu/~anthony


Re: [ccp4bb] Richard Reid and the PDB

2007-08-17 Thread Bernhard Rupp
The PDB is missing a business opportunity. If authors pay
1000s of dollars for publication in high-impact journals,
they might as well pay a few bucks for image deposition.
If I could get my images stored reliably and perpetually 
for something like $20-50 a pop, I'd do it. Do you know
where your favourite frames from 1998 are? 

Image storage is a good idea *in itself*, but as an enforcement tool
it will only make the *exceedingly few* Reids more inventive.

PS: Frames for sale. 
http://www.ruppweb.org/new_comp/frame_maker.html

-Original Message-
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of Kim
Henrick
Sent: Friday, August 17, 2007 7:04 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] Richard Reid and the PDB

After Richard Reid, more than 100 million people each year have to have
their shoes examined, and one effect is that older buildings like Heathrow
Terminal 3 are now among the most painful places on earth; the cost of one
man trying to light his shoelaces has affected us all.


On the discussion of archiving image data sets: I guess that less than 1%
of the image sets for PDB entries are useful for software development (and
those can be got privately), and I guess that maybe 1 in 10,000 entries has
a serious problem that may require referees to look at the images (and
those can be accessed on demand).


The cost of disks for your PC - kitchen-table disks from a supermarket -
may be $1 per Gbyte over USB I/O, but an archive centre required to
maintain the data will probably need RAID 0/1 - RAID 10. That gives high
performance and the highest data protection, i.e. it can tolerate multiple
drive failures, but it has a high redundancy cost overhead - and if you
haven't noticed, a large collection of disks has failures. Look up the
problems that the series of Landsat satellites have had from 1980 onwards,
arising from the volume of data and the short life of computer-compatible
tapes and optical discs. Archiving data lacks glamour; it is the boring,
day-to-day rectification and storage of information, and very little money
gets spent on the task. For remote sensing, the most significant cost is
transmitting, correcting and archiving the data - three semi-trailer loads
of Landsat tapes were found (literally) moldering in a damp basement in
Baltimore after people and funding agencies lost interest. Oh yes, and
detectors change every 5 years, and processing software gets lost.

At the EBI, before we even get a single disk, we pay £100,000 for a cabinet
- disks cost around £500 for 300 gigabytes (and those are not the best
disks; the best cost around the same for 146 gigabytes). Disk technology
changes every 5 years, so part of an archive's cost is recovering the data
every 5 years onto the next generation of hardware. Molecular biology and
structure research is carried out by 1000's of groups, not centrally by a
single international treaty, like a telescope that is run centrally and
financed to do the data archiving. Molecular biology uses some in-house
data collection, but most is carried out at synchrotrons - and despite the
fact that there are many beamlines, most data again come from fewer than 10
sites. These major synchrotron sites are committed to data storage through
various methods of storage hierarchy, and a better solution than a central
archive is to issue a DOI, or set of DOIs, for the data associated with a
PDB entry and to associate those DOIs with the entry. Many countries have
spent billions of dollars over the last 5-7 years on GRID and distributed
data storage - use this technology to leave the data where they are and
pick them up on demand. Google's solution to large datasets, such as
single-file tomograms, is to ship disks - there is no simple, cheap FTP/WWW
solution to large datasets.

The cost of a central archive is several million dollars per year to set up
and run long term - and who will pay? 40% of the PDB comes from the USA
(the biggest single contributor), but with the difficulty of funding from
the EU and national funding priorities, is the USA to carry this cost? Is
the cost to be shared as in the table below? So far only the USA, Japan and
Europe (through the UK, the EU and EMBL) pay for the PDB.
The USA also pays for UniProt, and other large-scale data-gathering efforts
are carried out by nationally funded centres, not by the large number of
individuals and countries that the PDB comes from.

The administrative cost of gathering all the datasets is far higher than
the $1/gigabyte of a USB disk that is next to useless for an archive.
The costs of storage are rapidly decreasing, but there has not been a great
change in latencies and bandwidth - and if everything gets faster and
cheaper at the same rate, then nothing really changes, i.e.
more structures are done.

Why inspect the shoes of every PDB entry and every structural biologist
when we can detect the very rare suspect problem and agree on a course of
action?

kim

PDB Depositions (1 January 1999 to 26 June 2007)
Country      1999 2000 2001 2002 2003 2004 2005 2006 2007 Total
ARGENTINA00000   2 16 

Re: [ccp4bb] The importance of USING our validation tools

2007-08-17 Thread Douglas Theobald

IANAL, but I have been advised by lawyers in highly similar situations.

Publicly accusing someone of criminal and/or academic  
fraud is serious business, and it is certainly something that could  
get you prosecuted for criminal libel, as the accusation will likely  
have the effect of seriously damaging the accused's reputation.  You  
would basically have to prove in court that your accusations were in  
fact true.


On Aug 17, 2007, at 4:08 PM, James Stroud wrote:

It seems that a public discussion with points and counterpoints  
presented openly and fairly is in complete adherence to the ideals  
of due process. Since this discussion is not deciding the criminal  
fate of any individual, it does not seem necessary to defer it to  
any political government. Also, were any criminal charges ever  
brought forth, one might think an innocent defendant would  
appreciate the benefit of the world's experts pondering the facts  
in an open forum.


James


William Scott wrote:
But I agree, it is important to keep in mind that the proper venue  
for determining guilt or innocence in the case of fraud is the  
court system.
Until fairly recently, the idea of presumed innocence and the  
right to cross-examine accusers and witnesses have been considered  
fundamental to civil society.
The case certainly sounds compelling, but this is all the more  
reason to adhere to these ideals.




--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/


Re: [ccp4bb] The importance of USING our validation tools

2007-08-17 Thread Anastassis Perrakis


On 17 Aug 2007, at 20:51, William Scott wrote:

To follow the legal analogy, does the letter of Gros et al. prove  
beyond reasonable doubt that the structure in question is indeed a  
fabrication? As you said, it is a compelling case.


No, I said it "appears" (to me) to be "a compelling case", but I  
have only heard one side of it. Have you heard the other?


That's the problem, Bill. I have not heard the other side. I am waiting  
to hear the other side.
So far that side has not answered the first and main question: 'can  
we have the images please'.


Tassos

PS: I don't like single metrics of quality. I did not like the table  
with 'quality' versus journal, and I liked even less the Structural  
Genomics table. As for the statistics, I am not an expert.


From what I read in the Acta D paper from Iowa, this is rather a  
'quality metric' of the crystal, not of the structure! If B is 143 and  
the solvent content 78%, yes, the quality metric will be bad, but at  
the same time you did your best; the structure is the best that you  
could get.
So the message is that Nature, Cell, Science, SPINE and BCSG deal  
with difficult structures, not that they produce 'bad' ones.




Re: [ccp4bb] The importance of USING our validation tools

2007-08-17 Thread Dale Tronrud

   All theories or models are wrong until proven otherwise.  We all need
to stand up to support the ideas we try to put into print, or admit that
we cannot satisfy their critics.  (Not "our critics" - this process should
not be personal.)  I agree with Nature's editors that questions of
"fabrication" are not matters to be decided within its pages.  All that
is necessary to cast doubt and suggest a retraction is for critics to
propose that the conclusions of a paper are inconsistent with previous
results and/or its authors' own data.  The mechanism by which the
paper was constructed is irrelevant to that topic.

   There are procedures in place for handling charges of misconduct in
most places in the world.  I know that, here in the US, the NIH has
a process for investigating such charges where they funded the research,
and I expect that every university and other host organization has a
local procedure.  These procedures were constructed with the advice of
expensive lawyers, and the bodies that run them are the only ones that can
compel the investigator to allow access to the raw data, notebooks, and
material required to answer the question we all have: "how could this have
occurred?"

   I hope that the persons who believe that misconduct has occurred in
this case have already sent their letters to those with the power to
investigate this matter.

   Outright fraud is very rare in our field because it is very difficult
to do well enough for the criminal to expect to get away with it.  I
agree with Ron and others that establishing elaborate layers of security
around the publication process to guard against such a low probability
occurrence just makes life more annoying without really gaining much.

   Our field does have a much bigger problem: the misuse of techniques
and a "will to believe".  With its explosive growth, training has fallen
behind, and many people who have never had formal or even informal
instruction are solving structures.

   Much of the discussion in this thread has been about ideas to harden
the review process, but I think that attention is misplaced.  I can't
imagine that we can convince reviewers to download and reintegrate raw
images when judging the merits of a paper.  We can't even get reviewers
to look at "Table 1"!

   Ideally, problems with the model should be identified before the paper
is submitted - indeed, before the paper is written.  I have pulled a number
of structures from the PDB and often found small, but obvious, problems.
It is fairly clear to me that no one other than its builder has ever
looked at these models in a serious fashion.  Had the P.I. been familiar
with crystallography and looked at these models, the student would have
been sent back to do some cleanup.

   How the community could do it, I don't know, but I think we should
encourage an "internal review" of each model as lab policy in every
crystallography lab.  No paper should be submitted unless someone
other than the model builder has looked over the model, including the
Fo-Fc map, and has reviewed with the model builder the process of
structure solution.

   A new lab may not have the expertise in-house for such a review,
so it would have to make arrangements with some nearby, friendly
crystallographer for assistance.  This would also transfer expertise
into the new lab so that, in the future, it could stand on its own.

   I don't know the mechanism that could be used to encourage this
practice.  At the very least a committee could write up a "best
practices" document that would emphasize the management needs that
arise in a technique where a person is staring at a graphics screen
hoping beyond hope to "see" that feature that will make the cover
of Nature.  The "will to believe" is so strong that we really need
a second pair of eyes (at least) early in the game.

Dale Tronrud

Ronald E Stenkamp wrote:

While all of the comments on this situation have been entertaining, I've
been most impressed by those from Bill Scott, Gerard Bricogne and Kim
Henrick.

I think due process is called for in considering problem structures that may or
may not be fabricated.  Public discussion of technical or craftsmanship issues
is fine, but questions of intent, etc are best discussed in private or in more 
formal settings.  We owe that to all involved.

Gerard's comments concerning publishing in journals/magazines like Nature
and Science are correct.  The pressure to publish there is not consistent
with careful, well-documented science.  For many years, we've been teaching
our graduate students about some of the problems with short papers in those
types of journals.  The space limitations and the need for "relevance"
force omission of important details, so it's very hard to judge the merit
of those papers.  But don't assume that other "real" journals do much
better with this.  There's a lot of non-reproducible science in the
journals.  Much of it comes from not recognizing or reporting important
experimental details.

[ccp4bb] diffraction images images/jpeg2000

2007-08-17 Thread Maneesh Yadav
FWIW, I don't agree with storing image data; I don't think the images
justify the cost of storage even remotely (some people debate the value of
the structures themselves)... but if you want to do it anyway, maybe we
should use a format like JPEG2000.

Last time I checked, none of the major image processing suites used it, but
it is a very impressive and mature format that (I think) would be suitable
for diffraction images.  If anyone is up for experimenting, you can get a
nice suite of tools from Kakadu (just google kakadu + jpeg2000).
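
(Purely to illustrate the format rather than recommend a toolchain, here is
a minimal round trip of a fake 16-bit "diffraction frame" through lossless
JPEG2000.  It uses the open-source glymur bindings to OpenJPEG instead of
Kakadu, and the synthetic frame and file name are made up for the example.)

import os
import numpy as np
import glymur

# Stand-in for a real detector frame: 16-bit Poisson background with
# a few saturating "spots".
rng = np.random.default_rng(0)
frame = rng.poisson(10, size=(1024, 1024)).astype(np.uint16)
frame[500:503, 500:503] = 60000

# With no compression ratios specified, glymur writes losslessly
# (reversible wavelet), which is what you'd want for raw data.
glymur.Jp2k("frame.jp2", data=frame)

# Read back and confirm the round trip is bit-exact.
recovered = glymur.Jp2k("frame.jp2")[:]
assert np.array_equal(frame, recovered)

print("raw bytes: %d   jp2 bytes: %d" % (frame.nbytes, os.path.getsize("frame.jp2")))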


Re: [ccp4bb] Depositing Raw Data

2007-08-17 Thread James Holton


I think I can speak from experience on the topic of archiving image 
data.  I have ~40,000 DVD-R disks in my office.  This represents:

6 years of data collection at ALS 8.3.1
60 TB of data (>99% of everything collected, 2 copies)
10,000 data sets
318 PDB entries
$4000 of media

The purposes of this archive are:
1) an "eternal" off-site backup for users of ALS 8.3.1
2) a potential source of "interesting" data sets for methods developers

I define "interesting" as: a data set from a known structure that cannot 
be solved by conventional methods.  After all, if you are developing new 
methods then data that can be solved by existing methods is of limited 
utility.  However, it is also difficult to develop and test a new 
algorithm if you don't know what the "right answer" is.  For this 
reason, I think the most useful data sets to make available are the 
"early" data sets from a structure project (the ones you couldn't 
solve).  Almost by definition, these are the most relevant data sets for 
developing new methods.  We all would like to get a structure solved 
sooner than later.  The problem is getting permission.  Yes, it is 
perhaps "legal" for me to give away people's data if they were collected 
at a DOE facility (without a proprietary research agreement in place).  
However, I would like to keep the few friends I have.  It is important 
to clear such transactions with the scientist who collected the data. 
The difficulty is in connecting 10,000 image data sets to one of 40,000 
PDB entries (which are generally deposited 1-2 years after data 
collection) and then connecting interested parties to one of 500 users 
who collected the data.  This is not an insurmountable logistics 
problem, but I'm afraid it is going to take me a while to do it all in 
my "spare time".


IMHO we do not need a universal image format, we need WELL DOCUMENTED 
image formats!  Once that is done, then writing a converter to-and-from 
imgCIF and any other format will be possible, and then imgCIF can start 
to take hold.  I believe a pivotal step toward a universal image format 
is to have every generator of images (beamlines, diffractometer 
manufacturers, etc.) make a lysozyme data set available in a very public 
place.  Preferably with instructions on how to process it properly.  I 
will now volunteer the following web page as the place to put all these 
example data sets (for now).  To put my money where my mouth is: A 
lysozyme data set from ALS 8.3.1 is available here:


http://bl831.als.lbl.gov/example_data_sets/
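
(As a footnote on "WELL DOCUMENTED": the SMV/ADSC frames served by many
beamlines are a good example of how little documentation is actually needed
- a plain-text header of KEY=VALUE; pairs followed by raw pixel data.  The
reader below is a sketch of my own, not beamline-blessed code: it assumes
TYPE=unsigned_short, little-endian data and a header that fits in the first
512 bytes, and it ignores the many per-site variants.)

import numpy as np

def read_smv(path):
    """Minimal SMV/ADSC image reader (sketch; common case only)."""
    with open(path, "rb") as fh:
        # The ASCII header sits inside { } and is padded to HEADER_BYTES.
        head = fh.read(512).decode("ascii", errors="replace")
        header = {}
        for line in head.splitlines():
            line = line.strip().rstrip(";")
            if "=" in line:
                key, value = line.split("=", 1)
                header[key.strip()] = value.strip()
        header_bytes = int(header.get("HEADER_BYTES", 512))
        nx, ny = int(header["SIZE1"]), int(header["SIZE2"])
        fh.seek(header_bytes)
        pixels = np.fromfile(fh, dtype="<u2", count=nx * ny)
    return header, pixels.reshape(ny, nx)

# usage: header, image = read_smv("lysozyme_0_001.img")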

WRT archiving in general, I think it is also important to point out that 
RAID only protects you against the total failure of one (or maybe two) 
hard disk drives.  RAID does NOT generally protect against anything 
else, such as failures of RAID controller cards, flaky drive cables, bad 
sectors or other "subtle" drive failures, filesystem corruption, power 
surges, disgruntled sysadmins, etc.  I have experienced all of these.  
So, even if your stuff is on a RAID, always back up often.


I chose DVD-R because it is the cheapest media with a long rated 
shelf-life (~100 years if you don't leave them in the sun).  That, and 
our astronomer colleagues (who also have the problem of storing large 
amounts of digital image data) chose DVD-R as the storage medium for Sky 
Survey 2 a few years back.


I recently checked on the prices of alternative media.  Here is what I 
came up with:

media   price      (unit / capacity)        lifetime
DVD-R   $0.056/GB  ( $0.25 / 4.5 GB )       100 year?
LTO-3   $0.063/GB  ( $50   / 800 GB )       3-5 year
LTO-2   $0.063/GB  ( $25   / 400 GB )       3-5 year
DLTIV   $0.200/GB  ( $16   / 80 GB )        3-5 year
HDD     $0.424/GB  ( $318  / 750 GB )       3-5 year
BD-R    $0.640/GB  ( $16   / 25 GB )        100 year?
CD-R    $0.771/GB  ( $0.54 / 0.7 GB )       100 year?
8mm     $1.000/GB  ( $2.50 / 2.5 GB )       3-5 year
ZIP     $16.00/GB  ( $4    / 0.25 GB )      3-5 month
floppy  $90.28/GB  ( $0.13 / 1.44 MB )      3-5 min
clay    $2700./GB  ( $0.3/lb, 1 bit/mm^3 )  >30,000 y

Note that hard disk drives cost 10x more than DVD-R. Blu-Ray disks 
(BD-R) are more expensive than hard drives! Storing all this data on 
hard drives would cost ~$50k, with an additional $5k/year for the 2 
GB/hr we routinely collect.  Storing all the data from all 30 PX 
beamlines in the world would amount to ~$150k/year of hard drives.
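
(The $/GB column follows directly from the quoted price and capacity of
each medium; for anyone who wants to re-run the arithmetic, a throwaway
check - decimal gigabytes, 2007 prices as quoted above:)

media_prices = {
    # name: (price in USD, capacity in GB)
    "DVD-R":  (0.25,   4.5),
    "LTO-3":  (50.00,  800.0),
    "LTO-2":  (25.00,  400.0),
    "DLTIV":  (16.00,  80.0),
    "HDD":    (318.00, 750.0),
    "BD-R":   (16.00,  25.0),
    "CD-R":   (0.54,   0.7),
    "8mm":    (2.50,   2.5),
    "ZIP":    (4.00,   0.25),
    "floppy": (0.13,   1.44e-3),   # 1.44 MB
}

for name, (price, gb) in media_prices.items():
    print("%-7s $%7.3f/GB" % (name, price / gb))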


The ~100 year lifetime of optical "-R" media has, of course, yet to be 
proved historically.  Early CDs did have a problem in that the glue used 
for the label was slightly acidic and corroded the aluminum reflective 
layer that encodes the data (which, BTW, is directly under the label!).  
Modern media no longer have this problem, and there are watchdog 
agencies you can find with Google that simulate the long-term effects of 
time on any media you like.  I can certainly say I have had problems 
with worn-out DVD-R drives starting to make bad disks that pass 
verification in the writer but not in a low-end reader.  It seems that