Hi Jacob,

The PDB header has a record for missing atoms. Coot has an option to find them and any decent validation software will warn about incomplete residues. There are PDBREPORT entries for every PDB file with a list of incomplete residues. If a user makes a very small effort, he doesn't have to go around clicking every 'alanine'.
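
To show how small that effort can be, here is a minimal sketch in Python of such a check. It is only an illustration, not how Coot or the PDBREPORT software actually do it. The names find_truncated_residues and atom_line are made up for this example, and the sketch assumes fixed-column PDB ATOM records, heavy atoms only, and no special handling of alternate conformations. It complements, and does not replace, the missing-atom record in the header (REMARK 470).

```python
from collections import defaultdict

# Expected side-chain heavy atoms for the 20 standard amino acids.
SIDE_CHAIN_ATOMS = {
    "ALA": {"CB"},
    "ARG": {"CB", "CG", "CD", "NE", "CZ", "NH1", "NH2"},
    "ASN": {"CB", "CG", "OD1", "ND2"},
    "ASP": {"CB", "CG", "OD1", "OD2"},
    "CYS": {"CB", "SG"},
    "GLN": {"CB", "CG", "CD", "OE1", "NE2"},
    "GLU": {"CB", "CG", "CD", "OE1", "OE2"},
    "GLY": set(),
    "HIS": {"CB", "CG", "ND1", "CD2", "CE1", "NE2"},
    "ILE": {"CB", "CG1", "CG2", "CD1"},
    "LEU": {"CB", "CG", "CD1", "CD2"},
    "LYS": {"CB", "CG", "CD", "CE", "NZ"},
    "MET": {"CB", "CG", "SD", "CE"},
    "PHE": {"CB", "CG", "CD1", "CD2", "CE1", "CE2", "CZ"},
    "PRO": {"CB", "CG", "CD"},
    "SER": {"CB", "OG"},
    "THR": {"CB", "OG1", "CG2"},
    "TRP": {"CB", "CG", "CD1", "CD2", "NE1", "CE2", "CE3", "CZ2", "CZ3", "CH2"},
    "TYR": {"CB", "CG", "CD1", "CD2", "CE1", "CE2", "CZ", "OH"},
    "VAL": {"CB", "CG1", "CG2"},
}


def find_truncated_residues(pdb_lines):
    """Map (chain, residue number, residue name) to its missing side-chain atoms."""
    seen = defaultdict(set)
    for line in pdb_lines:
        if not line.startswith("ATOM  "):
            continue  # skip HETATM, REMARK, and everything else
        atom_name = line[12:16].strip()   # columns 13-16
        res_name = line[17:20].strip()    # columns 18-20
        chain = line[21]                  # column 22
        res_id = line[22:27].strip()      # columns 23-27 (number + insertion code)
        seen[(chain, res_id, res_name)].add(atom_name)

    truncated = {}
    for key, atoms in seen.items():
        expected = SIDE_CHAIN_ATOMS.get(key[2])
        if expected is None:
            continue  # non-standard residue: no expectation in this sketch
        missing = expected - atoms
        if missing:
            truncated[key] = sorted(missing)
    return truncated


def atom_line(serial, name, res_name, chain, res_seq):
    """Hypothetical helper: format a bare ATOM record with dummy coordinates."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain}{res_seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{20.0:6.2f}")


# A lysine truncated at CB next to a genuine alanine.
backbone_plus_cb = ["N", "CA", "C", "O", "CB"]
demo = [atom_line(i, n, "LYS", "A", 42) for i, n in enumerate(backbone_plus_cb, 1)]
demo += [atom_line(i, n, "ALA", "A", 43) for i, n in enumerate(backbone_plus_cb, 6)]
print(find_truncated_residues(demo))
```

In the demo, only the lysine is reported; the real alanine is complete and is left alone, so the "trunco-lysine" is told apart from a true Ala without any clicking.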

Cheers,
Robbie

> Date: Mon, 4 Apr 2011 16:15:58 -0500
> From: j-kell...@fsm.northwestern.edu
> Subject: Re: [ccp4bb] what to do with disordered side chains
> To: CCP4BB@JISCMAIL.AC.UK
>
> I like your IMGATM proposal, but wouldn't it also potentially break
> some of the programs? Also--and this is a problem with deleting only
> sidechain atoms in general--it seems that many, myself included, might
> totally miss that an apparent "alanine" is really a trunco-lysine.
> What I like is that it does get around the problem of people
> over-interpreting bogus sidechains, but it falls short, perhaps, in
> misleading people about what residue is there. I, for one, would not
> feel that I had to click on all the alanines in a model to verify that
> they were not lysines, and would be surprised and puzzled for a while
> about why this ala said lys when I clicked on it. Wouldn't you be
> surprised? (Well, maybe not after this thread...)
>
> JPK
>
> On Mon, Apr 4, 2011 at 1:55 AM, Dale Tronrud <det...@uoxray.uoregon.edu> wrote:
> > The definition of _atom_site.occupancy is
> >
> > The fraction of the atom type present at this site.
> > The sum of the occupancies of all the atom types at this site
> > may not significantly exceed 1.0 unless it is a dummy site.
> >
> > When an atom has an occupancy equal to zero that means that the
> > atom is NEVER present at that site - and that is not what you
> > intend to say. Setting the occupancy to zero does not mean that
> > a full atom is located somewhere in this area. Quite the opposite.
> >
> > (The reference to a dummy site is interesting and implies to
> > me that mmCIF already has the mechanism you wish for.)
> >
> > Having some experience with refining low occupancy atoms and
> > working with dummy marker atoms I'm quite confident that you can
> > never define a B factor cutoff that would work.
> > No matter what value you choose you will find some atoms in density
> > that refine to values greater than the cutoff, or the limit you
> > choose is so high that you will find marker atoms that refine to
> > less than the limit. A B factor cutoff cannot work - no matter the
> > value you choose you will always be plagued with false positives or
> > false negatives.
> >
> > If you really want to stuff this bit into one of these fields
> > you have to go all out. Set the occupancy of a marker atom to -99.99.
> > This will unambiguously mark the atom as an imaginary one. This
> > will, of course, break every program that reads PDB format files,
> > but that is what should happen in any case. If you change the
> > definition of the columns in the file you must mandate that all
> > programs be upgraded to recognize the new definitions. I don't
> > know how you can do that other than ensuring that the change will
> > cause programs to cough. To try to slide it by with a magic value
> > that will be silently accepted by existing programs is to beg for
> > bugs and subtle side-effects.
> >
> > Good luck getting the maintainers of the mmCIF standard to accept
> > a magic value in either of these fields.
> >
> > How about this: We already have the keywords ATOM and HETATM
> > (and don't ask me why we have two). How about we create a new
> > record in the PDB format, say IMGATM, that would have all the
> > fields of an ATOM record but would be recognized as whatever the
> > marker is for "dummy" atoms in the current mmCIF? Existing programs
> > would completely ignore these atoms, as they should until they are
> > modified to do something reasonable with them. Those of us who
> > have no use for them can either use a switch in the program to
> > ignore them or just grep them out of the file.
> > Someone could write a program that would take a model with only
> > ATOM and HETATM records and fill out all the desired IMGATM records
> > (Let's call that program WASNIAHC, everyone would remember that!).
> >
> > This solution is unambiguous. It can be represented in current
> > mmCIF, I think. The PDB could run WASNIAHC themselves after deposition
> > but before acceptance by the depositor so people like me would not
> > have to deal with them during refinement but would be able to see
> > them before our precious works of art are unleashed on the world.
> >
> > Seems like a win-win solution to me.
> >
> > Dale Tronrud
> >
> > On 4/3/2011 9:17 PM, Jacob Keller wrote:
> >>
> >> Well, what about getting the default settings on the major molecular
> >> viewers to hide atoms with either occ=0 or b>cutoff ("novice mode?")?
> >> While the b cutoff would still be tricky, I assume we could eventually
> >> come to consensus on some reasonable cutoff (2 sigma from the mean?),
> >> and then this approach would allow each free-spirited crystallographer
> >> to keep his own preferred method of dealing with these troublesome
> >> sidechains and nary a novice would be led astray....
> >>
> >> JPK
> >>
> >> On Sun, Apr 3, 2011 at 2:58 PM, Eric Bennett <er...@pobox.com> wrote:
> >>>
> >>> Most non-structural users are familiar with the sequence of the
> >>> proteins they are studying, and most software does at least display
> >>> residue identity if you select an atom in a residue, so usually it
> >>> is not necessary to do any cross checking besides selecting an atom
> >>> in the residue and seeing what its residue name is. The chance of
> >>> somebody misinterpreting a truncated Lys as Ala is, in my experience,
> >>> much much lower than the chance they will trust the xyz coordinates
> >>> of atoms with zero occupancy or high B factors.
> >>>
> >>> What worries me the most is somebody designing a whole biological
> >>> experiment around an over-interpretation of details that are implied
> >>> by xyz coordinates of atoms, even if those atoms were not resolved
> >>> in the maps. When this sort of error occurs it is a level of pain
> >>> and wasted effort that makes the "pain" associated with having to
> >>> build back in missing side chains look completely trivial.
> >>>
> >>> As long as the PDB file format is the way users get structural data,
> >>> there is really no good way to communicate "atom exists with no
> >>> reliable coordinates" to the user, given the diversity of software
> >>> packages out there for reading PDB files and the historical lack of
> >>> any standard way of dealing with this issue. Even if the file format
> >>> is hacked there is no way to force all the existing software out
> >>> there to understand the hack. A file format that isn't designed with
> >>> this sort of feature from day one is not going to be fixable as a
> >>> practical matter after so much legacy code has accumulated.
> >>>
> >>> -Eric
> >>>
> >>> On Apr 3, 2011, at 2:20 PM, Jacob Keller wrote:
> >>>
> >>>> To the delete-the-atom-nik's: do you propose deleting the whole
> >>>> residue or just the side chain? I can understand deleting the whole
> >>>> residue, but deleting only the side chain seems to me to be placing a
> >>>> stumbling block also, and even possibly confusing for an experienced
> >>>> crystallographer: the .pdb says "lys" but it looks like an ala? Which
> >>>> is it? I could imagine a lot of frustration-hours arising from this
> >>>> practice, with people cross-checking sequences, looking in the methods
> >>>> sections for mutations...
> >>>>
> >>>> JPK
>
> --
> *******************************************
> Jacob Pearson Keller
> Northwestern University
> Medical Scientist Training Program
> cel: 773.608.9185
> email: j-kell...@northwestern.edu
> *******************************************