I like your IMGATM proposal, but wouldn't it also potentially break some of the programs? Also--and this is a problem with deleting only sidechain atoms in general--it seems that many, myself included, might totally miss that an apparent "alanine" is really a trunco-lysine. What I like is that it does get around the problem of people over-interpreting bogus sidechains, but it falls short, perhaps, in misleading people about what residue is there. I, for one, would not feel that I had to click on all the alanines in a model to verify that they were not lysines, and would be surprised and puzzled for a while about why this ala said lys when I clicked on it. Wouldn't you be surprised? (Well, maybe not after this thread...)
JPK

On Mon, Apr 4, 2011 at 1:55 AM, Dale Tronrud <det...@uoxray.uoregon.edu> wrote:
> The definition of _atom_site.occupancy is
>
>     The fraction of the atom type present at this site.
>     The sum of the occupancies of all the atom types at this site
>     may not significantly exceed 1.0 unless it is a dummy site.
>
> When an atom has an occupancy equal to zero that means that the
> atom is NEVER present at that site - and that is not what you
> intend to say. Setting the occupancy to zero does not mean that
> a full atom is located somewhere in this area. Quite the opposite.
>
> (The reference to a dummy site is interesting and implies to
> me that mmCIF already has the mechanism you wish for.)
>
> Having some experience with refining low occupancy atoms and
> working with dummy marker atoms I'm quite confident that you can
> never define a B factor cutoff that would work. No matter what
> value you choose you will find some atoms in density that refine
> to values greater than the cutoff, or the limit you choose is so
> high that you will find marker atoms that refine to less than the
> limit. A B factor cutoff cannot work - no matter the value you
> choose you will always be plagued with false positives or false
> negatives.
>
> If you really want to stuff this bit into one of these fields
> you have to go all out. Set the occupancy of a marker atom to -99.99.
> This will unambiguously mark the atom as an imaginary one. This
> will, of course, break every program that reads PDB format files,
> but that is what should happen in any case. If you change the
> definition of the columns in the file you must mandate that all
> programs be upgraded to recognize the new definitions. I don't
> know how you can do that other than ensuring that the change will
> cause programs to cough. To try to slide it by with a magic value
> that will be silently accepted by existing programs is to beg for
> bugs and subtle side-effects.
>
> Good luck getting the maintainers of the mmCIF standard to accept
> a magic value in either of these fields.
>
> How about this: We already have the keywords ATOM and HETATM
> (and don't ask me why we have two). How about we create a new
> record in the PDB format, say IMGATM, that would have all the
> fields of an ATOM record but would be recognized as whatever the
> marker is for "dummy" atoms in the current mmCIF? Existing programs
> would completely ignore these atoms, as they should until they are
> modified to do something reasonable with them. Those of us who
> have no use for them can either use a switch in the program to
> ignore them or just grep them out of the file. Someone could write
> a program that would take a model with only ATOM and HETATM records
> and fill out all the desired IMGATM records (Let's call that program
> WASNIAHC, everyone would remember that!).
>
> This solution is unambiguous. It can be represented in current
> mmCIF, I think. The PDB could run WASNIAHC themselves after deposition
> but before acceptance by the depositor so people like me would not
> have to deal with them during refinement but would be able to see
> them before our precious works of art are unleashed on the world.
>
> Seems like a win-win solution to me.
>
> Dale Tronrud
>
>
> On 4/3/2011 9:17 PM, Jacob Keller wrote:
>>
>> Well, what about getting the default settings on the major molecular
>> viewers to hide atoms with either occ=0 or b>cutoff ("novice mode?")?
>> While the b cutoff would still be tricky, I assume we could eventually
>> come to consensus on some reasonable cutoff (2 sigma from the mean?),
>> and then this approach would allow each free-spirited crystallographer
>> to keep his own preferred method of dealing with these troublesome
>> sidechains and nary a novice would be led astray....
>>
>> JPK
>>
>> On Sun, Apr 3, 2011 at 2:58 PM, Eric Bennett <er...@pobox.com> wrote:
>>>
>>> Most non-structural users are familiar with the sequence of the proteins
>>> they are studying, and most software does at least display residue identity
>>> if you select an atom in a residue, so usually it is not necessary to do any
>>> cross checking besides selecting an atom in the residue and seeing what its
>>> residue name is. The chance of somebody misinterpreting a truncated Lys as
>>> Ala is, in my experience, much much lower than the chance they will trust
>>> the xyz coordinates of atoms with zero occupancy or high B factors.
>>>
>>> What worries me the most is somebody designing a whole biological
>>> experiment around an over-interpretation of details that are implied by xyz
>>> coordinates of atoms, even if those atoms were not resolved in the maps.
>>> When this sort of error occurs it is a level of pain and wasted effort that
>>> makes the "pain" associated with having to build back in missing side chains
>>> look completely trivial.
>>>
>>> As long as the PDB file format is the way users get structural data,
>>> there is really no good way to communicate "atom exists with no reliable
>>> coordinates" to the user, given the diversity of software packages out there
>>> for reading PDB files and the historical lack of any standard way of dealing
>>> with this issue. Even if the file format is hacked there is no way to force
>>> all the existing software out there to understand the hack. A file format
>>> that isn't designed with this sort of feature from day one is not going to
>>> be fixable as a practical matter after so much legacy code has accumulated.
>>>
>>> -Eric
>>>
>>>
>>>
>>> On Apr 3, 2011, at 2:20 PM, Jacob Keller wrote:
>>>
>>>> To the delete-the-atom-nik's: do you propose deleting the whole
>>>> residue or just the side chain?
>>>> I can understand deleting the whole
>>>> residue, but deleting only the side chain seems to me to be placing a
>>>> stumbling block also, and even possibly confusing for an experienced
>>>> crystallographer: the .pdb says "lys" but it looks like an ala? Which
>>>> is it? I could imagine a lot of frustration-hours arising from this
>>>> practice, with people cross-checking sequences, looking in the methods
>>>> sections for mutations...
>>>>
>>>> JPK

--
*******************************************
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
cel: 773.608.9185
email: j-kell...@northwestern.edu
*******************************************
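
For anyone who wants to try the two ideas from the thread on a real file, here is a rough Python sketch, not a tested tool. It assumes the standard fixed-column PDB layout (record name in columns 1-6, serial in 7-11, occupancy in 55-60, B factor in 61-66) and well-formed ATOM/HETATM lines. IMGATM is only the hypothetical record name proposed above and is not part of the PDB format; the function names strip_imaginary and flag_suspect_atoms are made up for illustration. The "mean + 2 sigma" B cutoff is just the value floated in the thread, and as Dale argues, no single cutoff should be expected to separate real atoms from marker atoms reliably.

```python
import statistics

ATOM_RECORDS = ("ATOM  ", "HETATM")
# Hypothetical record name from this thread; NOT part of the PDB format.
IMAGINARY_RECORD = "IMGATM"


def strip_imaginary(lines):
    """Drop hypothetical IMGATM records, like `grep -v '^IMGATM'`."""
    return [ln for ln in lines if not ln.startswith(IMAGINARY_RECORD)]


def flag_suspect_atoms(lines, n_sigma=2.0):
    """Return (serial, reason) for atoms a 'novice mode' viewer might hide.

    An atom is flagged if its occupancy is exactly 0, or if its B factor
    exceeds mean + n_sigma * (population standard deviation) taken over
    all ATOM/HETATM records. The cutoff is a heuristic, not a standard.
    """
    atoms = []
    for ln in lines:
        if ln[:6] in ATOM_RECORDS:
            serial = int(ln[6:11])        # columns 7-11
            occupancy = float(ln[54:60])  # columns 55-60
            b_factor = float(ln[60:66])   # columns 61-66
            atoms.append((serial, occupancy, b_factor))

    b_values = [b for _, _, b in atoms]
    if b_values:
        cutoff = statistics.mean(b_values) + n_sigma * statistics.pstdev(b_values)
    else:
        cutoff = float("inf")

    flagged = []
    for serial, occupancy, b_factor in atoms:
        if occupancy == 0.0:
            flagged.append((serial, "occ=0"))
        elif b_factor > cutoff:
            flagged.append((serial, "high B"))
    return flagged
```

Typical use would be to read the file with open(path).read().splitlines(), pass the lines through strip_imaginary if you have no use for the marker atoms, and then look at what flag_suspect_atoms returns before trusting any side-chain coordinates.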