Earlier this year, for the first time, I got back a validation report from the PDB for a deposited structure that included wwPDB validation of a ligand. This is great stuff. I approve. I am happy.
Unfortunately, the validation check reported problems with my ligand. This is bad. I am unhappy. What went wrong? A long story follows; skip to the end for the TL;DNR summary.

Basically, I am advocating that errors, omissions, or inadequacies in the CCP4 ligand dictionaries be treated as bugs in exactly the same sense as program bugs. Report them when you find them, get them fixed in CCP4 updates, and down the road we will all have better structures.

==== Long version ====

Since last year I have been happily using the integrated tools Coot, JLigand, cprodrg, and refmac5 to sketch a ligand, generate a dictionary, fit it to initial difference density, and refine. In the absence of an independent validation check, I thought everything was working acceptably. The bad grade on my wwPDB validation report [pun intended] made me look into the guts of this tool chain more carefully to see what went wrong. In short, here is what happens:

- Coot fires up JLigand.
- I sketch the compound and click "accept".
- JLigand creates a file prodrg-in.mdl that contains only the atom type, the connectivity, and a single/double bond flag.
- cprodrg takes this and assigns each atom a more complete chemical label, for example:
      O     15.9994   CARBONYL OXYGEN (C=O)
      CH2   12.011    ALIPHATIC CH2-GROUP
      NR    14.0067   AROMATIC NITROGEN
- cprodrg then categorizes each bond by the assigned types of its two atoms, and similarly categorizes each bond angle by the assigned types of its three constituent atoms.

So far so good. Now comes the problematic part.

- cprodrg tries to find a target geometry (ideal bond length or angle) for each category by matching against the contents of the file
      .../Prodrg/param/ff/default
  If an exact match is not found, it falls through to ... well, I'm not sure exactly what the fall-through rule is. This is the part that goes wrong.

The content of this default parameter file is rather impoverished. My ligand contained a pyrazole (a 5-membered ring with 2 adjacent nitrogens).
The nitrogens were assigned the category
      NR5   14.0067   NITROGEN (5-RING)
but the default file contains no bond or angle entries for this atom type, so it "falls through" to the only N-N bond it does contain:
      NSP - NSP   target length 1.12 Å
That's miles off, or at any rate more than 1/4 Å off, the expected length of 1.396 Å tabulated in the Mogul database for a pyrazole. (The wwPDB report listed a target of 1.37 Å.) I don't expect perfection, but target errors of 0.25 Å or more in a bond length are large compared to the expected accuracy of even a modest-resolution protein structure. No wonder the wwPDB validation report flagged this bond as a 13-sigma outlier in the refined structure.

==== TL;DNR version ====

The $CCP4/share/prodrg/prodrg.param file does not contain target values for many bond types that are correctly identified by prodrg itself. Adding a single line handling NR5-NR5 bonds to the source file ccp4-6.4.0/src/Prodrg/param/ff/default yields a significant improvement in my refined protein structure. Even R and Rfree improved, which surprised me; the two refinements were identical runs except for the regenerated ligand dictionary.

Would it be appropriate to report this as a bug? I think so. Where should I report it?
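To see the failure mode in isolation, here is a toy model of the lookup-with-fall-through step described in the long version. It is only a sketch under loud assumptions. The dict-based table, the ELEMENT map, and especially the "same pair of elements" fallback rule are hypothetical stand-ins, not PRODRG's real code, file format, or fall-through rule (which I don't know). They were chosen only because they reproduce the observed NR5-NR5 to NSP-NSP behaviour. The patched table uses the Mogul pyrazole value of 1.396 Å purely for illustration.

```python
# Toy model of a PRODRG-style bond-target lookup with fall-through.
# HYPOTHETICAL throughout: the table layout, ELEMENT map and fallback rule are
# illustrative stand-ins, not PRODRG's real code or parameter-file format.
from typing import Dict, Tuple

BondKey = Tuple[str, str]

# Element of each atom type (type names taken from the post; mapping assumed).
ELEMENT: Dict[str, str] = {"O": "O", "CH2": "C", "NR": "N", "NR5": "N", "NSP": "N"}

# The only N-N entry in the default file, per the post.
DEFAULT_PARAMS: Dict[BondKey, float] = {("NSP", "NSP"): 1.12}
# Same table plus one NR5-NR5 line; 1.396 A (Mogul, pyrazole) is used here
# only as an illustrative target value.
PATCHED_PARAMS: Dict[BondKey, float] = {**DEFAULT_PARAMS, ("NR5", "NR5"): 1.396}
MOGUL_PYRAZOLE_NN = 1.396


def bond_key(a: str, b: str) -> BondKey:
    """Order-independent key for a bond between atom types a and b."""
    x, y = sorted((a, b))
    return (x, y)


def element_pair(key: BondKey) -> Tuple[str, str]:
    """Sorted pair of element symbols for a bond key."""
    x, y = sorted(ELEMENT[t] for t in key)
    return (x, y)


def lookup_bond_target(params: Dict[BondKey, float], a: str, b: str) -> Tuple[float, BondKey]:
    """Exact match first; otherwise fall through to the first entry with the
    same pair of elements (a guessed rule that mimics the observed behaviour)."""
    key = bond_key(a, b)
    if key in params:
        return params[key], key
    wanted = element_pair(key)
    for candidate, length in params.items():
        if element_pair(candidate) == wanted:
            return length, candidate
    raise KeyError(f"no bond parameter for {a}-{b}")


if __name__ == "__main__":
    for label, params in (("default", DEFAULT_PARAMS), ("patched", PATCHED_PARAMS)):
        target, matched = lookup_bond_target(params, "NR5", "NR5")
        error = abs(target - MOGUL_PYRAZOLE_NN)
        print(f"{label}: NR5-NR5 -> {'-'.join(matched)}, target {target:.3f} A, off by {error:.3f} A")
```

Running it prints "default: NR5-NR5 -> NSP-NSP, target 1.120 A, off by 0.276 A" and then "patched: NR5-NR5 -> NR5-NR5, target 1.396 A, off by 0.000 A". The point is the shape of the problem: any fallback that matches on something coarser than the full atom-type pair will silently hand back a target from an unrelated chemistry, and only an explicit entry for the missing pair removes that.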