On 23/08/2011 11:29, Noel O'Boyle wrote:
> I note that Chris has fixed some problems with metals, and we're down
> to 146 failures.

Some notes on this are below. (I was waiting to make sure the changes
didn't cause mass test failure.)

On 09/08/2011 13:39, Róbert Kiss wrote:
> We recently did some testing with different cheminformatic tools
> including OpenBabel to see how accurately they can read and write SD
> files.

Noel has addressed the stereochemical discrepancies previously. Some of
the other ones, mainly valence-related, are now fixed with the changes
discussed in more detail below.

> 3. For atoms with unusual valence state it seems that OpenBabel
> automatically sets the typical valence, while InChI accepts the
> valence as described in the SDF (e.g. 44420892). This difference in
> the valence state results in a difference in the number of implicit
> hydrogens (connected to Si in this example), and thus in different
> InChIs:
>
> input InChI: InChI=1S/C27H35NO8Si/c1-27(2,3)37-36-21-14-18(28-29)24(17-12-22(32-6)26(34-8)23(13-17)33-7)25(21)16-9-10-19(31-5)20(11-16)35-15-30-4/h9-13,21,29H,14-15H2,1-8H3/b28-18-
> (InChI warning: value="Accepted unusual valence(s): Si(2)")
> output InChI: InChI=1S/C27H37NO8Si/c1-27(2,3)37-36-21-14-18(28-29)24(17-12-22(32-6)26(34-8)23(13-17)33-7)25(21)16-9-10-19(31-5)20(11-16)35-15-30-4/h9-13,21,29H,14-15,37H2,1-8H3/b28-18-
>
> input SDF: 5.2791 3.0717 0.0000 Si 0 0 0 0 0 2 0 0 0 0
> 0 0 (valence is 2)
> output SDF: 5.2791 3.0717 0.0000 Si 0 0 0 0 0 0 0 0 0
> 0 0 0 (valence is set to default; in case of Si this means 4)

Until now OpenBabel interpreted only the value 15 (= 0 valence) in this
field, because of a technical difficulty. I have now made it so that any
value set there causes no implicit hydrogens to be added. I expect that
this is nearly always why this feature is used, but there could be cases
within the spec (such as it is) which would not necessarily be correctly
interpreted. In any case, the SD file in this example is now read
correctly. On output, however, OpenBabel writes an M RAD line rather than
the valence value, which IMO gives a better chemical description of the
molecule.

> 4.
> For atoms with unusual valence state sometimes the valence state in
> the atom block disappears and an extra "M RAD" line appears in the
> output SDF (e.g.: 19350442). AFAIK the valence count in the atom block
> and the "M RAD" line are two different things (not totally
> independent though) so the valence information cannot be converted to
> a radical state information directly. Also the last number in the "M
> RAD" line can only be 0,1,2 or 3 according to the MOL file
> specification, while we found numbers 4 and 5 in some cases.
>
> input InChI: InChI=1S/C9H13.C8H11.2CH3.2ClH.Si.Zr/c1-6-5-7(2)9(4)8(6)3;1-2-4-6-8-7-5-3-1;;;;;;/h6H,1-4H3;1-3H,4,6-8H2;2*1H3;2*1H;;/q4*-1;;;;+4/p-2/b;2-1-;;;;;;
> output InChI: InChI=1S/C9H13.C8H11.2CH3.2ClH.H4Si.Zr/c1-6-5-7(2)9(4)8(6)3;1-2-4-6-8-7-5-3-1;;;;;;/h6H,1-4H3;1-3H,4,6-8H2;2*1H3;2*1H;1H4;/q4*-1;;;;+4/p-2/b;2-1-;;;;;;
>
> input SDF atom block: 9.9774 5.0246 0.0000 Si 0 0 0 0 0 15
> 0 0 0 0 0 0 (15 means valence: 0)
> output SDF atom block: 9.9774 5.0246 0.0000 Si 0 0 0 0 0 0
> 0 0 0 0 0 0 (0 means valence is default: 4)
>
> input SDF: no "M RAD" line
> output SDF: M RAD 1 4 5

OpenBabel uses the equivalent of the RAD value to represent hydrogen
deficiency in the organic subset of elements and silicon, so it is
necessary to use the values 4 and 5 internally. Isolated C or Si atoms
would have a value of 5, even if their spin multiplicity was smaller. The
molecule 19350442 has such an unbonded Si atom (which does not seem very
realistic to me, and illustrates the inadequacy of SDF for organometallic
molecules). When writing MDL files, however, the RAD values 4 and 5 are
now replaced by a valence value, as are any values on metal atoms.

> 5. This is a quite extreme molecule (23569471). It contains a carbon
> atom connected to another carbon and two hydrogens. It has a positive
> charge according to the input SDF.
> OpenBabel preserves the charge
> information, but adds an additional "M RAD" line, which is (together
> with the positive charge) not correct, I think. This difference in the
> SDF results in different InChIs because InChI can only remove the
> positive charge from the PubChem input SDF.
>
> input InChI: InChI=1S/C29H42O2/c1-20(2)12-9-13-21(3)14-10-15-22(4)16-11-18-29(8)19-17-26-25(7)27(30)23(5)24(6)28(26)31-29/h12,14,16H,1,9-11,13,15,17-19H2,2-8H3/p+1/b20-12-,21-14+,22-16+
> output InChI: InChI=1S/C29H43O2/c1-20(2)12-9-13-21(3)14-10-15-22(4)16-11-18-29(8)19-17-26-25(7)27(30)23(5)24(6)28(26)31-29/h12,14,16,30H,1,9-11,13,15,17-19H2,2-8H3/q+1/b21-14+,22-16+
>
> input SDF: no "M RAD" line
> output SDF: M RAD 1 31 2

A valence value for such carbocations was missing from a data file; this
is now corrected.

_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
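
For readers following the valence-field and "M RAD" discussion in this message, here is a minimal Python sketch of how those two pieces of an MDL molfile can be read. It is not OpenBabel's code, and every function name in it is hypothetical. It assumes the atom lines are whitespace-separated, as they appear quoted in this message (real V2000 files use fixed-width columns). The `implicit_hydrogens` helper follows my reading of the molfile convention (valence counts all bonds, including those to implied hydrogens), which differs from the "any value means no implicit hydrogens" behaviour described above. The bond count of 2 for the Si atom in 44420892 is read off the InChI connectivity and should be treated as an assumption.

```python
# Meanings of the value in an "M RAD" entry according to the MOL file spec.
RAD_MEANINGS = {0: "no radical", 1: "singlet", 2: "doublet", 3: "triplet"}


def valence_field(atom_line):
    """Return the raw valence field of a whitespace-separated atom line.

    Token order: x, y, z, symbol, mass difference, charge, stereo parity,
    hydrogen count, stereo care, valence, ...
    """
    tokens = atom_line.split()
    return int(tokens[9])


def total_valence(raw):
    """Map the raw field to a valence; None means 'use the default'."""
    if raw == 0:
        return None          # no marking: default valence applies
    if raw == 15:
        return 0             # 15 encodes zero valence
    if 1 <= raw <= 14:
        return raw
    raise ValueError(f"valence field out of range: {raw}")


def implicit_hydrogens(raw, default_valence, bond_order_sum):
    """Implicit H count under the spec reading described in the lead-in."""
    valence = total_valence(raw)
    if valence is None:
        valence = default_valence
    return max(valence - bond_order_sum, 0)


def parse_rad_line(line):
    """Parse 'M RAD n atom value ...' into a list of (atom, value) pairs."""
    tokens = line.split()
    if tokens[:2] != ["M", "RAD"]:
        raise ValueError("not an M RAD line")
    count = int(tokens[2])
    numbers = [int(t) for t in tokens[3:3 + 2 * count]]
    if len(numbers) != 2 * count:
        raise ValueError("M RAD line is shorter than its entry count")
    return list(zip(numbers[0::2], numbers[1::2]))


def out_of_spec(entries):
    """Return the (atom, value) pairs whose value is not 0, 1, 2 or 3."""
    return [(atom, value) for atom, value in entries if value not in RAD_MEANINGS]


if __name__ == "__main__":
    si_line = "5.2791 3.0717 0.0000 Si 0 0 0 0 0 2 0 0 0 0 0 0"
    raw = valence_field(si_line)
    print(raw, implicit_hydrogens(raw, default_valence=4, bond_order_sum=2))
    print(out_of_spec(parse_rad_line("M RAD 1 4 5")))
```

With the lines quoted above, the Si atom with valence field 2 and two bonds gets no implicit hydrogens, whereas a field of 0 (default valence 4) gives two, which matches the extra "37H2" in the output InChI. The entry in "M RAD 1 4 5" is flagged as outside the 0 to 3 range, while the one in "M RAD 1 31 2" is not.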