Hi Chris,

Thanks for the corrections! I added some notes below.
>-----Original Message-----
>From: Chris Morley [mailto:c.mor...@gaseq.co.uk]
>Sent: Tuesday, August 23, 2011 12:51 PM
>To: openbabel-discuss@lists.sourceforge.net
>Subject: Re: [Open Babel] SDF conversion test results
>
>On 23/08/2011 11:29, Noel O'Boyle wrote:
>> I note that Chris has fixed some problems with metals, and we're down
>> to 146 failures.
>
>Some notes on this are below. (I was waiting to make sure the changes
>didn't cause mass test failure.)
>
>On 09/08/2011 13:39, Róbert Kiss wrote:
> > We recently did some testing with different cheminformatic tools
> > including OpenBabel to see how accurately they can read and write SD
> > files.
>
>Noel has addressed the stereochemical discrepancies previously. Some of
>the other ones, mainly valence-related, are now fixed with the changes
>discussed in more detail below.

That's great, thanks!

>
> > 3. For atoms with unusual valence state it seems that OpenBabel
> > automatically sets the typical valence, while InChI accepts the
> > valence as described in the SDF (e.g. 44420892).
> > This difference in the valence state results in a difference in the
> > number of implicit hydrogens (connected to Si in this example), and
> > thus in different InChIs:
> >
> > input InChI:
> > InChI=1S/C27H35NO8Si/c1-27(2,3)37-36-21-14-18(28-29)24(17-12-22(32-6)26(34-8)23(13-17)33-7)25(21)16-9-10-19(31-5)20(11-16)35-15-30-4/h9-13,21,29H,14-15H2,1-8H3/b28-18-
> > (InChI warning: value="Accepted unusual valence(s): Si(2)")
> > output InChI:
> > InChI=1S/C27H37NO8Si/c1-27(2,3)37-36-21-14-18(28-29)24(17-12-22(32-6)26(34-8)23(13-17)33-7)25(21)16-9-10-19(31-5)20(11-16)35-15-30-4/h9-13,21,29H,14-15,37H2,1-8H3/b28-18-
> >
> > input SDF: 5.2791 3.0717 0.0000 Si 0 0 0 0 0 2 0 0 0 0 0 0
> > (valence is 2)
> > output SDF: 5.2791 3.0717 0.0000 Si 0 0 0 0 0 0 0 0 0 0 0 0
> > (valence is set to default; in case of Si this means 4)
>
>OpenBabel currently interprets only the value 15 (= 0 valence) (because
>of a technical difficulty) and I have now made it so that any value
>causes no implicit hydrogens.

According to the mol file spec, this column "shows number of bonds to
this atom, including bonds to implied Hs". I might have misunderstood
this (which is quite possible), but I don't think it means the atom has
no implicit Hs. For example, an atom with a value of 2 in this column
and only one explicit single bond in the bond block should still be
expected to have one implicit hydrogen. Right?

>I expect that this is nearly always why
>this feature is used, but there could be cases within the spec (such as
>it is) which would not necessarily be correctly interpreted. However,
>the sd file in this example is now read correctly. But when output it
>uses the M RAD line, rather than the valence value, which IMO gives a
>better chemical description of the molecule.

I agree the "M RAD" line gives a better description in several cases
(and this is probably why it overrides the atom block valence value
when it exists).
On the other hand, I'm not sure the "M RAD" line can be used to
represent CH with 3 free electrons or a single Si atom with four.

> > 4. For atoms with unusual valence state sometimes the valence state in
> > the atom block disappears and an extra "M RAD" line appears in the
> > output SDF (e.g.: 19350442). AFAIK the valence count in the atom block
> > and the "M RAD" line are two different things (not totally
> > independent though) so the valence information cannot be converted to
> > a radical state information directly. Also the last number in the "M
> > RAD" line can only be 0,1,2 or 3 according to the MOL file
> > specification, while we found numbers 4 and 5 in some cases.
> >
> > input InChI:
> > InChI=1S/C9H13.C8H11.2CH3.2ClH.Si.Zr/c1-6-5-7(2)9(4)8(6)3;1-2-4-6-8-7-5-3-1;;;;;;/h6H,1-4H3;1-3H,4,6-8H2;2*1H3;2*1H;;/q4*-1;;;;+4/p-2/b;2-1-;;;;;;
> > output InChI:
> > InChI=1S/C9H13.C8H11.2CH3.2ClH.H4Si.Zr/c1-6-5-7(2)9(4)8(6)3;1-2-4-6-8-7-5-3-1;;;;;;/h6H,1-4H3;1-3H,4,6-8H2;2*1H3;2*1H;1H4;/q4*-1;;;;+4/p-2/b;2-1-;;;;;;
> >
> > input SDF atom block: 9.9774 5.0246 0.0000 Si 0 0 0 0 0 15 0 0 0 0 0 0
> > (15 means valence: 0)
> > output SDF atom block: 9.9774 5.0246 0.0000 Si 0 0 0 0 0 0 0 0 0 0 0 0
> > (0 means valence is default: 4)
> >
> > input SDF: no "M RAD" line
> > output SDF: M RAD 1 4 5
>
>OpenBabel uses the equivalent of the RAD value to represent hydrogen
>deficiency in the organic subset of elements and silicon, so it is
>necessary to use the values 4 and 5 internally. Isolated C or Si atoms
>would have a value of 5, even if their spinmultiplicity was smaller. The
>molecule 19350442 has such an unbonded Si atom (which does not seem very
>realistic to me, and illustrates the inadequacy of SDF for
>organometallic molecules). But in the writing of MDL files the RAD
>values 4 and 5 are now replaced by a valence value as are any values on
>metal atoms.
>

While again I totally agree about the inadequacy of molfiles for
non-bonded organometallics, I think it is a bit dangerous to output
values 4 and 5 in the "M RAD" line, as tools strictly following the SDF
spec (as InChI does) will not be able to parse them. So I agree with
indicating this information in the valence column where necessary. As
for the unbound Si, I would argue that most free radicals fall into the
"not realistic" category, but their representation can still be useful,
e.g. for chemists representing a specific transition state. But, agreed
again, these are always suspicious, usually suggesting a drawing error
rather than an intended free radical.

> > 5. This is a quite extreme molecule (23569471). It contains a carbon
> > atom connected to another carbon and two hydrogens. It has a positive
> > charge according to the input SDF. OpenBabel preserves the charge
> > information, but adds an additional "M RAD" line, which is (together
> > with the positive charge) not correct, I think. This difference in the
> > SDF results in different InChIs because InChI can only remove the
> > positive charge from the PubChem input SDF.
> >
> > input InChI:
> > InChI=1S/C29H42O2/c1-20(2)12-9-13-21(3)14-10-15-22(4)16-11-18-29(8)19-17-26-25(7)27(30)23(5)24(6)28(26)31-29/h12,14,16H,1,9-11,13,15,17-19H2,2-8H3/p+1/b20-12-,21-14+,22-16+
> > output InChI:
> > InChI=1S/C29H43O2/c1-20(2)12-9-13-21(3)14-10-15-22(4)16-11-18-29(8)19-17-26-25(7)27(30)23(5)24(6)28(26)31-29/h12,14,16,30H,1,9-11,13,15,17-19H2,2-8H3/q+1/b21-14+,22-16+
> >
> > input SDF: no "M RAD" line
> > output SDF: M RAD 1 31 2
>
>There was a missing valence value in a data file for such carbanions,
>which is now corrected.
>

Excellent! Cheers for that!
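
To make the reading of the spec in point 3 and the range limit in point 4 concrete, here is a minimal Python sketch. It is only an illustration, not how OpenBabel or InChI actually handle these fields. The function names are made up for this example, and it assumes the atom-block and "M RAD" lines can be split on whitespace, which is not guaranteed for the fixed-width columns of a real molfile. The valence field is read as 0 = no marking, 15 = zero valence, and any other value as the total number of bonds including bonds to implied Hs.

```python
from typing import List, Optional, Tuple


def valence_field(atom_line: str) -> int:
    """Return the valence field (vvv) of a V2000 atom-block line.

    Assumption: whitespace splitting works, so the tokens are
    x y z symbol dd ccc sss hhh bbb vvv ... and vvv is token 9.
    """
    return int(atom_line.split()[9])


def implied_h_count(vvv: int, explicit_bond_order_sum: int) -> Optional[int]:
    """Implied H count under the 'total bonds including implied Hs' reading.

    Returns None when the field is 0 (no marking), because the reader
    then falls back to its own default valence model.
    """
    if vvv == 0:
        return None
    if not 1 <= vvv <= 15:
        raise ValueError(f"valence field out of range: {vvv}")
    total_valence = 0 if vvv == 15 else vvv  # 15 encodes zero valence
    return max(total_valence - explicit_bond_order_sum, 0)


def out_of_spec_radicals(m_rad_line: str) -> List[Tuple[int, int]]:
    """Return (atom index, value) pairs whose radical value is not 0-3."""
    tokens = m_rad_line.split()
    if tokens[:2] != ["M", "RAD"]:
        raise ValueError("not an M RAD line")
    count = int(tokens[2])  # number of (atom, value) pairs on the line
    numbers = [int(t) for t in tokens[3:3 + 2 * count]]
    pairs = list(zip(numbers[0::2], numbers[1::2]))
    return [(atom, value) for atom, value in pairs if value not in (0, 1, 2, 3)]
```

With this reading, implied_h_count(2, 1) gives 1 (a value of 2 with one explicit single bond still has one implied H), implied_h_count(15, 0) gives 0, and out_of_spec_radicals("M  RAD  1   4   5") flags atom 4 because 5 is outside the 0-3 range.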

Regards,
Robert

--
Robert Kiss
http://mcule.com