On 2014-08-22 10:38, Michał Nowotka wrote:
A question I have is why you want to access the bond wedging.
[...] Now imagine I only have this molfile and I want to convert it back to
*mrv. I don't want to write my own parser for molfiles when I know
that RDKit can already parse it. But I need to extract this 'bond
stereo' information from within RDKit somehow.

Now when you say that this '1' or 'W' value corresponds to bond
direction, I'm guessing that 'direction' can store only two values: up
and down, so '1' and '6' ('W' and 'H' in Marvin terms). So what about
the other values this field can have? For example, if I have this
molfile:



  10 10  0  0  0  0  0  0  0  0999 V2000
    -1.6741   -0.2687    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    -2.3885   -0.6812    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    -2.3885   -1.5063    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    -1.6741   -1.9188    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    -0.9596   -1.5063    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    -0.9596   -0.6812    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    -0.2451   -0.2686    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    -0.2451    0.5563    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
     0.4692   -0.6811    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
     0.4692   -1.5061    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   1  2  2  0  0  0  0
   2  3  1  0  0  0  0
   3  4  2  0  0  0  0
   4  5  1  0  0  0  0
   5  6  2  0  0  0  0
   6  1  1  0  0  0  0
   6  7  1  0  0  0  0
   7  9  1  0  0  0  0
   9 10  1  0  0  0  0
   7  8  1  4  0  0  0
M  END

So the stereo field is 4 instead of 1. How will I get this information from RDKit?



Hi Michal,

I took a look at the C++ code in GraphMol/FileParsers/MolFileParser.cpp.

ParseMolFileBondLine(), which parses V2000 molfile bond lines, sets the BondDir to UNKNOWN for a bond stereo type of 4 (case 4):

          stereo = FileParserUtils::toInt(text.substr(9,3));
          switch(stereo){
          case 0:
            res->setBondDir(Bond::NONE);
            break;
          case 1:
            res->setBondDir(Bond::BEGINWEDGE);
            break;
          case 6:
            res->setBondDir(Bond::BEGINDASH);
            break;
          case 3: // "either" double bond
            res->setBondDir(Bond::EITHERDOUBLE);
            res->setStereo(Bond::STEREOANY);
            break;
          case 4: // "either" single bond
            res->setBondDir(Bond::UNKNOWN);
            break;
          }
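
To make the field position concrete, here is a minimal Python sketch of the same lookup. It is not RDKit code: the names V2000_STEREO_TO_BONDDIR and v2000_bond_dir are hypothetical, and falling back to "NONE" for an empty or unlisted stereo value is my assumption (the C++ switch has no default branch and simply leaves the bond direction untouched).

```python
# Hypothetical mirror of the switch statement above; not part of RDKit.
V2000_STEREO_TO_BONDDIR = {
    0: "NONE",
    1: "BEGINWEDGE",
    6: "BEGINDASH",
    3: "EITHERDOUBLE",  # "either" double bond
    4: "UNKNOWN",       # "either" single bond
}


def v2000_bond_dir(bond_line):
    """Return the BondDir name for one V2000 bond line.

    The stereo field occupies characters 9..11 of the line, matching
    text.substr(9,3) in the C++ excerpt.
    """
    field = bond_line[9:12].strip()
    stereo = int(field) if field else 0
    # Assumption: values the switch does not handle are reported as NONE.
    return V2000_STEREO_TO_BONDDIR.get(stereo, "NONE")


print(v2000_bond_dir("  7  8  1  4  0  0  0"))  # UNKNOWN
```

Note that the bond line has to start in column 0 for the slice to line up; the molfile quoted above has picked up one extra leading space in the email.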

In ParseV3000BondBlock() for V3000 molfiles the same thing happens, so they agree (case 2, CFG=2, bond type = single (1)):

          if(prop=="CFG"){
            unsigned int cfg=atoi(val.c_str());
            switch(cfg){
            case 0: break;
            case 1:
              bond->setBondDir(Bond::BEGINWEDGE);
              chiralityPossible=true;
              break;
            case 2:
              if(bType==1) bond->setBondDir(Bond::UNKNOWN);
              else if(bType==2){
                bond->setBondDir(Bond::EITHERDOUBLE);
                bond->setStereo(Bond::STEREOANY);
              }
              break;
            case 3:
              bond->setBondDir(Bond::BEGINDASH);
              chiralityPossible=true;
              break;
            default:
              errout << "bad bond CFG "<<val<<"' on line "<<line;
              throw FileParseException(errout.str()) ;
            }
          } else if(prop=="TOPO"){

The bonds will therefore be assigned a BondDir value of Bond::UNKNOWN for single either bonds and Bond::EITHERDOUBLE for double either bonds.
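
The V3000 case differs from V2000 in that CFG=2 has to be combined with the bond type. A rough Python sketch of that rule follows; v3000_bond_dir is a hypothetical name, not an RDKit function, and returning "NONE" where the C++ code sets nothing is my assumption.

```python
def v3000_bond_dir(cfg, bond_type):
    """Return the BondDir name for a V3000 bond's CFG value and bond type."""
    if cfg == 0:
        return "NONE"
    if cfg == 1:
        return "BEGINWEDGE"
    if cfg == 2:
        if bond_type == 1:
            return "UNKNOWN"       # "either" single bond
        if bond_type == 2:
            return "EITHERDOUBLE"  # "either" double bond
        # Assumption: other bond types are left at the default direction.
        return "NONE"
    if cfg == 3:
        return "BEGINDASH"
    # The C++ parser throws a FileParseException for any other CFG value.
    raise ValueError("bad bond CFG %r" % (cfg,))


print(v3000_bond_dir(2, 1))  # UNKNOWN
print(v3000_bond_dir(2, 2))  # EITHERDOUBLE
```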

I read in a V2000 molfile where the second bond is a single either bond (stereo bond type of 4) and the third bond is a double either bond (stereo bond type of 3).

    >>> from rdkit import Chem
    >>> m = Chem.MolFromMolFile("C:/temp/either.mol", sanitize=False, removeHs=False)
    >>> for b in m.GetBonds(): print b.GetBondDir()
    ...
    NONE
    UNKNOWN
    5
    NONE
    NONE
    NONE
    >>>


The only slight surprise is that Python prints "5" instead of "EITHERDOUBLE" for the third bond.

    >>> Chem.rdchem.BondDir.values
    {0: rdkit.Chem.rdchem.BondDir.NONE, 1: rdkit.Chem.rdchem.BondDir.BEGINWEDGE, 2: rdkit.Chem.rdchem.BondDir.BEGINDASH, 3: rdkit.Chem.rdchem.BondDir.ENDDOWNRIGHT, 4: rdkit.Chem.rdchem.BondDir.ENDUPRIGHT, 6: rdkit.Chem.rdchem.BondDir.UNKNOWN}
    >>>

For some reason Python does not map the BondDir value 5 to a name. But the value does match EITHERDOUBLE's implicit ordinal value defined in GraphMol/Bond.h, so it matches what I expect from reading the parser code:

    //! the bond's direction (for chirality)
    typedef enum {
      NONE=0,         //!< no special style
      BEGINWEDGE,     //!< wedged: narrow at begin
      BEGINDASH,      //!< dashed: narrow at begin
      // FIX: this may not really be adequate
      ENDDOWNRIGHT,   //!< for cis/trans
      ENDUPRIGHT,     //!<  ditto
      EITHERDOUBLE,   //!< a "crossed" double bond
      UNKNOWN,        //!< intentionally unspecified stereochemistry
    } BondDir;

So the information is retained in GetBondDir() as long as you don't sanitize.
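
For the direction you actually need (getting from the BondDir back to the molfile stereo code), something along these lines might serve as a workaround for the unnamed value 5. This is only a sketch: BondDirOrdinal and v2000_stereo_code are hypothetical names, the ordinals are copied from the Bond.h excerpt above, and I am assuming that whatever GetBondDir() returns can be converted with int(). Writing 0 for ENDDOWNRIGHT and ENDUPRIGHT is also an assumption, since the parser excerpt has no stereo code for them.

```python
from enum import IntEnum


class BondDirOrdinal(IntEnum):
    """Ordinals copied from the BondDir enum in GraphMol/Bond.h."""
    NONE = 0
    BEGINWEDGE = 1
    BEGINDASH = 2
    ENDDOWNRIGHT = 3
    ENDUPRIGHT = 4
    EITHERDOUBLE = 5
    UNKNOWN = 6


# Inverse of the V2000 parser switch: BondDir -> bond stereo field.
BONDDIR_TO_V2000_STEREO = {
    BondDirOrdinal.NONE: 0,
    BondDirOrdinal.BEGINWEDGE: 1,
    BondDirOrdinal.BEGINDASH: 6,
    BondDirOrdinal.EITHERDOUBLE: 3,
    BondDirOrdinal.UNKNOWN: 4,
}


def v2000_stereo_code(bond_dir_value):
    """Map a raw BondDir value, e.g. int(b.GetBondDir()), to a stereo code."""
    bond_dir = BondDirOrdinal(int(bond_dir_value))
    # Assumption: directions without a V2000 stereo code are written as 0.
    return BONDDIR_TO_V2000_STEREO.get(bond_dir, 0)


print(BondDirOrdinal(5).name)  # EITHERDOUBLE
print(v2000_stereo_code(6))    # 4
```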

Cheers
-- Jan