Hi Jan,

Yes, thanks to Greg's hints I've implemented this:
https://github.com/mnowotka/chembl_beaker/blob/master/chembl_beaker/beaker/core_apps/marvin/MarvinJSONEncoder.py#L292

Anyway, thank you for finding the actual code. It was definitely worth taking a look; the whole parser implementation is interesting.

Cheers,
Michał

On Fri, Aug 22, 2014 at 7:36 PM, Jan Holst Jensen wrote:
> On 2014-08-22 10:38, Michał Nowotka wrote:
>
> A question I have is why you want to access the bond wedging.
>
> [...] Now imagine I only have this molfile and I want to convert it back to
> *mrv. I don't want to write my own parser for molfiles when I know
> that RDKit can already parse it. But I need to extract this 'bond
> stereo' information from within RDKit somehow.
>
> Now when you say that this '1' or 'W' value corresponds to bond
> direction, I'm guessing that 'direction' can store only two values: up
> and down, so '1' and '6' ('W' and 'H' in Marvin terms). So what about
> the other values this field can have? If, for example, I have this
> molfile:
>
>
>
>  10 10  0  0  0  0  0  0  0  0999 V2000
>    -1.6741   -0.2687    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -2.3885   -0.6812    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -2.3885   -1.5063    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -1.6741   -1.9188    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.9596   -1.5063    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.9596   -0.6812    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.2451   -0.2686    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.2451    0.5563    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>     0.4692   -0.6811    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.4692   -1.5061    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  2  0  0  0  0
>   2  3  1  0  0  0  0
>   3  4  2  0  0  0  0
>   4  5  1  0  0  0  0
>   5  6  2  0  0  0  0
>   6  1  1  0  0  0  0
>   6  7  1  0  0  0  0
>   7  9  1  0  0  0  0
>   9 10  1  0  0  0  0
>   7  8  1  4  0  0  0
> M  END
>
> So with 4 instead of 1, how will I get this information from RDKit?
>
>
> Hi Michal,
>
> I took a look at the C++ code in GraphMol/FileParsers/MolFileParser.cpp.
>
> ParseMolFileBondLine() for parsing V2000 molfiles sets the BondDir to
> UNKNOWN (case 4, bond stereo type = 4):
>
>     stereo = FileParserUtils::toInt(text.substr(9,3));
>     switch(stereo){
>     case 0:
>       res->setBondDir(Bond::NONE);
>       break;
>     case 1:
>       res->setBondDir(Bond::BEGINWEDGE);
>       break;
>     case 6:
>       res->setBondDir(Bond::BEGINDASH);
>       break;
>     case 3: // "either" double bond
>       res->setBondDir(Bond::EITHERDOUBLE);
>       res->setStereo(Bond::STEREOANY);
>       break;
>     case 4: // "either" single bond
>       res->setBondDir(Bond::UNKNOWN);
>       break;
>     }
>
> In ParseV3000BondBlock() for V3000 molfiles the same thing happens, so they
> agree (case 2, CFG=2, bond type = single (1)):
>
>     if(prop=="CFG"){
>       unsigned int cfg=atoi(val.c_str());
>       switch(cfg){
>       case 0: break;
>       case 1:
>         bond->setBondDir(Bond::BEGINWEDGE);
>         chiralityPossible=true;
>         break;
>       case 2:
>         if(bType==1) bond->setBondDir(Bond::UNKNOWN);
>         else if(bType==2){
>           bond->setBondDir(Bond::EITHERDOUBLE);
>           bond->setStereo(Bond::STEREOANY);
>         }
>         break;
>       case 3:
>         bond->setBondDir(Bond::BEGINDASH);
>         chiralityPossible=true;
>         break;
>       default:
>         errout << "bad bond CFG "<<val<<"' on line "<<line;
>         throw FileParseException(errout.str()) ;
>       }
>     } else if(prop=="TOPO"){
>
> The bonds will therefore be assigned a BondDir value of Bond::UNKNOWN for
> single either bonds and Bond::EITHERDOUBLE for double either bonds.
>
> I read in a V2000 molfile where the second bond is a single either bond
> (stereo bond type of 4) and the third bond is a double either bond (stereo
> bond type of 3).
>
> >>> from rdkit import Chem
> >>> m = Chem.MolFromMolFile("C:/temp/either.mol", sanitize=False, removeHs=False)
> >>> for b in m.GetBonds(): print b.GetBondDir()
> ...
> NONE
> UNKNOWN
> 5
> NONE
> NONE
> NONE
> >>>
>
> The only slight surprise is that Python returns a "5" instead of an
> "EITHERDOUBLE" string.
>
> >>> Chem.rdchem.BondDir.values
> {0: rdkit.Chem.rdchem.BondDir.NONE, 1: rdkit.Chem.rdchem.BondDir.BEGINWEDGE,
>  2: rdkit.Chem.rdchem.BondDir.BEGINDASH, 3: rdkit.Chem.rdchem.BondDir.ENDDOWNRIGHT,
>  4: rdkit.Chem.rdchem.BondDir.ENDUPRIGHT, 6: rdkit.Chem.rdchem.BondDir.UNKNOWN}
> >>>
>
> For some reason Python does not map the BondDir value 5 to a name. But the
> value does match EITHERDOUBLE's implicit ordinal value defined in
> GraphMol/Bond.h, so it matches what I expect from reading the parser code:
>
>     //! the bond's direction (for chirality)
>     typedef enum {
>       NONE=0,         //!< no special style
>       BEGINWEDGE,     //!< wedged: narrow at begin
>       BEGINDASH,      //!< dashed: narrow at begin
>       // FIX: this may not really be adequate
>       ENDDOWNRIGHT,   //!< for cis/trans
>       ENDUPRIGHT,     //!< ditto
>       EITHERDOUBLE,   //!< a "crossed" double bond
>       UNKNOWN,        //!< intentionally unspecified stereochemistry
>     } BondDir;
>
> So the information is retained in GetBondDir() as long as you don't
> sanitize.
>
> Cheers
> -- Jan
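
For reference, the mapping described in the quoted message can be sketched in plain Python without RDKit. This is only an illustration under stated assumptions, not RDKit's API: the `BondDir` class below is a local copy of the ordinals from the quoted Bond.h enum, and `bond_dir_from_v2000_line` is a hypothetical helper that mimics the quoted `switch` on the V2000 stereo field. How missing or unlisted stereo values are treated here is an assumption.

```python
from enum import IntEnum


class BondDir(IntEnum):
    """Local copy of the ordinals from the BondDir enum quoted from GraphMol/Bond.h."""
    NONE = 0
    BEGINWEDGE = 1
    BEGINDASH = 2
    ENDDOWNRIGHT = 3
    ENDUPRIGHT = 4
    EITHERDOUBLE = 5
    UNKNOWN = 6


# V2000 bond stereo field -> BondDir, following the quoted switch statement.
V2000_STEREO_TO_DIR = {
    0: BondDir.NONE,
    1: BondDir.BEGINWEDGE,
    6: BondDir.BEGINDASH,
    3: BondDir.EITHERDOUBLE,  # "either" double bond
    4: BondDir.UNKNOWN,       # "either" single bond
}


def bond_dir_from_v2000_line(line):
    """Hypothetical helper: map the stereo field of a V2000 bond line to a BondDir.

    The stereo field is read from the same slice as text.substr(9,3) in the
    quoted C++ code.
    """
    field = line[9:12].strip()
    # Assumption: an absent or blank stereo field is treated as 0.
    stereo = int(field) if field else 0
    # Assumption: values the quoted switch does not list fall back to NONE.
    return V2000_STEREO_TO_DIR.get(stereo, BondDir.NONE)


if __name__ == "__main__":
    either_single = "  7  8  1  4  0  0  0"  # last bond line of the molfile above
    print(bond_dir_from_v2000_line(either_single).name)
    # The bare 5 seen in the Python session corresponds to this name in the enum.
    print(BondDir(5).name)
```

A lookup like `BondDir(5).name` against a local copy of the enum is one way to give the unnamed value 5 a readable label when converting back to another format.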

