Hi Jan,

Yes, thanks to Greg's hints I've implemented this:
https://github.com/mnowotka/chembl_beaker/blob/master/chembl_beaker/beaker/core_apps/marvin/MarvinJSONEncoder.py#L292
Anyway, thank you for finding the actual code. It was definitely worth
taking a look; the whole parser implementation is interesting.

Cheers,
Michał

On Fri, Aug 22, 2014 at 7:36 PM, Jan Holst Jensen <[email protected]> 
wrote:
> On 2014-08-22 10:38, Michał Nowotka wrote:
>
> A question I have is why you want to access the bond wedging.
>
> [...] Now imagine I only have this molfile and I want to convert it back to
> *mrv. I don't want to write my own parser for molfiles when I know
> that RDKit can already parse it. But I need to extract this 'bond
> stereo' information from within RDKit somehow.
>
> Now when you say that this '1' or 'W' value corresponds to bond
> direction, I'm guessing that 'direction' can store only two values: up
> and down, so '1' and '6' ('W' and 'H' in Marvin terms). So what about
> the other values this field can have? For example, if I have this
> molfile:
>
>
>
>  10 10  0  0  0  0  0  0  0  0999 V2000
>    -1.6741   -0.2687    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -2.3885   -0.6812    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -2.3885   -1.5063    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -1.6741   -1.9188    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.9596   -1.5063    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.9596   -0.6812    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.2451   -0.2686    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.2451    0.5563    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>     0.4692   -0.6811    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.4692   -1.5061    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  2  0  0  0  0
>   2  3  1  0  0  0  0
>   3  4  2  0  0  0  0
>   4  5  1  0  0  0  0
>   5  6  2  0  0  0  0
>   6  1  1  0  0  0  0
>   6  7  1  0  0  0  0
>   7  9  1  0  0  0  0
>   9 10  1  0  0  0  0
>   7  8  1  4  0  0  0
> M  END
>
> So with 4 instead of 1, how will I get this information from RDKit?
>
>
>
> Hi Michal,
>
> I took a look at the C++ code in GraphMol/FileParsers/MolFileParser.cpp.
>
> ParseMolFileBondLine() for parsing V2000 molfiles sets the BondDir to
> UNKNOWN (case 4, bond stereo type = 4):
>
>           stereo = FileParserUtils::toInt(text.substr(9,3));
>           switch(stereo){
>           case 0:
>             res->setBondDir(Bond::NONE);
>             break;
>           case 1:
>             res->setBondDir(Bond::BEGINWEDGE);
>             break;
>           case 6:
>             res->setBondDir(Bond::BEGINDASH);
>             break;
>           case 3: // "either" double bond
>             res->setBondDir(Bond::EITHERDOUBLE);
>             res->setStereo(Bond::STEREOANY);
>             break;
>           case 4: // "either" single bond
>             res->setBondDir(Bond::UNKNOWN);
>             break;
>           }
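
The V2000 switch quoted above can be mirrored in plain Python without
RDKit. This is only a rough sketch: the dictionary and function names are
hypothetical, the column slice follows the substr(9,3) call in the quoted
C++, and treating a blank field as 0 is an assumption.

```python
# Hypothetical helper mirroring the switch in the quoted
# ParseMolFileBondLine(); it is not part of RDKit.
V2000_STEREO_TO_BONDDIR = {
    0: "NONE",
    1: "BEGINWEDGE",
    6: "BEGINDASH",
    3: "EITHERDOUBLE",  # "either" double bond
    4: "UNKNOWN",       # "either" single bond
}


def v2000_bond_dir(bond_line):
    """Return the BondDir name for one V2000 bond line."""
    # Characters 9..11 hold the bond stereo field, as in text.substr(9, 3).
    field = bond_line[9:12].strip()
    stereo = int(field) if field else 0  # assumption: blank means 0
    # The quoted switch has no default branch; this sketch assumes that
    # unlisted codes leave the direction at NONE.
    return V2000_STEREO_TO_BONDDIR.get(stereo, "NONE")


# The last bond line of the molfile quoted earlier in this thread.
print(v2000_bond_dir("  7  8  1  4  0  0  0"))
```

For the "7 8" bond line with stereo code 4 this prints UNKNOWN, which is
the value the parser is described as assigning.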
>
> In ParseV3000BondBlock() for V3000 molfiles the same thing happens, so they
> agree (case 2, CFG=2, bond type = single (1)):
>
>           if(prop=="CFG"){
>             unsigned int cfg=atoi(val.c_str());
>             switch(cfg){
>             case 0: break;
>             case 1:
>               bond->setBondDir(Bond::BEGINWEDGE);
>               chiralityPossible=true;
>               break;
>             case 2:
>               if(bType==1) bond->setBondDir(Bond::UNKNOWN);
>               else if(bType==2){
>                 bond->setBondDir(Bond::EITHERDOUBLE);
>                 bond->setStereo(Bond::STEREOANY);
>               }
>               break;
>             case 3:
>               bond->setBondDir(Bond::BEGINDASH);
>               chiralityPossible=true;
>               break;
>             default:
>               errout << "bad bond CFG "<<val<<"' on line "<<line;
>               throw FileParseException(errout.str()) ;
>             }
>           } else if(prop=="TOPO"){
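
The V3000 CFG handling differs from V2000 in that the result of CFG=2
depends on the bond type. A small Python sketch of that branch logic,
following the quoted C++ (the function name is hypothetical, and returning
"NONE" where the C++ sets nothing is an assumption):

```python
def v3000_bond_dir(cfg, bond_type):
    """Hypothetical mirror of the quoted CFG switch; not an RDKit function."""
    if cfg == 0:
        return "NONE"
    if cfg == 1:
        return "BEGINWEDGE"
    if cfg == 2:
        if bond_type == 1:
            return "UNKNOWN"       # "either" single bond
        if bond_type == 2:
            return "EITHERDOUBLE"  # "either" double bond
        # The quoted code sets nothing for other bond types.
        return "NONE"
    if cfg == 3:
        return "BEGINDASH"
    # The quoted C++ throws FileParseException here.
    raise ValueError("bad bond CFG %r" % (cfg,))


print(v3000_bond_dir(2, 1))
print(v3000_bond_dir(2, 2))
```

This prints UNKNOWN and then EITHERDOUBLE, the same two values the V2000
parser assigns for stereo codes 4 and 3.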
>
> The bonds will therefore be assigned a BondDir value of Bond::UNKNOWN for
> single either bonds and Bond::EITHERDOUBLE for double either bonds.
>
> I read in a V2000 molfile where the second bond is a single either bond
> (stereo bond type of 4) and the third bond is a double either bond (stereo
> bond type of 3).
>
> >>> from rdkit import Chem
> >>> m = Chem.MolFromMolFile("C:/temp/either.mol", sanitize=False,
> ...                         removeHs=False)
> >>> for b in m.GetBonds(): print(b.GetBondDir())
> ...
> NONE
> UNKNOWN
> 5
> NONE
> NONE
> NONE
> >>>
>
>
> The only slight surprise is that Python returns "5" instead of the
> name "EITHERDOUBLE".
>
> >>> Chem.rdchem.BondDir.values
> {0: rdkit.Chem.rdchem.BondDir.NONE,
>  1: rdkit.Chem.rdchem.BondDir.BEGINWEDGE,
>  2: rdkit.Chem.rdchem.BondDir.BEGINDASH,
>  3: rdkit.Chem.rdchem.BondDir.ENDDOWNRIGHT,
>  4: rdkit.Chem.rdchem.BondDir.ENDUPRIGHT,
>  6: rdkit.Chem.rdchem.BondDir.UNKNOWN}
> >>>
>
> For some reason Python does not map the BondDir value 5 to a name. But the
> value does match EITHERDOUBLE's implicit ordinal value defined in
> GraphMol/Bond.h, so it matches what I expect from reading the parser code:
>
>     //! the bond's direction (for chirality)
>     typedef enum {
>       NONE=0,         //!< no special style
>       BEGINWEDGE,     //!< wedged: narrow at begin
>       BEGINDASH,      //!< dashed: narrow at begin
>       // FIX: this may not really be adequate
>       ENDDOWNRIGHT,   //!< for cis/trans
>       ENDUPRIGHT,     //!<  ditto
>       EITHERDOUBLE,   //!< a "crossed" double bond
>       UNKNOWN,        //!< intentionally unspecified stereochemistry
>     } BondDir;
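
To check the ordinal arithmetic, the enum above can be rebuilt in Python
with the standard enum module. This is a stand-in for illustration only,
not the RDKit Python class; the names are taken in the order shown in the
quoted Bond.h and numbered from 0, as C++ does implicitly.

```python
from enum import IntEnum

# Names in the order they appear in the quoted Bond.h enum.
_NAMES = ["NONE", "BEGINWEDGE", "BEGINDASH", "ENDDOWNRIGHT",
          "ENDUPRIGHT", "EITHERDOUBLE", "UNKNOWN"]

# start=0 reproduces the implicit C++ numbering (NONE=0, then +1 each).
BondDir = IntEnum("BondDir", _NAMES, start=0)

print(BondDir(5).name)
print(int(BondDir.UNKNOWN))
```

This prints EITHERDOUBLE and 6, so a bare 5 coming back from Python lines
up with the crossed double bond value.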
>
> So the information is retained in GetBondDir() as long as you don't
> sanitize.
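
For going back to a molfile-style stereo code, the quoted V2000 switch can
simply be inverted. The sketch below works on plain integers so it runs
without RDKit. The names are hypothetical, and writing 0 for ENDDOWNRIGHT
and ENDUPRIGHT is an assumption, since the quoted switch has no case that
produces them.

```python
# Inverse of the quoted V2000 switch: BondDir ordinal -> molfile stereo code.
BONDDIR_TO_V2000_STEREO = {
    0: 0,  # NONE
    1: 1,  # BEGINWEDGE
    2: 6,  # BEGINDASH
    5: 3,  # EITHERDOUBLE ("either" double bond)
    6: 4,  # UNKNOWN ("either" single bond)
}


def v2000_stereo_code(bond_dir_value):
    """Hypothetical helper: map a BondDir ordinal to a V2000 stereo code."""
    # Assumption: directions without a molfile code are written as 0.
    return BONDDIR_TO_V2000_STEREO.get(int(bond_dir_value), 0)


# With RDKit this might be fed from b.GetBondDir() for each bond, assuming
# int() accepts the returned value. Here the ordinals from the session
# above (NONE, UNKNOWN, 5, NONE, NONE, NONE) are used directly.
dirs = [0, 6, 5, 0, 0, 0]
print([v2000_stereo_code(d) for d in dirs])
```

This prints [0, 4, 3, 0, 0, 0], i.e. stereo code 4 on the second bond and
3 on the third, matching the description of the test file.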
>
> Cheers
> -- Jan

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
