On 2014-08-22 10:38, Michał Nowotka wrote:
A question I have is why you want to access the bond wedging.
[...] Now imagine I only have this molfile and I want to convert it back to
*mrv. I don't want to write my own parser for molfiles when I know
that RDKit can already parse it. But I need to extract this 'bond
stereo' information from within RDKit somehow.
Now when you say that this '1' or 'W' value corresponds to bond
direction, I'm guessing that 'direction' can store only two values: up
and down, so '1' and '6' ('W' and 'H' in Marvin terms). So what about
the other values this field can have? If, for example, I have this
molfile:
 10 10  0  0  0  0  0  0  0  0999 V2000
   -1.6741   -0.2687    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.3885   -0.6812    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.3885   -1.5063    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.6741   -1.9188    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.9596   -1.5063    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.9596   -0.6812    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.2451   -0.2686    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.2451    0.5563    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.4692   -0.6811    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.4692   -1.5061    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  5  6  2  0  0  0  0
  6  1  1  0  0  0  0
  6  7  1  0  0  0  0
  7  9  1  0  0  0  0
  9 10  1  0  0  0  0
  7  8  1  4  0  0  0
M  END
So with 4 instead of 1, how will I get this information from RDKit?
Hi Michal,
I took a look at the C++ code in GraphMol/FileParsers/MolFileParser.cpp.
ParseMolFileBondLine() for parsing V2000 molfiles sets the BondDir to
UNKNOWN (case 4, bond stereo type = 4):
stereo = FileParserUtils::toInt(text.substr(9,3));
switch(stereo){
case 0:
  res->setBondDir(Bond::NONE);
  break;
case 1:
  res->setBondDir(Bond::BEGINWEDGE);
  break;
case 6:
  res->setBondDir(Bond::BEGINDASH);
  break;
case 3: // "either" double bond
  res->setBondDir(Bond::EITHERDOUBLE);
  res->setStereo(Bond::STEREOANY);
  break;
case 4: // "either" single bond
  res->setBondDir(Bond::UNKNOWN);
  break;
}
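To make the mapping easier to see, here is a minimal Python sketch of the same lookup. It is not RDKit code: the names V2000_STEREO_TO_BONDDIR and bond_dir_from_v2000_line are made up for illustration. It assumes the bond line still has its fixed-width columns, and that a stereo value not handled by the switch leaves the bond at NONE.

```python
# Illustrative mirror of the switch in ParseMolFileBondLine(); not RDKit code.
V2000_STEREO_TO_BONDDIR = {
    0: "NONE",
    1: "BEGINWEDGE",
    6: "BEGINDASH",
    3: "EITHERDOUBLE",  # "either" double bond
    4: "UNKNOWN",       # "either" single bond
}


def bond_dir_from_v2000_line(line):
    """Return the BondDir name for one fixed-width V2000 bond line."""
    # Same slice as text.substr(9,3) in the C++ code: columns 10-12.
    field = line[9:12].strip()
    stereo = int(field) if field else 0
    # Assumption: values outside the switch leave the direction at NONE.
    return V2000_STEREO_TO_BONDDIR.get(stereo, "NONE")


print(bond_dir_from_v2000_line("  7  8  1  4  0  0  0"))
```

The last bond line of the molfile in the question (atoms 7 and 8, stereo 4) goes through the "either" single bond branch.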
In ParseV3000BondBlock() for V3000 molfiles the same thing happens, so
they agree (case 2, CFG=2, bond type = single (1)):
if(prop=="CFG"){
  unsigned int cfg=atoi(val.c_str());
  switch(cfg){
  case 0: break;
  case 1:
    bond->setBondDir(Bond::BEGINWEDGE);
    chiralityPossible=true;
    break;
  case 2:
    if(bType==1) bond->setBondDir(Bond::UNKNOWN);
    else if(bType==2){
      bond->setBondDir(Bond::EITHERDOUBLE);
      bond->setStereo(Bond::STEREOANY);
    }
    break;
  case 3:
    bond->setBondDir(Bond::BEGINDASH);
    chiralityPossible=true;
    break;
  default:
    errout << "bad bond CFG "<<val<<"' on line "<<line;
    throw FileParseException(errout.str()) ;
  }
} else if(prop=="TOPO"){
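Note that the V3000 CFG codes differ from the V2000 stereo codes: in the excerpt above, 2 means "either" and 3 means dashed. A rough Python sketch of the same decision, with a hypothetical function name and assuming that CFG=2 on a bond type other than 1 or 2 leaves the direction at NONE:

```python
def bond_dir_from_v3000_cfg(cfg, bond_type):
    """Return the BondDir name the quoted CFG switch would set (sketch only)."""
    if cfg == 0:
        return "NONE"
    if cfg == 1:
        return "BEGINWEDGE"
    if cfg == 2:
        if bond_type == 1:
            return "UNKNOWN"       # "either" single bond
        if bond_type == 2:
            return "EITHERDOUBLE"  # "either" double bond
        return "NONE"              # assumption: nothing is set for other types
    if cfg == 3:
        return "BEGINDASH"
    # Mirrors the default branch, which throws a FileParseException.
    raise ValueError("bad bond CFG %r" % (cfg,))


print(bond_dir_from_v3000_cfg(2, 1))
```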
The bonds will therefore be assigned a BondDir value of Bond::UNKNOWN
for "either" single bonds and Bond::EITHERDOUBLE for "either" double bonds.
I read in a V2000 molfile where the second bond is a single either bond
(stereo bond type of 4) and the third bond is a double either bond
(stereo bond type of 3).
>>> from rdkit import Chem
>>> m = Chem.MolFromMolFile("C:/temp/either.mol", sanitize=False,
...                         removeHs=False)
>>> for b in m.GetBonds(): print(b.GetBondDir())
...
NONE
UNKNOWN
5
NONE
NONE
NONE
>>>
The only slight surprise is that Python returns a "5" instead of an
"EITHERDOUBLE" string.
>>> Chem.rdchem.BondDir.values
{0: rdkit.Chem.rdchem.BondDir.NONE,
 1: rdkit.Chem.rdchem.BondDir.BEGINWEDGE,
 2: rdkit.Chem.rdchem.BondDir.BEGINDASH,
 3: rdkit.Chem.rdchem.BondDir.ENDDOWNRIGHT,
 4: rdkit.Chem.rdchem.BondDir.ENDUPRIGHT,
 6: rdkit.Chem.rdchem.BondDir.UNKNOWN}
>>>
For some reason Python does not map the BondDir value 5 to a name. But
the value does match EITHERDOUBLE's implicit ordinal value defined in
GraphMol/Bond.h, so it matches what I expect from reading the parser code:
//! the bond's direction (for chirality)
typedef enum {
  NONE=0,         //!< no special style
  BEGINWEDGE,     //!< wedged: narrow at begin
  BEGINDASH,      //!< dashed: narrow at begin
  // FIX: this may not really be adequate
  ENDDOWNRIGHT,   //!< for cis/trans
  ENDUPRIGHT,     //!< ditto
  EITHERDOUBLE,   //!< a "crossed" double bond
  UNKNOWN,        //!< intentionally unspecified stereochemistry
} BondDir;
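The implicit ordinals can be checked with a small Python mirror of this enum. This is only an illustration built with the standard enum module, not RDKit's own wrapper; the class simply copies the member order from Bond.h.

```python
from enum import IntEnum


class BondDir(IntEnum):
    """Stand-in for the C++ enum; members are numbered in declaration order."""
    NONE = 0
    BEGINWEDGE = 1
    BEGINDASH = 2
    ENDDOWNRIGHT = 3
    ENDUPRIGHT = 4
    EITHERDOUBLE = 5
    UNKNOWN = 6


# The bare 5 printed in the session corresponds to EITHERDOUBLE.
print(BondDir(5).name)
```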
So the information is retained in GetBondDir() as long as you don't
sanitize.
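To get back to the V2000 bond stereo codes you asked about, one option is to invert the parser mapping using the integer value of GetBondDir(). The following is a sketch under assumptions, not an RDKit API: BONDDIR_TO_V2000_STEREO and v2000_stereo_codes are hypothetical names, the ordinals are those implied by the enum in Bond.h, and the cis/trans directions (ENDDOWNRIGHT, ENDUPRIGHT) are written as 0 because they do not come from molfile wedging.

```python
# BondDir ordinals (from the enum in GraphMol/Bond.h) mapped back to
# V2000 bond stereo codes; the inverse of the parser switch.
BONDDIR_TO_V2000_STEREO = {
    0: 0,  # NONE
    1: 1,  # BEGINWEDGE
    2: 6,  # BEGINDASH
    5: 3,  # EITHERDOUBLE
    6: 4,  # UNKNOWN
}


def v2000_stereo_codes(mol):
    """Return one V2000 stereo code per bond, in bond order.

    `mol` only needs GetBonds(), and each bond needs GetBondDir();
    int() is used so that the unnamed value 5 is handled as well.
    """
    codes = []
    for bond in mol.GetBonds():
        ordinal = int(bond.GetBondDir())
        # Assumption: directions with no molfile equivalent become 0.
        codes.append(BONDDIR_TO_V2000_STEREO.get(ordinal, 0))
    return codes
```

Here mol would be the molecule read with sanitize=False and removeHs=False as in the session above; for either.mol the list should start 0, 4, 3.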
Cheers
-- Jan
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss