Hi all,
I'm doing some work where I want to turn a subgraph of a molecule
into a canonical SMARTS. I can almost use the canonical SMILES, but
it uses "cc" for aromatic bonds connecting aromatic atoms, rather
than "c:c". Thus, as a SMARTS it matches both "c:c" and "c-c".
RDKit does generate an explicit single bond, as 'c-c', for single
bonds which connect two aromatic atoms. This is the right thing
to do, and it means that I can work around this problem syntactically
by post-processing the SMILES to insert the ':'s where needed. That's
a non-trivial effort though.
I wondered if there was an easier way to do it. For example, how
hard is it to change the MolToSmiles code to handle another parameter,
perhaps 'explicitAromaticBonds', and have it put in the ':'s? Or is
there some way to get RDKit to do what I want directly? Perhaps some
secret option to MolToSmarts?
For reference, here's the code I use to generate the 'canonical
SMARTS.'
import collections
from rdkit import Chem
# A Molecule contains a list of 'Atom's and list of 'Bond's.
# Atom and Bond indicies are offsets into those respective lists.
# An Atom has a list of "bond_indicies".
# A Bond has a 2-element list of "atom_indicies"
Molecule = collections.namedtuple("Molecule", "atoms bonds")
Atom = collections.namedtuple("Atom", "real_atom bond_indicies")
Bond = collections.namedtuple("Bond", "real_bond atom_indicies")
# A Subgraph is a list of atom and bond indicies in a Molecule
Subgraph = collections.namedtuple("Subgraph", "atom_indicies bond_indicies")
# In my code I do a lot of atom lookups so I decided to put the
# molecular graph into Python, to take advantage of its lookup
# optimizations.
def adapt_rdkit_mol(rdmol):
atoms = [Atom(atom, [bond.GetIdx() for bond in atom.GetBonds()]) for atom
in rdmol.GetAtoms()]
bonds = [Bond(bond, [bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()]) for
bond in rdmol.GetBonds()]
return Molecule(atoms, bonds)
# Convert the subgraph (of the reference molecule) into a canonical SMARTS
def make_canonical_smarts(subgraph, mol):
# Make a new molecule which is a copy of the subgraph atoms/bonds of the old
emol = Chem.EditableMol(Chem.Mol())
atoms = {}
n = 0
for atom_index in subgraph.atom_indicies:
real_atom = mol.atoms[atom_index].real_atom
emol.AddAtom(real_atom)
atoms[atom_index] = n
n += 1
for bond_index in subgraph.bond_indicies:
bond = mol.bonds[bond_index]
emol.AddBond(atoms[bond.atom_indicies[0]], atoms[bond.atom_indicies[1]],
bond.real_bond.GetBondType())
mol = emol.GetMol()
return Chem.MolToSmiles(mol)
if __name__ == "__main__":
rdmol = Chem.MolFromSmiles("CC(O)N=O")
mol = adapt_rdkit_mol(rdmol)
subgraph = Subgraph([1,2,3,4], [1, 2, 3])
print make_canonical_smarts(subgraph, mol)
Andrew
[email protected]
------------------------------------------------------------------------------
This SF email is sponsosred by:
Try Windows Azure free for 90 days Click Here
http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss