Hello all,
Thank you in advance for taking your time and I hope to explain the
problem well. I am trying to find the largest induced subfragment given two
molecules. For example given ATP (Adenosine Triposphate) and Guanine, I
would like the algorithm to return the Purine ring.
I wrote a piece of code using NetworkX library in Python to implement the
algorithm described here (
https://en.wikipedia.org/wiki/Modular_product_of_graphs) and find the
maximum clique. I included the function below.
The problem here is that it gets slow as the size of the molecule
increases due to the fact that the problem is NP-complete. However at the
limit of one molecule being the substructure of the other one, the problem
is the same as SMARTS pattern matching and the findall() function of pybel
is much much faster than my version. I am wondering what's the algorithm
behind SMARTS match and is there an easier way to do what I did using
existing Openbabel libraries.
Looking forward to your thoughts.
Best,
Sam
def map_mols(mol1, mol2, include_hetero=True):
mol1.removeh()
mol2.removeh()
G1 = networkx.Graph()
G2 = networkx.Graph()
if include_hetero:
for index, atom in enumerate(mol1):
G1.add_node(index, type=atom.atomicnum)
else:
for index, atom in enumerate(mol1):
G1.add_node(index, type=1)
for bond in ob.OBMolBondIter(mol1.OBMol):
G1.add_edge(bond.GetBeginAtomIdx()-1, bond.GetEndAtomIdx()-1)
if include_hetero:
for index, atom in enumerate(mol2):
G2.add_node(index, type=atom.atomicnum)
else:
for index, atom in enumerate(mol1):
G1.add_node(index, type=1)
for bond in ob.OBMolBondIter(mol2.OBMol):
G2.add_edge(bond.GetBeginAtomIdx()-1, bond.GetEndAtomIdx()-1)
# generating module product graph of G1xG2
# https://en.wikipedia.org/wiki/Modular_product_of_graphs
P = operators.product.cartesian_product(G1, G2)
for j,i in enumerate((P.edges(data=True))):
P.remove_edge(i[0],i[1])
num = 0
for i in P.nodes(data=False):
for j in P.nodes(data=False):
if i[0] != j[0] and i[1] != j[1]:
if (G1.has_edge(i[0], j[0]) and G2.has_edge(i[1], j[1])) or (G1.has_edge(i[0], j[0])==False and G2.has_edge(i[1], j[1])==False):
num += 1
P.add_edge(i, j)
print "Number of edge in module product graph=", len(P.edges(data=True))
# finding maximum cliques of the module product graph
return list(clique.find_cliques(P))
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss