Hi all,
I started writing a RDKIT-python-script including Andrew Dalke's MCS
implementation (thanks by the way) for a ligand-based (MCS) alignment and
noticed some
odd things - which might also be due to my implementation. The idea of the
program is that given a template molecule and a library (both as SDF), the
MCS
is searched between the template and a particular library molecule. A new
conformation for the library molecule is generated using as a "coordinate
map"
the atoms identified in the MCS with the template (using
"EmbedMultipleConf(....)"). The conformations are optimized, redundancy is
removed and the most similar
conformations are written to a new SDF (full code at end mail , start with
"python MCSalign.py template.sdf library.sdf out.sdf ").
The points I noticed:
- after running "EmbedMultipleConf(....)" the atoms of the MCS are often
not aligned (reference vs. library compound) although I specified a
coordinate map (I also tried increasing the
force tolerance which did not help), and I have to do an additional
alignment for the correct superposition - is this to be expected?
- Sometimes when I use the same molecule as reference and library compound
(for testing purposes), the conformations do not match even after the
alignment ?
(e.g. ZINC-molecule http://zinc.docking.org/substance/33006049)
- in order to get a 1-1 correspondence of atom ids (to get the coordinate
map) I had to search the MCS-SMARTS match again against the original files
to
get the atom-ids - is there a more direct way to do this?
Cheers
Fabian
Full Code:
import sys;
if (len(sys.argv) < 4):
print "MCSalign.py [template SDfile] [library SDF] [output SDF]
[rms-threshold] [shape-threshold] [#-conf.s]";
exit(1);
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import MCS
from rdkit.Chem import rdMolAlign
#template for alignment
master = Chem.SDMolSupplier(sys.argv[1]);
#database of 2D (or 3D) ligands to be aligned
suppl = Chem.SDMolSupplier(sys.argv[2]);
#output-file
writer = Chem.SDWriter(sys.argv[3])
#prewriter = Chem.SDWriter("preopt.sdf") # to be removed
refmol = master.next(); # currently only the first cpds is used as
reference -> to be exanded
if(refmol is None):
print 'Error for reference-molecule %s' %(sys.argv[1])
exit(13)
rmsthreshold = -1
shapethreshold = 2
if(len(sys.argv)>4):
rmsthreshold = float(sys.argv[4])
if(len(sys.argv)>5):
shapethreshold = float(sys.argv[5])
nconf=100
if(len(sys.argv)>6):
nconf = int(sys.argv[6])
#
forcetol=0.1
# rmsd for intra-comparisons and conf.gen. pruning cutoff
cutoff=0.5
for i,mol in enumerate(suppl):
if mol is None:
print 'Error for molecule at position %d' %(i) # skip bad
molecules
else:
s='molecule %d:' %i
if(mol.HasProp('_Name')):
s = '%s' %(mol.GetProp('_Name'))
refconf = refmol.GetConformer()
molconf = mol.GetConformer()
mols = [refmol, mol]
nsuccess = 0
#
# : matchValences=True,completeRingsOnly=True,bondCompare=
"bondtypes"
mcsres = MCS.FindMCS(mols, minNumAtoms=3, timeout=5)
if(mcsres.numAtoms<3 or not mcsres.completed):
print "Could not generate MCS for %s" %(s)
else:
patt = Chem.MolFromSmarts(mcsres.smarts)
refat = refmol.GetSubstructMatch(patt)
dbat = mol.GetSubstructMatch(patt)
aliatoms = []
coordmap = {}
#generate atom-idx mapping using coordinates from reference
molecule
for k in range(0, len(refat) ):
pt3 = refconf.GetAtomPosition(refat[k])
molconf.SetAtomPosition(dbat[k],pt3)
coordmap[dbat[k]] = pt3
aliatoms.append( [dbat[k],refat[k]] )
mol.AddConformer(molconf,assignId=0)
vec =
AllChem.EmbedMultipleConfs(mol,coordMap=coordmap,forceTol=forcetol,numConfs=nconf,pruneRmsThresh=cutoff)
if(len(vec) == 0):
print "No conformations generated for molecule %s" %(s)
else:
bestsim = -1
bestrms = 1e10
conf = -1
check = []
# loop over all generated conformations
for c in range(0,len(vec)):
check.append(1)
#optimize current conformation
AllChem.UFFOptimizeMolecule(mol,confId=c) #,confId=0 ,
,maxIters=100
# compare current conformation with previously
optimized conformations
# and ignore if it is to similar to any other
conformation
for d in range(0,c):
if(check[c]!=0 and c != d):
rms = AllChem.AlignMol(mol, mol, c, d)
if(rms<=cutoff):
check[c] = 0
break
if(check[c]==1):
#re-align current conformation
rms =
Chem.rdMolAlign.AlignMol(mol,refmol,atomMap=aliatoms,prbCid=c)
sim = AllChem.ShapeTanimotoDist(refmol,
mol,confId2=c)
#write out all conformations which are below at
least one threshold
if(rms<rmsthreshold or sim > shapethreshold):
print "Writing conformation with RMS %f SIM %f"
%(rms,sim)
writer.write(mol,confId=c)
nsuccess = nsuccess + 1
# if sim, rmsd are not below threshold -> store
best solution so far
if(sim>bestsim and rms<bestrms):
bestsim=sim
bestrms=rms
conf=c
if(nsuccess==0 and conf>=0 and check[conf]!=0 ):
#only write best conformation if none has been saved so
far
if(rmsthreshold<0 and shapethreshold >1):
print "Writing best conformation shape-sim %.3f rms
%.3f" %(bestsim,bestrms)
writer.write(mol,confId=conf)
nsuccess = 1
elif (nsuccess==0):
print "Error when generating conformation"
print "Wrote %d conformations for %s" %(nsuccess,s)
------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_mar
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss