Hi Brian,

The first point you mentioned was actually what I guessed, and I think it does not apply in my context.

Thanks for the second suggestion. I tried this and the performance improved:
suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
l = len(suppl)  # This line is crucial
suppl = list(suppl)
The types of suppl after each of the three lines are, respectively:
<class 'rdkit.Chem.rdmolfiles.SmilesMolSupplier'>,
<class 'rdkit.Chem.rdmolfiles.SmilesMolSupplier'>,
<type 'list'>
So although the second suppl (after len(suppl)) is subscriptable, it is not actually a list. It is amazing that all the molecules were instantiated by the `list` call.
: )
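
For readers who want to try the pattern described above, here is a minimal sketch, assuming RDKit is installed. The three SMILES records, the two SMARTS queries, and the names `smiles_text`, `queries`, and `hits` are made up for illustration and are not from the original file. `SetData` is used instead of a file only so the example needs no external input. Each record is parsed once while building the list, and every query is then matched against the cached molecules.

```python
from rdkit import Chem

# Illustrative tab-separated records (SMILES, name); not from the original file.
smiles_text = "c1ccccc1O\tphenol\nCCO\tethanol\nCC(=O)O\tacetic_acid\n"

suppl = Chem.SmilesMolSupplier()
suppl.SetData(smiles_text, delimiter="\t", smilesColumn=0, nameColumn=1,
              titleLine=False)

# Iterating the supplier parses each record once; None marks an unparsable line.
mols = [mol for mol in suppl if mol is not None]

# Hypothetical substructure queries, written as SMARTS.
queries = {
    "benzene_ring": Chem.MolFromSmarts("c1ccccc1"),
    "hydroxyl": Chem.MolFromSmarts("[OX2H]"),
}

# Match every query against the already-parsed molecules.
hits = {
    label: [mol.GetProp("_Name") for mol in mols if mol.HasSubstructMatch(patt)]
    for label, patt in queries.items()
}
print(hits)
```

When reading from a file instead, the same list comprehension can be applied to a supplier built from the file name. Note that `SmilesMolSupplier` treats the first line as a title line unless `titleLine=False` is passed.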

Hongbin Yang

From: Brian Kelley
Date: 2016-11-01 19:56
To: 杨弘宾
CC: rdkit-discuss
Subject: Re: [Rdkit-discuss] Is there a way to init the conformations of smiles supplier to improve the performance for substructure matching.

I'll make two more points (thanks to Greg Landrum for pointing this out):
1) In your code, each call to suppl[i] makes a new molecule, so calling it twice
in a row is twice as slow. This explains your last result.
2) In my example, I was assuming that the queries were already in a Python list
and not from a supplier. If they are being read from a supplier, you can
easily keep them all in memory with:
queries = list(query_supplier)

Note that for large files, this can take up a lot of memory.
Thanks for the clarification Greg.
----Brian Kelley
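
Point 1 in the message above can be illustrated without RDKit. The sketch below uses a hypothetical class, `CountingLazySupplier`, as a stand-in for a supplier that re-parses a record on every index access. It is not RDKit code, and the string `split` only plays the role of the expensive parse. Counting the parse calls shows why indexing twice per molecule doubles the work, and why building a list first does not.

```python
class CountingLazySupplier:
    """Hypothetical stand-in for a lazy supplier: parses on every index access."""

    def __init__(self, records):
        self._records = list(records)
        self.parse_calls = 0

    def __len__(self):
        return len(self._records)

    def __getitem__(self, index):
        # IndexError past the end lets list() and for-loops stop iterating.
        record = self._records[index]
        self.parse_calls += 1
        return record.split("\t")[0]  # stands in for the expensive parse


records = ["CCO\tethanol", "c1ccccc1\tbenzene", "CC(=O)O\tacetic_acid"]

# Pattern from the thread: index the supplier once per query.
suppl = CountingLazySupplier(records)
for i in range(len(suppl)):
    suppl[i].startswith("C")
    suppl[i].startswith("c")
indexed_calls = suppl.parse_calls

# Suggested pattern: materialize once, then reuse the parsed objects.
suppl = CountingLazySupplier(records)
mols = list(suppl)
for mol in mols:
    mol.startswith("C")
    mol.startswith("c")
cached_calls = suppl.parse_calls

print(indexed_calls, cached_calls)
```

If memory is a concern for large files, assigning `mol = suppl[i]` once per iteration and reusing `mol` for all queries avoids the repeated parse without holding every molecule at once.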
On Nov 1, 2016, at 4:22 AM, 杨弘宾 <[email protected]> wrote:


Hi,

Suppose I'd like to match 100 substructures against 1000 compounds represented as SMILES. What I did is:
suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
l = len(suppl)
for j in range(ll):  # I have to make substructures in the first loop.
    for i in range(l):
        suppl[i].GetSubstructMatches(s[j])
and I found that the performance is not good.
Then I did a comparison and found that it was because the conformations of the
compounds were not initialized. If I use MolFromSmiles, the performance
improves a lot.
start = time.clock()
suppl = AllChem.SmilesMolSupplier('allmoleculenew.smi',delimiter='\t')
l = len(suppl)
print time.clock()-start   # >>> 0.0373735355168, indicating that the molecules were not initialized.
for i in range(l):
    suppl[i].GetSubstructMatches(sa)
    suppl[i].GetSubstructMatches(sa2)
print time.clock()-start   # >>> 11.1884715172

start = time.clock()
f = open('allmoleculenew.smi')
for i in range(l):
    mol = Chem.MolFromSmiles(f.next().split('\t')[0])
    mol.GetSubstructMatches(sa)
    mol.GetSubstructMatches(sa2)
print time.clock()-start # >>> 5.44030582111
The second method was twice as fast as the first, indicating that the "init"
is more time-consuming than the matching. I think SmilesMolSupplier is a good
API for loading multiple compounds, but it does not parse the SMILES immediately,
which adds time to any later use of the molecules. So is it possible to
manually initialize the compounds?
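
The timings above use Python 2 syntax (`print` statements, `f.next()`) and `time.clock()`, which was removed in Python 3.8. A minimal sketch of the same kind of measurement on Python 3 could use `time.perf_counter()`. The helper name `timed` is hypothetical, and `sum(range(...))` is only a placeholder for the substructure-matching loop.

```python
import time


def timed(func, *args, **kwargs):
    """Run func once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Placeholder workload; replace with the loop being measured.
total, seconds = timed(sum, range(1_000_000))
print(f"sum={total} took {seconds:.4f} s")
```

Wrapping each variant (indexing the supplier, parsing from the file, matching against a cached list) in its own function and timing it this way gives each measurement its own start point.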


Hongbin Yang 杨弘宾

Research: Toxicophore and Chemoinformatics
Pharmaceutical Science, School of Pharmacy

East China University of Science and Technology 

------------------------------------------------------------------------------
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
