Dear DK,

Cedric's answer is a good start and looks like it would work. I'd
refine it a little and use a somewhat different mechanism for
calculating the descriptors. Here's a piece of Python that, given a
descriptor calculator file (see below), will do what I think you want:

#-----------------------------------
from rdkit import Chem
from rdkit.RDLogger import logger
logger=logger()
import cPickle,sys

# load the pickled descriptor calculator
calc = cPickle.load(file('moe_like.dsc','rb'))
nms = list(calc.GetDescriptorNames())
# input SMILES file (no title line) and output file with one column per descriptor
suppl = Chem.SmilesMolSupplier(sys.argv[1],titleLine=False)
w = Chem.SmilesWriter(sys.argv[2])
w.SetProps(nms)
nDone=0
for mol in suppl:
    nDone += 1
    if not nDone%1000: logger.info("Done %d"%nDone)
    # skip entries that could not be parsed
    if mol is None: continue
    descrs = calc.CalcDescriptors(mol)
    # attach each value as a property so that the writer outputs it
    for nm,v in zip(nms,descrs):
        mol.SetProp(nm,str(v))
    w.write(mol)
#-----------------------------------

The script uses the first argument as the input file name and the
second argument as the output file name. It uses a descriptor
calculator that it loads from a file named "moe_like.dsc" (there's a
file with this name that will work in the directory
$RDBASE/Projects/DbCLI/).
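
On Python 3, cPickle and file() are no longer available, so the
script above needs small changes there. What follows is a minimal
sketch rather than a drop-in replacement: it assumes the calculator can
be built from an explicit list of descriptor names instead of being
loaded from moe_like.dsc, and the function name describe_smiles_file
and the default name list are illustrative choices, not part of RDKit.

```python
from rdkit import Chem
from rdkit.ML.Descriptors.MoleculeDescriptors import MolecularDescriptorCalculator

# Illustrative default (an assumption); other descriptor names can be passed in.
DEFAULT_NAMES = ('MolLogP', 'NOCount', 'NHOHCount', 'MolWt',
                 'NumRotatableBonds', 'TPSA')


def describe_smiles_file(in_path, out_path, names=DEFAULT_NAMES):
    """Read SMILES from in_path and write SMILES plus descriptors to out_path.

    Hypothetical helper; returns the number of molecules written.
    """
    calc = MolecularDescriptorCalculator(list(names))
    nms = list(calc.GetDescriptorNames())
    suppl = Chem.SmilesMolSupplier(in_path, titleLine=False)
    writer = Chem.SmilesWriter(out_path)
    writer.SetProps(nms)
    n_written = 0
    for mol in suppl:
        if mol is None:
            # unparsable entries are skipped, as in the script above
            continue
        for nm, value in zip(nms, calc.CalcDescriptors(mol)):
            mol.SetProp(nm, str(value))
        writer.write(mol)
        n_written += 1
    writer.close()  # make sure everything is flushed to disk
    return n_written
```

Calling describe_smiles_file(sys.argv[1], sys.argv[2]) from a small
script would reproduce the two-argument command-line behaviour
described above.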

To make this script easier to run, I'd suggest wrapping it in a shell
script (Linux/Mac) or bat file (Windows) that sets the RDBASE
environment variable, the PATH, and the LD_LIBRARY_PATH (Linux) or
DYLD_LIBRARY_PATH (Mac). On Windows you can call pythonw.exe instead of
python.exe to avoid opening a new window.
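
The same setup can also be done from a small Python launcher instead of
a shell or bat file. The sketch below is only an illustration: the
helper names build_rdkit_env and run_with_rdkit are made up here, and
it assumes that the RDKit shared libraries live in $RDBASE/lib, which
may not match your installation.

```python
import os
import subprocess
import sys


def build_rdkit_env(rdbase, base_env=None):
    """Return a copy of base_env with RDBASE and the library search path set."""
    env = dict(os.environ if base_env is None else base_env)
    env['RDBASE'] = rdbase
    # Assumption: the shared libraries are in $RDBASE/lib.
    lib_dir = os.path.join(rdbase, 'lib')
    if sys.platform == 'darwin':
        lib_var = 'DYLD_LIBRARY_PATH'
    elif sys.platform.startswith('win'):
        lib_var = 'PATH'  # Windows looks up DLLs via PATH
    else:
        lib_var = 'LD_LIBRARY_PATH'
    previous = env.get(lib_var, '')
    # Prepend so that the RDKit libraries are found first.
    env[lib_var] = lib_dir + (os.pathsep + previous if previous else '')
    return env


def run_with_rdkit(script, args, rdbase):
    """Run a Python script with the adjusted environment; return its exit code."""
    cmd = [sys.executable, script] + list(args)
    return subprocess.call(cmd, env=build_rdkit_env(rdbase))
```

For example, run_with_rdkit('calc_descrs.py', ['in.smi', 'out.smi'],
'/opt/rdkit') would run a script with that (hypothetical) name against
an RDKit installation at that (hypothetical) location.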

So what's a descriptor calculator? This is a mechanism provided by the
RDKit that allows you to package a set of descriptors together for
easy reuse. It's useful if you don't want to generate everything (as
Cedric's script does) or want to be sure you always generate the same
descriptors in the same order (the version from Cedric will generate
new descriptors as they become available; these new descriptors could
change the ordering of the old ones).

Here's an example of how to create a new descriptor calculator (from
Python) and then save it to a .dsc file you could use in the sample
script above. In case you aren't familiar with Python at all, this is
showing what I typed at the Python prompt and how Python responded:

In [1]: from rdkit.ML.Descriptors.MoleculeDescriptors import MolecularDescriptorCalculator
In [2]: calc = MolecularDescriptorCalculator(['MolLogP','NOCount','NHOHCount','MolWt','NumRotatableBonds','TPSA'])
In [3]: import cPickle
In [4]: cPickle.dump(calc,file('simple_2d.dsc','wb'))

And, to show how the calculator is used inside Python:

In [5]: from rdkit import Chem
In [6]: m = Chem.MolFromSmiles('c1ncccc1CC(=O)O')
In [7]: calc.CalcDescriptors(m)
Out[7]: (0.70869999999999989, 3, 1, 137.13800000000001, 2, 50.189999999999998)
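
For Python 3, where cPickle and file() do not exist, the save/load
steps from the session above might be written as in the sketch below.
The helper names save_calculator and load_calculator are invented for
this example, and .dsc files that were pickled under Python 2 may not
load directly this way.

```python
import pickle

from rdkit import Chem
from rdkit.ML.Descriptors.MoleculeDescriptors import MolecularDescriptorCalculator


def save_calculator(calc, path):
    """Pickle a descriptor calculator to path (hypothetical helper)."""
    # Binary mode, so the file can be read back with 'rb' on any platform.
    with open(path, 'wb') as out_file:
        pickle.dump(calc, out_file)


def load_calculator(path):
    """Load a calculator written by save_calculator (hypothetical helper)."""
    with open(path, 'rb') as in_file:
        return pickle.load(in_file)


def describe(calc, smiles):
    """Return {descriptor name: value} for one SMILES, in calculator order."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError('could not parse SMILES: %r' % smiles)
    return dict(zip(calc.GetDescriptorNames(), calc.CalcDescriptors(mol)))
```

A round trip would then look like
save_calculator(MolecularDescriptorCalculator(names), 'simple_2d.dsc')
followed by describe(load_calculator('simple_2d.dsc'), 'c1ncccc1CC(=O)O').
As with any pickle file, only load .dsc files from a source you trust.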

Best Regards,
-greg

On Wed, Jun 2, 2010 at 11:29 AM, Cedric MORETTI
<[email protected]> wrote:
> Not tested
>
> # script RD_descript.py
> print "Hello from RD_descript "
>
> from cinfony import rdk
> from rdkit import Chem
> from rdkit.Chem import AvailDescriptors
>
> for d in AvailDescriptors.descDict:
>    print d
>
> suppl = open("Nom file","r")
> w = Chem.SDWriter("SDF File")
> numRead = 0
> numStructures = 0
> for m in suppl:
>    numRead += 1
>    if m != None:
>       numStructures += 1
>       smi = Chem.MolToSmiles(m.strip())
>       m.SetProp("SMILES",smi)
>       print smi
>       for d in AvailDescriptors.descDict:
> #      print d
>          pr = AvailDescriptors.descDict[d]( m.strip())
> #      print str(pr)
>          m.SetProp(d,str(pr))
>       w.write(m)
>
> print "nombre initiale = " + str(numRead )
> print "nombre finale = " + str(numStructures)
>
> From: Damjan Krstajic [mailto:[email protected]]
> Sent: Wednesday, 2 June 2010 11:21
> To: [email protected]
> Subject: [Rdkit-discuss] RDKit descriptors batch
>
>
>
> Hello,
>
> I would like to use RDKit to calculate descriptors. I am interested in a
> batch program which would calculate the RDKit descriptors from a smiles file
> (.smi). I don't have any experience with Python. Do you have any advice on
> how to create the batch program? I am prepared to code it and give it to you
> so that others can use it.
>
> Thanks
> DK
>
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
