On Mon, May 9, 2011 at 10:27 AM, JP <[email protected]> wrote:
> Using RDKit 2010.10
> Some error messages need to be more helpful. For e.g. in a 10,000 molecule
> smiles file:
> [23:07:01] Can't kekulize mol
> Traceback (most recent call last):
> File "./x.py", line 314, in <module>
> main()
> File "./x.py", line 303, in main
> mols = doSomething(...)
> File "./x.py", line 185, in doSomething
> mol_noH = Chem.RemoveHs(mol_h)
> ValueError: Sanitization error: Can't kekulize mol
> Or a "Cannot parse smiles" error give an indication of what is going on --
> but I need to know which mol they are failing at...
> Something like
> [23:07:01] Can't kekulize mol (some id and/or textual smiles)
This is quite difficult to do, since the kekulization function, called
by RemoveHs, sees only the molecule and knows nothing about the input
it came from. You can handle this one yourself, though, by catching
the ValueError and then displaying the input value. If you are using
either an SDMolSupplier or a SmilesMolSupplier, you may find its
GetItemText() method quite useful here. If you aren't using one of
those, you can try calling Chem.MolToSmiles on the bad molecule with
the canonical argument set to False:
In [15]: m = Chem.MolFromSmiles('[H]c1c([H])c([H])c([H])c1[H]',sanitize=False)
In [16]: mh = Chem.RemoveHs(m)
[06:12:37] Can't kekulize mol
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/home/glandrum/RDKit_trunk/build/<ipython console> in <module>()
ValueError: Sanitization error: Can't kekulize mol
In [17]: Chem.MolToSmiles(m,canonical=False)
Out[17]: '[H]c1c([H])c([H])c([H])c1[H]'
The "canonical" argument was added to MolToSmiles() in the 2010.12
release, so you'll need to be using something at least as up-to-date
as that.
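
To make the catch-and-report idea concrete, here is a minimal sketch
rather than anything definitive. The function name apply_with_context
and its arguments are made up for illustration; the only things it
assumes about the supplier are len(), integer indexing, and
GetItemText(), which SDMolSupplier and SmilesMolSupplier provide. The
"step" argument is whatever you want to run on each molecule, for
example Chem.RemoveHs.

```python
import sys


def apply_with_context(supplier, step, log=sys.stderr):
    """Apply step(mol) to each record of a supplier-like object.

    Hypothetical helper: returns (results, failures), where each failure
    is a tuple (index, message, raw_text_of_the_record).
    """
    results, failures = [], []
    for idx in range(len(supplier)):
        mol = supplier[idx]
        if mol is None:
            # The supplier itself could not build a molecule here.
            failures.append(
                (idx, "could not read record", supplier.GetItemText(idx))
            )
            continue
        try:
            results.append(step(mol))
        except ValueError as err:
            # e.g. "Sanitization error: Can't kekulize mol" from RemoveHs
            failures.append((idx, str(err), supplier.GetItemText(idx)))
    # Report every failure together with the text it came from.
    for idx, msg, text in failures:
        print(f"record {idx}: {msg}\n{text.rstrip()}", file=log)
    return results, failures
```

You would call it along the lines of
apply_with_context(Chem.SmilesMolSupplier('mols.smi'), Chem.RemoveHs),
where 'mols.smi' is a placeholder file name. Collecting the failures
instead of stopping at the first one lets a 10,000 molecule run finish
and still tell you exactly which records were bad.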
> [23:07:01] Cannot parse smiles string ("Ccc1XXXcCCCC")
> Would be more helpful...
This much is certainly no problem to do; the output would then look like this:
In [2]: Chem.MolFromSmiles('Ccc1XXXcCCC')
[06:06:25] syntax error while parsing: Ccc1XXXcCCC
[06:06:25] SMILES Parse Error
In [3]: Chem.MolFromSmiles('C1C')
[06:06:28] Smiles parser error: unclosed ring for input C1C
If this looks useful to people I can go ahead and make the change for
the next release.
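
In the meantime, since Chem.MolFromSmiles returns None when it cannot
parse its input, you can find the offending strings yourself. A rough
sketch follows; find_unparsable and its "parse" argument are
illustrative names, not part of the RDKit.

```python
def find_unparsable(smiles_list, parse=None):
    """Return (index, smiles) for every entry that parse() maps to None.

    Hypothetical helper; by default it uses Chem.MolFromSmiles, which
    returns None for input it cannot turn into a molecule.
    """
    if parse is None:
        # Lazy import: RDKit is only needed for the default parser.
        from rdkit import Chem
        parse = Chem.MolFromSmiles
    return [
        (idx, smi)
        for idx, smi in enumerate(smiles_list)
        if parse(smi) is None
    ]
```

Calling find_unparsable(list_of_smiles) with the default parser gives
you the position and text of each string that was rejected.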
Best,
-greg
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss