A suggestion if I may. At least part of this discussion should go in
"Section 4 - Substructure Searching" in the documentation.
These eye openers are important, otherwise you think you've just written an
incredible bit of code which however doesn't do what you intend it to.
Cheers,
Jean-Paul Ebejer
Early Stage Researcher
InhibOx Ltd
Pembroke House
36-37 Pembroke Street
Oxford
OX1 1BP
UK
(+44 / 0) 1865 262 034
This email and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are addressed.
Any unauthorised dissemination or copying of this email or its attachments,
and any use or disclosure of any information contained in them, is strictly
prohibited and may be illegal. If you have received this email in error
please notify the sender and delete all copies from your system.
We and our group companies accept no liability or responsibility for
personal emails or emails unconnected with our business.
Internet communications including emails and access and use of web sites
cannot be guaranteed to be secure or error free as information can be
intercepted, corrupted, lost or arrive late. Furthermore, while we have
taken steps to control the spread of viruses on our systems, we cannot
guarantee that this email and any files transmitted with it are virus free.
No liability is accepted for any errors, omissions, interceptions, corrupted
mail, lost communications or late delivery arising as a result of receiving
this message via the Internet or for any virus that may be contained in it.
On 9 July 2011 05:24, Greg Landrum <[email protected]> wrote:
> On Fri, Jul 8, 2011 at 8:21 PM, JP <[email protected]> wrote:
> > Any ideas why the attached (not very well written) code gives this error
> (or
> > warning?) when run in the following manner:
> > Set_Protonation_State.py Set_Protonation_State_Tests.smi
> > [19:17:10] unsupported radical count: 3 set to 3.
> > And what does it mean? Doesn't make much sense to me. The unsupported
> > value is 3 and so I/rdkit has to set it to 3 (?!)
> > Am I being thick?
>
> You mean you don't find the meaning of the warning completely obvious? ;-)
>
> The warning is coming during the SDF writing process when the RAD line
> is being written for atoms. The "RAD" line in CTABs, despite it's
> name, isn't actually the radical count on atoms; it's their spin
> state. So "2" means "doublet", not "two unpaired electrons". The
> warning message is telling you that it found an atom with a radical
> count of 3 (!) and set the value for the spin state in the SD file to
> 3. It's a stupid error message combined with chemically unreasonable
> behavior. I fixed the CTAB writer to generate "2" (doublet) for odd
> radical counts and "3" (triplet) for even radical counts. The RDKit
> doesn't have enough information to be able to distinguish doublet from
> singlet, so this last bit is totally arbitrary.
>
> There's a more interesting question about how exactly the radical
> count is getting that high in your example. Your code is not doing
> exactly what I guess you think it's doing. Starting with the first
> replacement:
>
> repl1 = Chem.MolFromSmiles('[O-]')
> patt1 = Chem.MolFromSmarts('[$([OH]C(=O))]')
> m1 = AllChem.ReplaceSubstructs(reagent1,patt1,repl1,replaceAll=True)
> print "Input: " , Chem.MolToSmiles(reagent1), " Output: ",
> Chem.MolToSmiles(m1[0])
>
> Your replacement atom has a radical on it:
> >>> repl = Chem.MolFromSmiles('[O-]')
> >>> repl.Debug()
> Atoms:
> 0 8 O chg: -1 deg: 0 exp: 0 imp: 0 hyb: 4 arom?: 0 chi: 0 rad: 1
> Bonds:
>
> And that radical is carried over into the product molecule:
>
> >>> patt1 = Chem.MolFromSmarts('[$([OH]C(=O))]')
> >>> reagent1 = Chem.MolFromSmiles('CC(=O)O')
> >>> m1 = AllChem.ReplaceSubstructs(reagent1,patt1,repl,replaceAll=True)
> >>> m1[0].Debug()
> Atoms:
> 0 6 C chg: 0 deg: 1 exp: 1 imp: 3 hyb: 4 arom?: 0 chi: 0
> 1 6 C chg: 0 deg: 3 exp: 4 imp: 0 hyb: 3 arom?: 0 chi: 0
> 2 8 O chg: 0 deg: 1 exp: 2 imp: 0 hyb: 3 arom?: 0 chi: 0
> 3 8 O chg: -1 deg: 1 exp: 1 imp: 0 hyb: 4 arom?: 0 chi: 0 rad: 1
> Bonds:
> 0 0->1 order: 1 conj?: 0 aromatic?: 0
> 1 1->2 order: 2 conj?: 1 aromatic?: 0
> 2 3->1 order: 1 conj?: 0 aromatic?: 0
>
> I guess this is not what you intend.
>
> A simple workaround is to not sanitize your replacement pieces:
> >>> repl2 = Chem.MolFromSmiles('[O-]',sanitize=False)
> >>> m1 = AllChem.ReplaceSubstructs(reagent1,patt1,repl2,replaceAll=True)
> >>> m1[0].Debug()
> Atoms:
> 0 6 C chg: 0 deg: 1 exp: 1 imp: 3 hyb: 4 arom?: 0 chi: 0
> 1 6 C chg: 0 deg: 3 exp: 4 imp: 0 hyb: 3 arom?: 0 chi: 0
> 2 8 O chg: 0 deg: 1 exp: 2 imp: 0 hyb: 3 arom?: 0 chi: 0
> 3 8 O chg: -1 deg: 1 exp: 1 imp: 0 hyb: 0 arom?: 0 chi: 0
> Bonds:
> 0 0->1 order: 1 conj?: 0 aromatic?: 0
> 1 1->2 order: 2 conj?: 1 aromatic?: 0
> 2 3->1 order: 1 conj?: 0 aromatic?: 0
> >>> print Chem.MolToSmiles(m1[0])
> CC(=O)[O-]
>
> You'll have to be somewhat careful about your tautomer fixes (repl 4
> and repl 5), but those should also be manageable this way.
>
> The other problem that's likely to bite you relatively soon is the
> fact that you aren't updating chemistry information on molecules
> between steps. You should at least be updating the valence counts and
> ring information.
>
> I've attached a somewhat refactored version of your code that is more
> likely to generate correct results.
>
> -greg
>
------------------------------------------------------------------------------
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security
threats, fraudulent activity, and more. Splunk takes this data and makes
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2d-c2
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss