Hi George,
> This is probably not going to solve the problem at hand but it may be useful
> to you or others in the future:
> ChEMBLdb maintains a molecular hierarchy table where you can retrieve the
> parent (=desalted - using Pipeline Pilot) structures for each molecular
> entity.
> You may try something like this:
>
> select distinct cs.molregno, cs.molfile, cs.canonical_smiles
> from compound_structures cs, molecule_hierarchy mh
> where cs.molregno = mh.parent_molregno
I confess pure ignorance here. I've worked with databases, but they are far
from something I know well. Reading the ERD is not simple for me, I don't
have MySQL or Oracle installed on my machines, and I don't know how to browse
through the schema and tables the way I've seen more database-proficient
people do. So while I have an idea of what you are talking about, it's not
something I can easily put into place.
But as you say, it's not the problem, because RDKit raises its exception even
with the original, unprocessed (not desalted) record.
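
For anyone else without MySQL or Oracle at hand, the shape of the quoted query
can be tried with Python's built-in sqlite3 module. This is only a sketch: the
table and column names come from the quoted query, but the molregno column on
molecule_hierarchy, the three rows, and the helper name parent_structures are
invented for illustration and say nothing about the real ChEMBL schema or
data.

```python
import sqlite3

# Toy in-memory database. The rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compound_structures (
    molregno INTEGER PRIMARY KEY,
    molfile TEXT,
    canonical_smiles TEXT
);
CREATE TABLE molecule_hierarchy (
    molregno INTEGER PRIMARY KEY,
    parent_molregno INTEGER
);
-- 1 is a parent, 2 is a salt form whose parent is 1, 3 is its own parent.
INSERT INTO compound_structures VALUES
    (1, '', 'CC(=O)O'),
    (2, '', 'CC(=O)[O-].[Na+]'),
    (3, '', 'c1ccccc1');
INSERT INTO molecule_hierarchy VALUES (1, 1), (2, 1), (3, 3);
""")

# Browsing the schema: SQLite lists its tables in sqlite_master.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)


def parent_structures(conn):
    """Run the quoted join: keep only structures that are some entry's parent."""
    rows = conn.execute("""
        SELECT DISTINCT cs.molregno, cs.molfile, cs.canonical_smiles
        FROM compound_structures cs, molecule_hierarchy mh
        WHERE cs.molregno = mh.parent_molregno
        ORDER BY cs.molregno
    """)
    return rows.fetchall()


parents = parent_structures(conn)
for molregno, molfile, smiles in parents:
    print(molregno, smiles)
```

In this toy data the salt form (molregno 2) never appears as a parent, so the
join leaves it out and returns only the parent structures.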
Since you're here -- how come ChEMBL doesn't put an identifier on the first
line of the SD record? Nearly all of them are blank; the exceptions are a dozen
with mostly useless titles like:
Acetic acid 6-(1-phenyl-ethyl)-6-aza-bicyclo[3.2.1]oct-3-yl ester
4-(4-Fluoro-phenyl)-2-methylsulfanyl-thiophene-3-carbonitrile
6-amino-9-(5-{[(1,2,3,3-tetrahydroxy-1,2,3-trioxidotriphosphanyl)oxy]methyl}tetr
2-Methyl-2,3-dihydro-benzofuran-7-carboxylic acid 8-methyl-8-aza-bicyclo[3.2.1]o
(S)-N-((S)-1,6-diamino-1-oxohexan-2-yl)-1-((S)-5-guanidino-2-((2S,3S)-2-((S)-5-g
Acetic acid 6-(1-phenyl-ethyl)-6-aza-bicyclo[3.2.1]oct-3-yl ester
I end up doing a mol.SetProp("_Name", mol.GetProp("chembl_id")) so that my
output SMILES have an identifier tied to them, and that seems like a needless
extra step.
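
An alternative to the SetProp() call is to fill in the blank title line in the
SD text itself before any toolkit reads it. The sketch below uses only the
Python standard library and assumes each record carries a "chembl_id" data
field (the same tag read by GetProp above) and ends with a "$$$$" line. The
function name fill_blank_titles and the sample record, including its
identifier, are hypothetical.

```python
def fill_blank_titles(sd_text, tag="chembl_id"):
    """Copy the value of data field `tag` into each blank SD title line."""
    header = "<%s>" % tag
    records, current = [], []
    for line in sd_text.splitlines():
        current.append(line)
        if line.strip() == "$$$$":       # record terminator
            records.append(current)
            current = []
    # Any text after the last "$$$$" is not a complete record and is dropped.
    out = []
    for lines in records:
        if not lines[0].strip():         # the title is the record's first line
            for i, line in enumerate(lines[:-1]):
                # Data headers look like "> <chembl_id>"; the value follows.
                if line.startswith(">") and header in line:
                    lines[0] = lines[i + 1].strip()
                    break
        out.extend(lines)
    return "\n".join(out) + "\n"


# A one-atom record with a blank title; the identifier is a placeholder.
sample = "\n".join([
    "",
    "  hypothetical-program",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <chembl_id>",
    "CHEMBL_EXAMPLE_1",
    "",
    "$$$$",
]) + "\n"

fixed = fill_blank_titles(sample)
print(fixed.splitlines()[0])
```

Records that already have a title are left untouched, so the few entries with
long names keep them unless the blank-title check is relaxed.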
Andrew
_______________________________________________
Rdkit-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss