Hi, 

I've got a problem with the InChIKeys being generated from CML for a
series of adamantanes. The structures in the attached CML were generated
in torch (from a SMILES string) and then converted from an SDF to CML
using Open Babel. I'm trying to use the function in the Python script to
add the InChIKey of the CML to the attributes (the function takes an
lxml.etree.Element representation of the molecule CML block as input,
and adds the generated InChIKey). I want to be able to match these 3D
structures to experimental data for them that is stored in XML, which
uses the InChIKey as an id for the molecule. 

The csv file lists the expected InChIKey and the canonicalised SMILES
used to generate the structure (in the columns exp_inchikey and
exp_smiles respectively). The InChIKey that was actually generated for
the CML is in the cml_inchikey column. The second part of the InChIKey
is different, and I was wondering why this is the case. Is it to do with
some unseen stereochemistry that isn't in the SMILES used to generate
it, or is it to do with the options I'm using for the conversion, or
something else that I haven't thought of? 

Note: the expected InChIKey is taken from the ChemSpider entry for the
molecule. 

Thanks, 

Mark Driver 

PhD student 

University of Cambridge 
exp_inchikey                 exp_smiles                         cml_inchikey
BHTSNYQWLQLLOD-UHFFFAOYSA-N  CC(C)(C)C(=O)C12CC3CC(CC(C3)C1)C2  BHTSNYQWLQLLOD-WUQLGEGHSA-N
CPWSNJSGSXXVLD-UHFFFAOYSA-N  FC12CC3CC(CC(C3)C1)C2              CPWSNJSGSXXVLD-CHIWXEEVSA-N
DACIGVIOAFXPHW-UHFFFAOYSA-N  CC(=O)C12CC3CC(CC(C3)C1)C2         DACIGVIOAFXPHW-CDECOKDKSA-N
DKNWSYNQZKUICI-UHFFFAOYSA-N  NC12CC3CC(CC(C3)C1)C2              DKNWSYNQZKUICI-CHIWXEEVSA-N
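
To make the difference in the table explicit, here is a minimal sketch that
splits each key into its blocks and compares them. It assumes the usual
InChIKey layout: a 14-character connectivity block, then a second block made
of an 8-character hash of the remaining layers (stereochemistry, isotopes and
so on) plus a standard flag and a version character, then a final protonation
character. The helper names split_inchikey and compare_keys are hypothetical
and are not part of Open Babel.

```python
def split_inchikey(key):
    """Split an InChIKey into (connectivity, other_layers_hash, flags, protonation)."""
    first, second, last = key.strip().split("-")
    # The second block is 8 hash characters followed by 2 flag characters.
    return first, second[:8], second[8:], last


def compare_keys(expected, generated):
    """Report which blocks of two InChIKeys agree."""
    exp = split_inchikey(expected)
    gen = split_inchikey(generated)
    return {
        "same_connectivity": exp[0] == gen[0],
        "same_other_layers": exp[1] == gen[1],
    }


# Key pairs copied from the exp_inchikey and cml_inchikey columns.
pairs = [
    ("BHTSNYQWLQLLOD-UHFFFAOYSA-N", "BHTSNYQWLQLLOD-WUQLGEGHSA-N"),
    ("CPWSNJSGSXXVLD-UHFFFAOYSA-N", "CPWSNJSGSXXVLD-CHIWXEEVSA-N"),
    ("DACIGVIOAFXPHW-UHFFFAOYSA-N", "DACIGVIOAFXPHW-CDECOKDKSA-N"),
    ("DKNWSYNQZKUICI-UHFFFAOYSA-N", "DKNWSYNQZKUICI-CHIWXEEVSA-N"),
]

results = [compare_keys(exp, gen) for exp, gen in pairs]
for (exp, gen), result in zip(pairs, results):
    print(exp, gen, result)
```

For all four rows the connectivity block matches and only the hash in the
second block differs, so the CML structures have the same skeleton as the
expected ones but carry some extra layer that the ChemSpider keys do not.
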
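import openbabel as ob
from lxml import etree


def addStdInChIKeyToMolecule(molecule_cml):
    """Add a StdInChiKey attribute to a molecule CML element.

    molecule_cml is an lxml.etree.Element holding the molecule CML block.
    """
    # Serialise the element so Open Babel can parse it as CML.
    molecule_cml_string = etree.tostring(molecule_cml)
    conversion = ob.OBConversion()
    conversion.SetInAndOutFormats("cml", "inchi")
    # The "K" output option makes the inchi format write the InChIKey only.
    conversion.SetOptions("K", conversion.OUTOPTIONS)
    molecule = ob.OBMol()
    conversion.ReadString(molecule, molecule_cml_string)
    # Strip surrounding whitespace from the written key before storing it.
    inchikey = conversion.WriteString(molecule).strip()
    molecule_cml.set("StdInChiKey", inchikey)
    return molecule_cml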

Attachment: adamantaneexamples.cml
Description: XML document
