Hi Greg, Thanks for the examples. I will give it a try.
JW
___________________
JW Feng, Ph.D.
Denali Therapeutics Inc.
201 Gateway Blvd.
South San Francisco, CA 94080 | (650) 270-0628

On Thu, Oct 22, 2015 at 2:06 AM, Greg Landrum <[email protected]> wrote:

> Hi JW,
>
> On Thu, Oct 22, 2015 at 12:47 AM, JW Feng <[email protected]> wrote:
>
>>
>> I read a post (link below) about SD tag reordering by Matthew, with a
>> reply from Greg, and I have a follow-up question. I would like to preserve
>> the ordering of SD tags as they appear in the input SD file. I tried
>> getting the list of SD tags with mol.GetPropNames() and setting the order
>> with sd_writer.SetProps(), but that didn't work. It turns out that
>> mol.GetPropNames() returns a list in alphabetical order instead of order
>> of appearance.
>>
>
> I would say instead that they appear in an unspecified,
> implementation-dependent order. This may be alphabetical, but it's
> certainly not guaranteed to be so.
>
>
>> Is there a way to preserve SD tag order?
>>
>
> There is currently no way to do this automatically. I have always thought
> about those properties as being unordered, so the RDKit doesn't maintain
> any record of the order in which properties are added to a molecule.
>
> As long as you have the original SDMolSupplier, you can pretty easily get
> the ordered list of property names from that:
>
> In [22]: suppl = Chem.SDMolSupplier('tmp.sdf')
>
> In [23]: m = suppl[0]
>
> In [24]: import re  # needed for re.findall below
>
> In [25]: list(m.GetPropNames())  # <- here's the non-ordered list
> Out[25]:
> ['PUBCHEM_ATOM_DEF_STEREO_COUNT',
>  'PUBCHEM_ATOM_UDEF_STEREO_COUNT',
>  'PUBCHEM_BONDANNOTATIONS',
>  'PUBCHEM_BOND_DEF_STEREO_COUNT',
>  'PUBCHEM_BOND_UDEF_STEREO_COUNT',
>  'PUBCHEM_CACTVS_COMPLEXITY',
>  'PUBCHEM_CACTVS_HBOND_ACCEPTOR',
>  'PUBCHEM_CACTVS_HBOND_DONOR',
>  'PUBCHEM_CACTVS_ROTATABLE_BOND',
>  'PUBCHEM_CACTVS_SUBSKEYS',
>  'PUBCHEM_CACTVS_TAUTO_COUNT',
>  'PUBCHEM_CACTVS_TPSA',
>  'PUBCHEM_COMPONENT_COUNT',
>  'PUBCHEM_COMPOUND_CANONICALIZED',
>  'PUBCHEM_COMPOUND_CID',
>  'PUBCHEM_COORDINATE_TYPE',
>  'PUBCHEM_EXACT_MASS',
>  'PUBCHEM_HEAVY_ATOM_COUNT',
>  'PUBCHEM_ISOTOPIC_ATOM_COUNT',
>  'PUBCHEM_IUPAC_CAS_NAME',
>  'PUBCHEM_IUPAC_INCHI',
>  'PUBCHEM_IUPAC_INCHIKEY',
>  'PUBCHEM_IUPAC_NAME',
>  'PUBCHEM_IUPAC_OPENEYE_NAME',
>  'PUBCHEM_IUPAC_SYSTEMATIC_NAME',
>  'PUBCHEM_IUPAC_TRADITIONAL_NAME',
>  'PUBCHEM_MOLECULAR_FORMULA',
>  'PUBCHEM_MOLECULAR_WEIGHT',
>  'PUBCHEM_MONOISOTOPIC_WEIGHT',
>  'PUBCHEM_OPENEYE_CAN_SMILES',
>  'PUBCHEM_OPENEYE_ISO_SMILES',
>  'PUBCHEM_TOTAL_CHARGE',
>  'PUBCHEM_XLOGP3_AA']
>
> In [26]: txt = suppl.GetItemText(0)
>
> In [27]: pns = re.findall(r'> *<(\w+)>',txt)  # <- this gives you the list in order
>
> In [28]: pns
> Out[28]:
> ['PUBCHEM_COMPOUND_CID',
>  'PUBCHEM_COMPOUND_CANONICALIZED',
>  'PUBCHEM_CACTVS_COMPLEXITY',
>  'PUBCHEM_CACTVS_HBOND_ACCEPTOR',
>  'PUBCHEM_CACTVS_HBOND_DONOR',
>  'PUBCHEM_CACTVS_ROTATABLE_BOND',
>  'PUBCHEM_CACTVS_SUBSKEYS',
>  'PUBCHEM_IUPAC_OPENEYE_NAME',
>  'PUBCHEM_IUPAC_CAS_NAME',
>  'PUBCHEM_IUPAC_NAME',
>  'PUBCHEM_IUPAC_SYSTEMATIC_NAME',
>  'PUBCHEM_IUPAC_TRADITIONAL_NAME',
>  'PUBCHEM_IUPAC_INCHI',
>  'PUBCHEM_IUPAC_INCHIKEY',
>  'PUBCHEM_XLOGP3_AA',
>  'PUBCHEM_EXACT_MASS',
>  'PUBCHEM_MOLECULAR_FORMULA',
>  'PUBCHEM_MOLECULAR_WEIGHT',
>  'PUBCHEM_OPENEYE_CAN_SMILES',
>  'PUBCHEM_OPENEYE_ISO_SMILES',
>  'PUBCHEM_CACTVS_TPSA',
>  'PUBCHEM_MONOISOTOPIC_WEIGHT',
>  'PUBCHEM_TOTAL_CHARGE',
>  'PUBCHEM_HEAVY_ATOM_COUNT',
>  'PUBCHEM_ATOM_DEF_STEREO_COUNT',
>  'PUBCHEM_ATOM_UDEF_STEREO_COUNT',
>  'PUBCHEM_BOND_DEF_STEREO_COUNT',
>  'PUBCHEM_BOND_UDEF_STEREO_COUNT',
>  'PUBCHEM_ISOTOPIC_ATOM_COUNT',
>  'PUBCHEM_COMPONENT_COUNT',
>  'PUBCHEM_CACTVS_TAUTO_COUNT',
>  'PUBCHEM_COORDINATE_TYPE',
>  'PUBCHEM_BONDANNOTATIONS']
>
> If you pass that list of property names to the SDWriter's SetProps()
> method, it will write things out in the input order.
>
> I hope this helps,
> -greg
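For anyone finding this thread later, here is a minimal sketch of how the approach described above might be wrapped into a script. The function names ordered_tag_names and copy_sdf_preserving_tag_order are hypothetical helpers made up for this example, not part of the RDKit. The regular expression is also slightly different from the one in the quoted session: it is anchored to the start of a line and accepts tag names containing characters other than \w (spaces, dots, dashes), which is an assumption about what the input file may contain. The writer part assumes SDWriter.SetProps() accepts a plain Python list of names and that the tag order of the first record is representative of the whole file.

```python
import re

# Matches an SD data header line such as ">  <PUBCHEM_COMPOUND_CID>" or
# "> <Some Tag> (1)". Anchoring at the start of a line avoids picking up
# "<...>" text that happens to occur inside a data value.
_TAG_HEADER = re.compile(r'^>[^<\n]*<([^<>\n]+)>', re.MULTILINE)


def ordered_tag_names(record_text):
    """Return the SD tag names of one record in order of first appearance."""
    # Only scan the data section, i.e. what follows the molblock terminator.
    _, sep, tail = record_text.partition('M  END')
    data = tail if sep else record_text
    names = []
    for name in _TAG_HEADER.findall(data):
        if name not in names:  # keep the first occurrence only
            names.append(name)
    return names


def copy_sdf_preserving_tag_order(in_path, out_path):
    """Hypothetical helper: rewrite an SD file, keeping the input tag order."""
    # Imported here so ordered_tag_names() can be used without RDKit installed.
    from rdkit import Chem

    suppl = Chem.SDMolSupplier(in_path)
    writer = Chem.SDWriter(out_path)
    # Assumption: the first record's tag order is used for the whole file.
    # The property list is set once, before any molecule is written.
    writer.SetProps(ordered_tag_names(suppl.GetItemText(0)))
    for mol in suppl:
        if mol is None:  # skip records RDKit could not parse
            continue
        writer.write(mol)
    writer.close()
```

One thing to keep in mind with this sketch: once a property list has been set on the writer, only the listed properties are written, so tags that first appear in later records would be dropped. If the records in a file carry different tags, collecting the names across all records before writing would be the safer variant.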

