Re: [Open Babel] Convert SMILES to SDF with a property from the name line?

Rudy Richardson Tue, 01 Mar 2022 15:30:12 -0800

Hi Andrew,

Many thanks for your detailed answer!


Yes, I use Python. I will give your suggested solution a try.

Best regards,

Rudy


On Sun, Feb 27, 2022 at 4:27 PM Andrew Dalke <da...@dalkescientific.com>
wrote:

> Hi Rudy,
>
>
>
> > On Feb 27, 2022, at 20:55, Rudy Richardson <rjr...@umich.edu> wrote:
> >
> > I have a library of ~1000 compounds as SMILES strings with an appended
> > name code and a property. For example:
> >
> > c1ccc(c2ccccc2)cc1    0001    -2.52
> >
> > Where "0001" is the name code and "-2.52" is a physicochemical property
> > of the molecule.
> >
> > I would like to convert these strings to a concatenated SDF file,
>
>
> If you're comfortable working with Python, here's an example using the
> pybel interface.
>
> First, here's how to get the name code and property
>
> >>> from openbabel import pybel
> >>> mol = pybel.readstring("smi", "c1ccc(c2ccccc2)cc1\t0001\t-2.52")
> >>> mol
> <openbabel.pybel.Molecule object at 0x1101dece0>
> >>> mol.title
> '0001\t-2.52'
>
> In that case I used tabs (represented as "\t"), because I believe that's
> what's in your file. That would explain the extra space between the fields.
>
> I'll use Python's str.split() method, called with no arguments, to split
> on any run of whitespace (which includes both spaces and tabs)
>
> >>> mol.title.split()
> ['0001', '-2.52']
>
> and assign them to the variables "name_code" and "value".
>
> >>> name_code, value = mol.title.split()
>
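
The two-name unpacking above raises a ValueError on any title that does
not contain exactly two whitespace-separated fields. A minimal sketch of
a stricter split, assuming every record carries exactly one name code and
one numeric value; the helper name split_title is hypothetical and is not
part of pybel:

```python
def split_title(title):
    """Split a title such as '0001\t-2.52' into (name_code, value).

    Assumption: the title holds exactly two whitespace-separated fields
    and the second one is numeric. Both are returned as strings.
    """
    fields = title.split()
    if len(fields) != 2:
        raise ValueError(
            f"expected 2 fields in title, got {len(fields)}: {title!r}")
    name_code, value = fields
    float(value)  # raises ValueError if the property is not a number
    return name_code, value
```

It would be called as split_title(mol.title) in place of the bare
mol.title.split().
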
> The pybel API has a "write()" method on molecules which returns the
> molecule as a string in a given format. Here's what it looks like in "sdf".
>
> >>> print(mol.write("sdf"))
> 0001    -2.52
>  OpenBabel02272221522D
>
>  12 13  0  0  0  0  0  0  0  0999 V2000
>  ...
>
> I need to change the title and add a "logX" data item, which I can do with:
>
> >>> mol.title = name_code
> >>> mol.data["logX"] = value
>
> giving most of what you wanted.
>
> >>> print(mol.write("sdf"))
> 0001
>  OpenBabel02272221552D
>
>  12 13  0  0  0  0  0  0  0  0999 V2000
>     0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>   1 12  2  0  0  0  0
>   1  2  1  0  0  0  0
>   2  3  2  0  0  0  0
>   3  4  1  0  0  0  0
>   4  5  1  0  0  0  0
>   4 11  2  0  0  0  0
>   5 10  2  0  0  0  0
>   5  6  1  0  0  0  0
>   6  7  2  0  0  0  0
>   7  8  1  0  0  0  0
>   8  9  2  0  0  0  0
>   9 10  1  0  0  0  0
>  11 12  1  0  0  0  0
> M  END
> >  <logX>
> -2.52
>
> $$$$
>
> You also had a "No." field. I don't know if that is the index of the input
> record, or the integer value of the name_code, like:
>
> >>> int(name_code)
> 1
>
> I'll assume it's the input index.
>
> I'll use Python's built-in "enumerate()" function, which adds an index
> to each element of an iterator. For example, I can iterate through the
> characters of "ABCD" like this:
>
> >>> for c in "ABCD":
> ...   print(c)
> ...
> A
> B
> C
> D
>
> For each X in the input iterator, enumerate() yields the pair
> (index, X), with the index starting at 0 by default:
>
> >>> for i, c in enumerate("ABCD"):
> ...   print(i, c)
> ...
> 0 A
> 1 B
> 2 C
> 3 D
>
> I can also specify the initial index, for example, to start at 1:
>
> >>> for i, c in enumerate("ABCD", 1):
> ...   print(i, c)
> ...
> 1 A
> 2 B
> 3 C
> 4 D
>
> The last bit to know is that pybel's "readfile" gives a way to iterate
> over all molecules in a file.
>
> >>> from openbabel import pybel
> >>> for i, mol in enumerate(pybel.readfile("smi", "wikipedia2.smi"), 1):
> ...   print("Entry#:", i, repr(mol.title))
> ...   if i == 10:
> ...     break
> ...
> Entry#: 1 'Ammonia'
> Entry#: 2 'Aspirin'
> Entry#: 3 'Acetylene'
> Entry#: 4 'Adenosine triphosphate'
> Entry#: 5 'Ampicillin'
> Entry#: 6 'Ascorbic acid'
> Entry#: 7 'Ascorbic acid'
> Entry#: 8 'Amphetamine'
> Entry#: 9 'Aspartame'
> Entry#: 10 'Amoxicillin'
>
> Finally, all the coordinates were 0.0. To make things a bit nicer, use the
> "make2D()" or "make3D()" methods to add 2D or 3D coordinates, respectively.
> Your example uses 3D, so I'll do that.
>
> Putting it all together, along with some use of Python's "argparse"
> module to handle command-line processing (which I won't discuss here)
> gives the "rjrich.py" program, attached.
>
> It's used like this:
>
>   % python rjrich.py test.smi
>
> You can also change the output tag, and the output file name, like this:
>
>   % python rjrich.py test.smi --tag Cacao2 -o cacao2.sdf
>
> (I believe if you're using Open Babel under Windows, with Python
> installed, then you should use "py" instead of "python" to run the program.)
>
> Cheers,
>
>                                 Andrew
>                                 da...@dalkescientific.com
>
>
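
The attachment mentioned in the quoted message is not reproduced in this
archive. What follows is only a minimal sketch of how the steps described
there could be combined, not the original rjrich.py. Assumptions: the
"No." item holds the 1-based input index, the default tag is "logX", the
default output file name is "output.sdf", and the function names are
hypothetical. Only the positional input file and the --tag and -o options
come from the usage shown in the quoted message.

```python
import argparse


def build_parser():
    """Command-line options; the defaults are assumptions, not the original's."""
    parser = argparse.ArgumentParser(
        description="Convert 'SMILES name_code value' lines to an SD file.")
    parser.add_argument("filename", help="input SMILES file")
    parser.add_argument("--tag", default="logX",
                        help="SD tag for the property value (default: logX)")
    parser.add_argument("-o", dest="output", default="output.sdf",
                        help="output SD file (default: output.sdf)")
    return parser


def convert(in_path, out_path, tag):
    # Imported here so the option parsing can be tried without Open Babel.
    from openbabel import pybel

    writer = pybel.Outputfile("sdf", out_path, overwrite=True)
    try:
        # Number the records from 1, as in the enumerate() example.
        for i, mol in enumerate(pybel.readfile("smi", in_path), 1):
            name_code, value = mol.title.split()
            mol.title = name_code
            mol.data["No."] = str(i)  # assumed to be the input index
            mol.data[tag] = value
            mol.make3D()  # replace the all-zero coordinates with 3D ones
            writer.write(mol)
    finally:
        writer.close()


def main(argv=None):
    args = build_parser().parse_args(argv)
    convert(args.filename, args.output, args.tag)
```

To run it from the command line as shown in the quoted message, the file
would end with a call to main().
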

_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
