Hi Andrew,

Many thanks for your detailed answer!
Yes, I use Python. I will give your suggested solution a try.

Best regards,
Rudy

On Sun, Feb 27, 2022 at 4:27 PM Andrew Dalke <da...@dalkescientific.com> wrote:

> Hi Rudy,
>
> > On Feb 27, 2022, at 20:55, Rudy Richardson <rjr...@umich.edu> wrote:
> >
> > I have a library of ~1000 compounds as SMILES strings with an appended
> > name code and a property. For example:
> >
> > c1ccc(c2ccccc2)cc1 0001 -2.52
> >
> > Where "0001" is the name code and "-2.52" is a physicochemical property
> > of the molecule.
> >
> > I would like to convert these strings to a concatenated SDF file,
>
> If you're comfortable working with Python, here's an example using the
> pybel interface.
>
> First, here's how to get the name code and property
>
>   >>> from openbabel import pybel
>   >>> mol = pybel.readstring("smi", "c1ccc(c2ccccc2)cc1\t0001\t-2.52")
>   >>> mol
>   <openbabel.pybel.Molecule object at 0x1101dece0>
>   >>> mol.title
>   '0001\t-2.52'
>
> In that case I used tabs (represented as "\t"), because I believe that's
> what's in your file. That would explain the extra space between the fields.
>
> I'll use Python's string.split() to split on any whitespace (which
> includes both spaces and tabs)
>
>   >>> mol.title.split()
>   ['0001', '-2.52']
>
> and assign them to the variables "name_code" and "value".
>
>   >>> name_code, value = mol.title.split()
>
> The pybel API has a "write()" method on molecules which formats it into a
> given format. Here's what it looks like in "sdf".
>
>   >>> print(mol.write("sdf"))
>   0001 -2.52
>    OpenBabel02272221522D
>
>    12 13  0  0  0  0  0  0  0  0999 V2000
>   ...
>
> I need to change the title and add a "logX" data item, which I can do with:
>
>   >>> mol.title = name_code
>   >>> mol.data["logX"] = value
>
> giving most of what you wanted.
>
>   >>> print(mol.write("sdf"))
>   0001
>    OpenBabel02272221552D
>
>    12 13  0  0  0  0  0  0  0  0999 V2000
>       0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>       0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>       0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>       0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>       0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>       0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>       0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>       0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>       0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>       0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>       0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>       0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     1 12  2  0  0  0  0
>     1  2  1  0  0  0  0
>     2  3  2  0  0  0  0
>     3  4  1  0  0  0  0
>     4  5  1  0  0  0  0
>     4 11  2  0  0  0  0
>     5 10  2  0  0  0  0
>     5  6  1  0  0  0  0
>     6  7  2  0  0  0  0
>     7  8  1  0  0  0  0
>     8  9  2  0  0  0  0
>     9 10  1  0  0  0  0
>    11 12  1  0  0  0  0
>   M  END
>   >  <logX>
>   -2.52
>
>   $$$$
>
> You also had a "No." field. I don't know if that is the index of the input
> record, or the integer value of the name_code, like:
>
>   >>> int(name_code)
>   1
>
> I'll assume it's the input index.
>
> I'll use Python's built-in "enumerate()" function. What it does is it adds
> an index for each element of an iterator. For example, I can iterate
> through the characters of "ABCD" like this:
>
>   >>> for c in "ABCD":
>   ...     print(c)
>   ...
>   A
>   B
>   C
>   D
>
> What enumerate() does is for each X in the input iterator, it returns
> (index, X)
>
>   >>> for i, c in enumerate("ABCD"):
>   ...     print(i, c)
>   ...
>   0 A
>   1 B
>   2 C
>   3 D
>
> I can also specify the initial index, for example, to start at 1:
>
>   >>> for i, c in enumerate("ABCD", 1):
>   ...     print(i, c)
>   ...
>   1 A
>   2 B
>   3 C
>   4 D
>
> The last bit to know is that pybel's "readfile" gives a way to iterate
> over all molecules in a file.
>
>   >>> from openbabel import pybel
>   >>> for i, mol in enumerate(pybel.readfile("smi", "wikipedia2.smi"), 1):
>   ...     print("Entry#:", i, repr(mol.title))
>   ...     if i == 10:
>   ...         break
>   ...
>   Entry#: 1 'Ammonia'
>   Entry#: 2 'Aspirin'
>   Entry#: 3 'Acetylene'
>   Entry#: 4 'Adenosine triphosphate'
>   Entry#: 5 'Ampicillin'
>   Entry#: 6 'Ascorbic acid'
>   Entry#: 7 'Ascorbic acid'
>   Entry#: 8 'Amphetamine'
>   Entry#: 9 'Aspartame'
>   Entry#: 10 'Amoxicillin'
>
> Finally, all the coordinates were 0.0. To make things a bit nicer, use the
> "make2D()" or "make3D()" methods to add 2D or 3D coordinates, respectively.
> Your example uses 3D, so I'll do that.
>
> Putting it all together, along with some use of Python's "argparse"
> module to handle command-line processing (which I won't discuss here)
> gives the "rjrich.py" program, attached.
>
> It's used like this:
>
>   % python rjrich.py test.smi
>
> You can also change the output tag, and the output file name, like this:
>
>   % python rjrich.py test.smi --tag Cacao2 -o cacao2.sdf
>
> (I believe if you're using Open Babel under Windows, with Python
> installed, then you should use "py" instead of "python" to run the program.)
>
> Cheers,
>
> Andrew
> da...@dalkescientific.com
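
The steps in the quoted message can be combined into one short script. The sketch below is not the attached rjrich.py, which is not reproduced in this message. It is a minimal illustration that assumes each title holds exactly two whitespace-separated fields, and that "No." is the 1-based input index, as the message assumes. The function names split_title and smiles_to_sdf and the default tag "logX" are illustrative choices, and there is no argparse handling.

```python
def split_title(title):
    """Split a title such as '0001<tab>-2.52' into (name_code, value).

    Assumes exactly two whitespace-separated fields; anything else raises
    ValueError from the tuple unpacking.
    """
    name_code, value = title.split()
    return name_code, value


def smiles_to_sdf(smiles_path, sdf_path, tag="logX"):
    """Read a SMILES file and write one SD record per molecule.

    Hypothetical helper, not part of pybel.
    """
    # Imported here so split_title() works without Open Babel installed.
    from openbabel import pybel

    output = pybel.Outputfile("sdf", sdf_path, overwrite=True)
    try:
        # enumerate(..., 1) numbers the input records starting at 1.
        for index, mol in enumerate(pybel.readfile("smi", smiles_path), 1):
            name_code, value = split_title(mol.title)
            mol.title = name_code             # title line of the SD record
            mol.data["No."] = str(index)      # assumed: the input index
            mol.data[tag] = value             # the property, stored as text
            mol.make3D()                      # replace all-zero coordinates
            output.write(mol)
    finally:
        output.close()
```

Calling smiles_to_sdf("test.smi", "test.sdf") would then write every record to a single SDF file (the output name here is just an example). A name code that itself contains spaces would break split_title, so such input would need a different splitting rule.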
_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss