Re: [Open Babel] calculate chemical descriptors from published paper

Andrew Dalke Tue, 14 May 2019 10:16:25 -0700

Hi Spencer,

  Here's a solution which indirectly builds on the Open Babel functionality to 
find rings and identify components.


A direct algorithm would do as Noel suggest: use something like a depth-first 
search to find all ring atoms and from that figure out the number of connected 
ring systems.

This round-about solution uses the SMARTS matcher to find the ring atoms 
(building on Open Babel's ring finder), delete the ring atoms, generate a 
SMILES (builds on Open Babel's component detection code), and counts the number 
of components as <number of '.' characters> + 1.

It's slow, and uses some heavy-weight mechanisms, but I think it gets the 
answer you are looking for. There may be more efficient ways to, for example, 
count the number of components.

I haven't really tested it though.

It also works because your definition says that rings connected by a sprio atom 
are part of the same ring system. If they were parts of different systems then 
the implementation would be more difficult.


                                Andrew
                                da...@dalkescientific.com

from __future__ import print_function

import openbabel as ob
import pybel

## General algorithm:
# 1. Find all atoms which are not in a ring
# 2. Delete them, so only ring systems are left.
# 3. Create a SMILES
# 4. Count the number of ".", which will be one less
#   then the number of ring systems.

_not_ring_pattern = ob.OBSmartsPattern()
if not _not_ring_pattern.Init("[!R]"):
    raise AssertionError("how can this be invalid?")


_smi_conversion = ob.OBConversion()
if not _smi_conversion.SetOutFormat("smi"):
    raise AssertionError("SMILES not available?")

def count_ring_systems(mol):
    # Work with a copy of the molecule because this
    # method will delete atoms.
    mol = ob.OBMol(mol)

    # Find the non-ring atom indices.
    # This builds on Open Babel's internal ring finder.
    _not_ring_pattern.Match(mol)
    
    indices = []
    for (not_ring_idx,) in _not_ring_pattern.GetUMapList():
        indices.append(not_ring_idx)

    # No ring atoms -> no ring system
    if len(indices) == mol.NumAtoms():
        return 0

    # Delete the non-ring atoms.
    # (Spiro rings are considered part of the same ring system.)
    indices.sort(reverse=True)

    # Go in reverse order so the indices aren't affected.
    # (I could also have done all of the GetAtom() calls first
    # which are then used to delete the atoms.)
    for idx in indices:
        mol.DeleteAtom(mol.GetAtom(idx))

    # Generate the SMILES and count the number of components.
    out_smi = _smi_conversion.WriteString(mol)
    # Each component is separated by a "."
    num_disconnects = out_smi.count(".")

    # This slow solution builds on Open Babel's internal methods to find
    # individual components. There may be an API call to find the number
    # of components directly.
    return num_disconnects + 1

def get_argv_parser():
    import argparse
    parser = argparse.ArgumentParser(
        description = "count the number of ring systems in a SMILES file")
    parser.add_argument(
        "--smiles", default=None,
        help="specify a SMILES string instead of reading from a file")
    parser.add_argument(
        "--in", "-i", default="smi", dest="input_format",
        help="input file format (default: 'smi')")
    parser.add_argument(
        "filename", default=None, nargs="?",
        help="structure file to process")
    return parser

def main(argv=None):
    parser = get_argv_parser()
    args = parser.parse_args(argv)

    if args.smiles:
        if args.filename:
            parser.error("Cannot specify both --smiles and a filename")
        mol = pybel.readstring("smi", args.smiles)
        print(count_ring_systems(mol.OBMol), "", sep="\t")
    else:
        filename = args.filename
        if filename is None:
            filename = "/dev/stdin" # Only works on Unix-like systems
        reader = pybel.readfile(args.input_format, filename)
        for mol in reader:
            count = count_ring_systems(mol.OBMol)
            
            title = mol.title
            for special in "\r\n\t":
                if special in title:
                    title = title.replace(special, "")
                    
            print(count, title, sep="\t")

if __name__ == "__main__":
    import sys
    main(sys.argv[1:])

_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

Re: [Open Babel] calculate chemical descriptors from published paper

Reply via email to