Hi Spencer, Here's a solution which indirectly builds on the Open Babel functionality to find rings and identify components.
A direct algorithm would do as Noel suggest: use something like a depth-first search to find all ring atoms and from that figure out the number of connected ring systems. This round-about solution uses the SMARTS matcher to find the ring atoms (building on Open Babel's ring finder), delete the ring atoms, generate a SMILES (builds on Open Babel's component detection code), and counts the number of components as <number of '.' characters> + 1. It's slow, and uses some heavy-weight mechanisms, but I think it gets the answer you are looking for. There may be more efficient ways to, for example, count the number of components. I haven't really tested it though. It also works because your definition says that rings connected by a sprio atom are part of the same ring system. If they were parts of different systems then the implementation would be more difficult. Andrew da...@dalkescientific.com
from __future__ import print_function import openbabel as ob import pybel ## General algorithm: # 1. Find all atoms which are not in a ring # 2. Delete them, so only ring systems are left. # 3. Create a SMILES # 4. Count the number of ".", which will be one less # then the number of ring systems. _not_ring_pattern = ob.OBSmartsPattern() if not _not_ring_pattern.Init("[!R]"): raise AssertionError("how can this be invalid?") _smi_conversion = ob.OBConversion() if not _smi_conversion.SetOutFormat("smi"): raise AssertionError("SMILES not available?") def count_ring_systems(mol): # Work with a copy of the molecule because this # method will delete atoms. mol = ob.OBMol(mol) # Find the non-ring atom indices. # This builds on Open Babel's internal ring finder. _not_ring_pattern.Match(mol) indices = [] for (not_ring_idx,) in _not_ring_pattern.GetUMapList(): indices.append(not_ring_idx) # No ring atoms -> no ring system if len(indices) == mol.NumAtoms(): return 0 # Delete the non-ring atoms. # (Spiro rings are considered part of the same ring system.) indices.sort(reverse=True) # Go in reverse order so the indices aren't affected. # (I could also have done all of the GetAtom() calls first # which are then used to delete the atoms.) for idx in indices: mol.DeleteAtom(mol.GetAtom(idx)) # Generate the SMILES and count the number of components. out_smi = _smi_conversion.WriteString(mol) # Each component is separated by a "." num_disconnects = out_smi.count(".") # This slow solution builds on Open Babel's internal methods to find # individual components. There may be an API call to find the number # of components directly. return num_disconnects + 1 def get_argv_parser(): import argparse parser = argparse.ArgumentParser( description = "count the number of ring systems in a SMILES file") parser.add_argument( "--smiles", default=None, help="specify a SMILES string instead of reading from a file") parser.add_argument( "--in", "-i", default="smi", dest="input_format", help="input file format (default: 'smi')") parser.add_argument( "filename", default=None, nargs="?", help="structure file to process") return parser def main(argv=None): parser = get_argv_parser() args = parser.parse_args(argv) if args.smiles: if args.filename: parser.error("Cannot specify both --smiles and a filename") mol = pybel.readstring("smi", args.smiles) print(count_ring_systems(mol.OBMol), "", sep="\t") else: filename = args.filename if filename is None: filename = "/dev/stdin" # Only works on Unix-like systems reader = pybel.readfile(args.input_format, filename) for mol in reader: count = count_ring_systems(mol.OBMol) title = mol.title for special in "\r\n\t": if special in title: title = title.replace(special, "") print(count, title, sep="\t") if __name__ == "__main__": import sys main(sys.argv[1:])
_______________________________________________ OpenBabel-discuss mailing list OpenBabel-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/openbabel-discuss