On Oct 20, 7:07 am, Peter Otten <[EMAIL PROTECTED]> wrote:
> Alfons Nonell-Canals wrote:
> > I have a trouble and I don't know how to solve it. I am working with
> > molecules and each molecule has a number of atoms. I obtain each atom
> > splitting the molecule.
> >
> > Ok. It is fine and I have no problem with it.
> >
> > The problem is when I have to work with these atoms. These atoms usually
> > are only a letter but, sometimes it can also contain one or more numbers.
> > If they contain a number I have to manipulate them separately.
> >
> > If the number was always the same I know how to identify them, for
> > example, 1:
> >
> > atom = 'C1'
> >
> > if '1' in atom:
> >     print 'kk'
> >
> > But, how can I do to identify in '1' all possibilities from 1-9, I
> > tried:
> >
> > if '[1-9]', \d,...
> >
> > Any comments, please?
>
> http://mail.python.org/pipermail/tutor/1999-March/000083.html
>
> Peter

Wow, that sure is a lot of code. And I'm not sure the OP wants to delve
into re's just to solve this problem. Here is the pyparsing rendition
(although it does not handle the recursive computation of submolecules
given in parens, as the Tim Peters link above does):

http://pyparsing.wikispaces.com/file/view/chemicalFormulas.py

The pyparsing version defines chemical symbols and their coefficients
using the following code:

    from pyparsing import Word, Optional, Group, OneOrMore

    caps = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    lowers = caps.lower()
    digits = "0123456789"

    # an element symbol is one capital followed by zero or more lowercase letters
    element = Word( caps, lowers )
    integer = Word( digits )
    # a missing coefficient defaults to "1"
    elementRef = Group( element + Optional( integer, default="1" ) )
    chemicalFormula = OneOrMore( elementRef )

Then to parse a formula like C6H5OH, there is no need to code up a
tokenizer, just call parseString:

    elements = chemicalFormula.parseString("C6H5OH")

The URL above links to a better annotated example, including 2 more
extended versions that show how to use the resulting parsed data to
compute the molecular weight of the chemical.

-- Paul
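
P.S. For the narrower question of spotting a digit in a single atom label like 'C1', neither re nor pyparsing is strictly needed. Here is a minimal sketch using only string methods. The names DIGITS, has_digit and split_atom are made up for illustration, and it assumes any digits come at the end of the label (as in 'C1' or 'Cl12'):

```python
DIGITS = "0123456789"

def has_digit(atom):
    # True if any character of the label is a digit
    return any(ch in DIGITS for ch in atom)

def split_atom(atom):
    # Strip trailing digits to get the symbol; whatever was
    # stripped off is the number (empty string if there was none).
    symbol = atom.rstrip(DIGITS)
    number = atom[len(symbol):]
    return symbol, number

atom = 'C1'
if has_digit(atom):
    symbol, number = split_atom(atom)   # ('C', '1')
```

Labels with a digit in the middle would need something more, which is where re or pyparsing starts to pay off.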