Re: Extracting subsequences composed of the same character

MRAB Thu, 31 Mar 2011 18:21:21 -0700

On 01/04/2011 01:43, candide wrote:

Suppose you have a string, for instance:

"pyyythhooonnn ---> ++++"

and you search for the subsequences composed of the same character; here
you get:

'yyy', 'hh', 'ooo', 'nnn', '---', '++++'

It's not difficult to write Python code that solves the problem, for
instance:

[snip]


I should confess that this code is rather cumbersome, so I was looking
for an alternative. I imagine that a regular expression approach could
provide a better method. Does such code exist? Note that the string
is not restricted to the ASCII charset.
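For comparison with the regex answer, here is a minimal non-regex sketch using the standard library's itertools.groupby. It is not the snipped code from the question; the function name runs_groupby is hypothetical. It assumes Python 3, where str is Unicode, so non-ASCII characters are handled like any others.

```python
from itertools import groupby


def runs_groupby(s):
    """Return every run of two or more identical consecutive characters."""
    runs = []
    # groupby splits s into maximal blocks of equal adjacent characters
    for _char, group in groupby(s):
        block = "".join(group)
        if len(block) > 1:  # keep only characters that actually repeat
            runs.append(block)
    return runs


print(runs_groupby("pyyythhooonnn ---> ++++"))
# ['yyy', 'hh', 'ooo', 'nnn', '---', '++++']
```

groupby only ever compares neighbouring characters, so a single pass over the string is enough.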


>>> import re
>>> s = "pyyythhooonnn ---> ++++"
>>> re.findall(r"((.)\2+)", s)
[('yyy', 'y'), ('hh', 'h'), ('ooo', 'o'), ('nnn', 'n'), ('---', '-'), ('++++', '+')]
>>> [m[0] for m in re.findall(r"((.)\2+)", s)]
['yyy', 'hh', 'ooo', 'nnn', '---', '++++']
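A variant of the same pattern drops the outer group. With re.finditer, each match object's group(0) is already the whole run, so no tuple unpacking is needed. The names RUN and runs_regex are illustrative. The re.DOTALL flag is an added assumption for input that may contain newlines, because "." does not match "\n" by default. This assumes Python 3 str. Note that "." matches one code point, so an accented letter written as a base letter plus a combining mark is not treated as a single character.

```python
import re

# (.) captures one character and \1 requires it to repeat, so each match is
# a run of two or more copies. re.DOTALL lets "." match a newline as well.
RUN = re.compile(r"(.)\1+", re.DOTALL)


def runs_regex(s):
    """Return each run of a repeated character, taken from the full match."""
    return [m.group(0) for m in RUN.finditer(s)]


print(runs_regex("pyyythhooonnn ---> ++++"))
# ['yyy', 'hh', 'ooo', 'nnn', '---', '++++']
print(runs_regex("ççaéééb"))
# ['çç', 'ééé']
```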
--
http://mail.python.org/mailman/listinfo/python-list
