Suppose you have a string, for instance

"pyyythhooonnn ---> ++++"

and you search for the subquences composed of the same character, here you get :

'yyy', 'hh', 'ooo', 'nnn', '---', '++++'

It's not difficult to write a Python code that solves the problem, for instance :

def f(text):
    ch=text
    r=[]
    if not text:
        return r
    else:
        x=ch[0]
        i=0
        for c in ch:
            if c!=x:
                if i>1:
                    r+=[x*i]
                x=c
                i=1
            else:
                i+=1
    return r+(i>1)*[i*x]

print f("pyyythhooonnn ---> ++++")


I should confess that this code is rather cumbersome so I was looking for an alternative. I imagine that a regular expressions approach could provide a better method. Does a such code exist ? Note that the string is not restricted to the ascii charset.
--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to