Here is a brute-force pyparsing approach: define an alternation of
Words, one per letter, each matching a run of that single letter.
A second version picks out only the repeats and reports their
locations in the input string:

from pyparsing import ZeroOrMore, MatchFirst, Word, alphas

test = "foo ooobaaazZZ"

# one Word per letter, each matching a run of that single letter
print("group string by character repeats")
repeats = ZeroOrMore( MatchFirst( [ Word(a) for a in alphas ] ) )
print(repeats.parseString(test))
print()

# min=2 keeps only runs of two or more of the same letter
print("find just the repeated characters")
repeats = MatchFirst( [ Word(a,min=2) for a in alphas ] )
for toks,loc,endloc in repeats.scanString(test):
    print(toks,loc)

Gives:
group string by character repeats
['f', 'oo', 'ooo', 'b', 'aaa', 'z', 'ZZ']

find just the repeated characters
['oo'] 1
['ooo'] 4
['aaa'] 8
['ZZ'] 12

(pyparsing skips whitespace between tokens by default, which is why
there is no ' ' entry in the first list)
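
If the whitespace matters, or if you would like a cross-check that does
not need pyparsing, here is a rough standard-library sketch. It is a
different technique (itertools.groupby plus a regex backreference), not
the pyparsing grammar above, and it assumes that "letter" means ASCII
A-Z/a-z. The names runs and repeated are just illustrative.

```python
import re
from itertools import groupby

test = "foo ooobaaazZZ"

# Group every run of identical characters. Nothing is skipped here,
# so the space shows up as its own one-character run.
runs = ["".join(group) for _, group in groupby(test)]
print(runs)

# Find runs of two or more of the same ASCII letter, with start offsets.
# ([A-Za-z]) captures one letter; \1+ requires at least one more copy.
repeated = [(m.group(0), m.start())
            for m in re.finditer(r"([A-Za-z])\1+", test)]
print(repeated)
```

The second list should line up with the scanString results (same
strings, same start locations), while the first differs from the
parseString result only by the extra ' ' entry.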

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul
