Thus spoke Kay Schluehr (on 2006-06-18 19:07):

> I have a list of strings ls = [s_1,s_2,...,s_n] and want to create a
> regular expression sx from it, such that sx.match(s) yields a SRE_Match
> object when s starts with an s_i for one i in [0,...,n]. There might
> be relations between those strings: s_k.startswith(s_1) -> True or
> s_k.endswith(s_1) -> True. An extreme case would be ls = ['a', 'aa',
> ...,'aaaa...ab']. For this reason SRE_Match should provide the longest
> possible match.

With some ideas from Kay and Paddy, I tried to get along with Python
in doing this. If it's allowed to spread the individual strings into
alternations, the following would imho do:

    #!/usr/bin/python
    # -*- coding: iso-8859-15 -*-
    import re

    text = r'this is a text containing aaaöüöaaaµaaa and more'
    lstr = [
        'a',
        'aa',
        'aaaaa',
        'aaaöüöaaaµaaa',
        'aaaaaaaaaaaaaaab',
    ]

    # Longest alternative first; re.escape keeps metacharacters literal.
    alternatives = sorted(lstr, key=len, reverse=True)
    pat = re.compile('(' + '|'.join(re.escape(s) for s in alternatives) + ')')

    # Collect all hits and put the longest one in front.
    hits = sorted(pat.findall(text), key=len, reverse=True)
    print('Longest: ' + hits[0])

This will print 'aaaöüöaaaµaaa' from the text and won't complain
about special characters used, since every string is escaped before
it goes into the pattern.

In Perl, you could build up the regex by dynamic evaluation
(??{..}), but I didn't manage to get this working in Python,
so here is (in Perl) how I thought it would work:

    my $text = "this is a text containing aaaöüöaaaµaaa and more";
    my @lstr = (
        'a',
        'aa',
        'aaaaa',
        'aaaöüöaaaµaaa',
        'aaaaaaaaaaaaaaab',
    );

    my $re = qr{
        (??{
            join '|',
                map  { quotemeta }
                sort { length $b <=> length $a }
                @lstr
        })
    }x;

    $_ = $text;
    print "Longest: ", (my $hit) = reverse sort /$re/g;

Maybe the experts can shed some light on it.

Regards

Mirco
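
P.S. Coming back to Kay's original request for sx.match(s): the following is only a minimal sketch of how the longest-first alternation might be used for prefix matching in current Python 3. The helper name build_longest_prefix_pattern and the extra 'a.b' entry are made up for illustration and are not part of the thread.

```python
import re

def build_longest_prefix_pattern(strings):
    """Compile an alternation that tries the longest literal first (hypothetical helper)."""
    # The engine takes the first alternative that matches at a position,
    # so ordering by descending length yields the longest candidate.
    ordered = sorted(set(strings), key=len, reverse=True)
    # re.escape makes metacharacters such as '.' match literally.
    return re.compile("|".join(re.escape(s) for s in ordered))

# 'a.b' is an added example entry to show the effect of escaping.
lstr = ["a", "aa", "aaaaa", "aaaöüöaaaµaaa", "aaaaaaaaaaaaaaab", "a.b"]
sx = build_longest_prefix_pattern(lstr)

m = sx.match("aaaaaaa and more")
print(m.group(0) if m else None)
```

Note that sx.match() only looks at the start of the string, which is what the original question asks for; findall() as used above scans the whole text instead. An empty input list would compile to an empty pattern that matches everywhere, so a real version would probably want to guard against that.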