On Tue, 28 Apr 2009 11:06:16 +0200, Marek spociń...@go2.pl, Poland <marek...@10g.pl> wrote:
> > Hello,
> >
> > The following code returns 'abc123abc45abc789jk'. How do I revise the
> > pattern so that the return value will be 'abc789jk'? In other words, I
> > want to find the pattern 'abc' that is closest to 'jk'. Here the strings
> > '123', '45' and '789' are just examples. They are actually quite
> > different in the string that I'm working with.
> >
> > import re
> > s = 'abc123abc45abc789jk'
> > p = r'abc.+jk'
> > lst = re.findall(p, s)
> > print(lst[0])
>
> I suggest using r'abc.+?jk' instead.
>
> The additional ? makes the preceding '.+' non-greedy, so instead of
> matching as long a string as it can, it matches as short a string as possible.

Non-greedy repetition will not work in this case, I guess:

    from re import compile as Pattern
    s = 'abc123abc45abc789jk'
    p = Pattern(r'abc.+?jk')
    print(p.match(s).group())
    ==> abc123abc45abc789jk

(Can someone explain why?)

My solution would be to explicitly exclude 'abc' from the sequence of characters matched by '.+'. To do this, use a negative lookahead (?!...) before '.':

    p = Pattern(r'(abc((?!abc).)+jk)')
    print(p.findall(s))
    ==> [('abc789jk', '9')]

But that is not exactly what you want. The inner () needed to express the exclusion is treated by findall as a group to be returned, so you also get the last character matched in there. To avoid that, use non-capturing parens (?:...). This also removes the need for parens around the whole pattern:

    p = Pattern(r'abc(?:(?!abc).)+jk')
    print(p.findall(s))
    ==> ['abc789jk']

Denis
------
la vita e estrany
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
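
On the "why" asked above, here is a minimal sketch, assuming Python 3 and only the standard re module. The names t, lazy and tempered_pattern are illustrative and not from the thread. A regex engine reports the leftmost match, so a lazy quantifier can only change where a match ends, never where it starts. In s the first 'abc' sits at index 0 and the only 'jk' is at the very end, so '.+?' still has to run through the whole string:

```python
import re

s = 'abc123abc45abc789jk'

# Lazy '.+?' does not move the start of the match: the engine tries
# index 0 first, finds 'abc' there, and then extends '.+?' one character
# at a time until 'jk' follows. The only 'jk' is at the end of s.
lazy = re.search(r'abc.+?jk', s)
print(lazy.start(), lazy.group())

# Where laziness does help: choosing the nearest *end* when several
# 'jk' are reachable from the same start. (t is a made-up example string.)
t = 'abc1jk2jk'
print(re.search(r'abc.+jk', t).group())    # greedy: runs to the last 'jk'
print(re.search(r'abc.+?jk', t).group())   # lazy: stops at the first 'jk'

# The lookahead version from the message: each character consumed after
# 'abc' must not be the start of another 'abc', so a match can only
# begin at the 'abc' closest to 'jk'.
tempered_pattern = re.compile(r'abc(?:(?!abc).)+jk')
print(tempered_pattern.findall(s))
```

So '?' picks the nearest 'jk' for a given 'abc', while the (?!abc) check is what forces the match to begin at the nearest 'abc'.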