[EMAIL PROTECTED] wrote:
> I want to match a word against a string such that 'peter' is found in
> "peter bengtsson" or " hey peter," but not in "thepeter bengtsson" or
> "hey peterbe," because the word has to stand on its own. The following
> code works for a single word:
>
> def createStandaloneWordRegex(word):
>     """ return a regular expression that can find 'peter' only if it's
>     written alone (next to space, start of string, end of string,
>     comma, etc) but not if inside another word like peterbe """
>     return re.compile(r"""
>       (
>       ^ %s
>       (?=\W | $)
>       |
>       (?<=\W)
>       %s
>       (?=\W | $)
>       )
>       """% (word, word), re.I|re.L|re.M|re.X)
>
>
> def test_createStandaloneWordRegex():
>     def T(word, text):
>         print createStandaloneWordRegex(word).findall(text)
>
>     T("peter", "So Peter Bengtsson wrote this")
>     T("peter", "peter")
>     T("peter bengtsson", "So Peter Bengtsson wrote this")
>
> The result of running this is::
>
>     ['Peter']
>     ['peter']
>     []   <--- this is the problem!!
>
>
> It works if the parameter is just one word (eg. 'peter') but stops
> working when it's an expression (eg. 'peter bengtsson')
No, not when it's an "expression" (whatever that means), but when the
parameter contains whitespace, which is ignored in verbose mode.

>
> How do I modify my regular expression to match on expressions as well
> as just single words??
>

If you must stick with re.X, you must escape any whitespace characters
in your "word"; see re.escape().

Alternatively (1), drop re.X, but this is ugly:

    regex_text_no_X = r"(^%s(?=\W|$)|(?<=\W)%s(?=\W|$))" % (word, word)

Alternatively (2), consider using the \b gadget; this appears to give
the same answers as the baroque method:

    regex_text_no_flab = r"\b%s\b" % word

HTH,
John
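
P.S. For anyone who wants to try the suggestions, here is a minimal sketch of the re.escape() fix and the \b variant side by side. The function names standalone_word_regex and word_boundary_regex are made up for illustration, the two lookbehind branches of the original pattern are folded into one non-capturing group, and re.L is left out on the assumption that this runs on Python 3, where that flag is rejected for str patterns.

```python
import re

def standalone_word_regex(word):
    """Verbose-mode pattern; hypothetical helper, not the original poster's code."""
    # re.escape() backslash-escapes the space in "peter bengtsson",
    # so re.X no longer throws it away as insignificant whitespace.
    escaped = re.escape(word)
    return re.compile(r"""
        (?: ^ | (?<=\W) )   # start of line, or preceded by a non-word char
        %s                  # the escaped word or phrase
        (?= \W | $ )        # followed by a non-word char or end of line
        """ % escaped, re.I | re.M | re.X)

def word_boundary_regex(word):
    """Same idea using \\b; also a hypothetical helper."""
    return re.compile(r"\b%s\b" % re.escape(word), re.I)

for make in (standalone_word_regex, word_boundary_regex):
    print(make("peter").findall("So Peter Bengtsson wrote this"))
    print(make("peter bengtsson").findall("So Peter Bengtsson wrote this"))
    print(make("peter").findall("hey peterbe,"))
```

The two versions agree on the inputs from this thread, but they are not guaranteed to agree in general: \b only matches between a word character and a non-word character, so a "word" that begins or ends with punctuation can behave differently under the two patterns.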