Maurice LING wrote:
> Matt wrote:
> > I'd HIGHLY suggest purchasing the excellent Mastering Regular
> > Expressions (http://www.oreilly.com/catalog/regex2/index.html) by
> > Jeff Friedl. Although it's mostly geared towards Perl, it will answer
> > all your questions about regular expressions. If you're going to work
> > with regexes, this is a must-have.
> >
> > That being said, here's what the new regular expression should be with
> > a bit of instruction (in the spirit of teaching someone to fish after
> > giving them a fish ;-) )
> >
> > my_expr = re.compile(r'(\w+)\s*(\(\1\))')
> >
> > Note the "\s*" in place of the single space " ". The "\s" means "any
> > whitespace character" (equivalent to [ \t\n\r\f\v]). The "*" following
> > it means "0 or more occurrences". So this will now match:
> >
> > "there (there)"
> > "there (there)"
> > "there(there)"
> > "there (there)"
> > "there\t(there)" (tab)
> > "there\t\t\t\t\t\t\t\t\t\t\t\t(there)"
> > etc.
> >
> > Hope that's helpful. Pick up the book!
> >
> > M@
> >
>
> Thanks again. I've read a number of tutorials on regular expressions, but
> it's something that I hardly used in the past, so I've gone far too rusty.
>
> Before my post, I tried
> my_expr = re.compile(r'(\w+) \s* (\(\1\))') instead, but it doesn't work,
> so I'm a bit stumped...
>
> Thanks again,
> Maurice

Maurice,

Your regex failed because of the literal spaces around the "\s*". Outside of verbose mode, a space in a pattern matches a space in the text, so your pattern translates to "one space, followed by zero or more whitespace characters, followed by one space". Your regex therefore only matches when the two text elements are separated by at least two spaces: one at each end of the gap.

This kind of demonstrates why regular expressions can drive you nuts. I still suggest picking up the book; not because Jeff Friedl drove a dump truck full of money up to my door, but because it specifically has a use case like yours. So you get to learn & solve your problem at the same time!

HTH,
M@
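
P.S. If you want to check this yourself, here is a minimal sketch that runs both patterns over a few strings. The names flexible, spaced, verbose, samples and results are just mine for illustration, and the sample strings are made up. The re.VERBOSE variant did not come up earlier in the thread; I include it only because, as far as I know, that flag makes the engine ignore unescaped whitespace in the pattern, which is probably the behaviour you expected.

```python
import re

# Pattern from earlier in the thread: zero or more whitespace characters
# between the word and its parenthesised repeat.
flexible = re.compile(r'(\w+)\s*(\(\1\))')

# The attempt with spaces around \s*: each literal space must match a
# real space, so the gap needs a space at both ends.
spaced = re.compile(r'(\w+) \s* (\(\1\))')

# Same pattern text, but compiled with re.VERBOSE, which ignores
# unescaped whitespace inside the pattern (assumption: this is the
# layout-only spacing the original attempt was aiming for).
verbose = re.compile(r'(\w+) \s* (\(\1\))', re.VERBOSE)

# Hypothetical test strings, chosen to vary only the gap.
samples = [
    "there(there)",       # no gap
    "there (there)",      # one space
    "there  (there)",     # two spaces
    "there\t(there)",     # one tab
    "there \t (there)",   # space, tab, space
]

# Map each sample to (flexible matched?, spaced matched?, verbose matched?).
results = {
    s: (
        bool(flexible.search(s)),
        bool(spaced.search(s)),
        bool(verbose.search(s)),
    )
    for s in samples
}

for s, (f, sp, v) in results.items():
    print(f"{s!r:22} flexible={f}  spaced={sp}  verbose={v}")
```

The spaced pattern should only succeed on the strings whose gap starts and ends with a space; the other two should succeed on all of them.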