Maurice - Here is a pyparsing treatment of your problem. It is certainly more verbose, but hopefully easier to follow and to maintain later (for instance, when you need to change which characters are valid in a word). pyparsing implicitly skips whitespace between tokens, so tabs and newlines in the input text are handled without cluttering up the expression definition. The example also shows how to *not* match "<X> (<X>)" inside a quoted string, in case that becomes a requirement.
Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

from pyparsing import *

LPAR = Literal("(")
RPAR = Literal(")")

# define a word as beginning with an alphabetic character followed by
# zero or more alphanumerics, -, _, ., or $ characters
word = Word(alphas, alphanums+"-_$.")

targetExpr = word.setResultsName("first") + \
            LPAR + word.setResultsName("second") + RPAR

# this will match any 'word ( word )' arrangement, but we want to
# reject matches if the two words aren't the same
def matchWords(s,l,tokens):
    if tokens.first != tokens.second:
        raise ParseException(s,l,"")
    # replace the whole 'word ( word )' match with just the word
    return tokens[0]

targetExpr.setParseAction( matchWords )

testdata = """
This is (is) a match.
This is (isn't) a match.
I.B.M.\t\t\t(I.B.M. ) is a match.
This is also a A.T.T. (A.T.T.) match.
Paris in "the(the)" Spring( Spring ).
"""
print(testdata)
print(targetExpr.transformString(testdata))

print("\nNow don't process ()'s inside quoted strings...")
targetExpr.ignore(quotedString)
print(targetExpr.transformString(testdata))

Prints out:

This is (is) a match.
This is (isn't) a match.
I.B.M. (I.B.M. ) is a match.
This is also a A.T.T. (A.T.T.) match.
Paris in "the(the)" Spring( Spring ).

This is a match.
This is (isn't) a match.
I.B.M. is a match.
This is also a A.T.T. match.
Paris in "the" Spring.

Now don't process ()'s inside quoted strings...

This is a match.
This is (isn't) a match.
I.B.M. is a match.
This is also a A.T.T. match.
Paris in "the(the)" Spring.
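For comparison, and assuming "more verbose" is measured against a regular-expression solution, here is a rough sketch of the same "word ( word )" collapse using only Python's standard re module. The names WORD, DUP, DUP_OUTSIDE_QUOTES, collapse and collapse_outside_quotes are illustrative, not from the post. The backreference \1 plays the role of matchWords, and the quoted-string alternative roughly imitates targetExpr.ignore(quotedString), though it only handles double quotes on a single line, whereas pyparsing's quotedString also accepts single-quoted strings.

```python
import re

# Illustrative regex counterpart of Word(alphas, alphanums+"-_$."):
# an ASCII letter, then letters, digits, -, _, $ or .
WORD = r"[A-Za-z][A-Za-z0-9\-_$.]*"

# Group 1 captures the first word; \1 requires the word inside the
# parentheses to be identical. \s* allows spaces, tabs and newlines.
DUP = re.compile(r"(" + WORD + r")\s*\(\s*\1\s*\)")

# Same pattern, but a double-quoted string (on one line) is tried first,
# so anything inside the quotes is consumed without being examined.
DUP_OUTSIDE_QUOTES = re.compile(r'"[^"\n]*"|(' + WORD + r")\s*\(\s*\1\s*\)")


def collapse(text):
    """Replace 'X (X)' with 'X' everywhere, including inside quotes."""
    return DUP.sub(r"\1", text)


def collapse_outside_quotes(text):
    """Replace 'X (X)' with 'X', leaving double-quoted strings alone."""
    def repl(m):
        # group(1) is None when the quoted-string alternative matched,
        # in which case the quoted text is put back unchanged
        return m.group(1) if m.group(1) is not None else m.group(0)
    return DUP_OUTSIDE_QUOTES.sub(repl, text)


if __name__ == "__main__":
    sample = 'Paris in "the(the)" Spring( Spring ).'
    print(collapse(sample))
    print(collapse_outside_quotes(sample))
```

The pattern is much shorter, but the word definition, the same-word check and the quote handling are all packed into one expression, which is the maintenance trade-off the pyparsing version is meant to avoid.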