> -----Original Message----- > From: [EMAIL PROTECTED] [mailto:python- > [EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] > Sent: Tuesday, February 05, 2008 9:31 AM > To: python-list@python.org > Subject: Using Regular Expressions to Parse SQL > > > My pattern does not even come close. > > Any help would be greatly appreciated. My goal is to analyse a large > number of SQL querys to try to identify the join field and see where > indexing might make sense. > > While I am mostly interested in understanding regular expressions, I > would also be interested in knowing about any Python SQL parsers out > there.
Python's regexes are a tad odd compared to Perl (or Perl's regexes are odd. Relativity.) You need re.DOTALL to handle newlines in the sql: DOTALL Make the "." special character match any character at all, including a newline; without this flag, "." will match anything except a newline. import re s= ''' FROM ((qry_Scores_Lookup1 INNER JOIN CSS_Rpt1 ON (qry_Scores_Lookup1.desc = CSS_Rpt1.desc) AND (qry_Scores_Lookup1.lastcdu = CSS_Rpt1.lastcdu))''' pat = r"(^|\s+)FROM\s.+\s(?:INNER|LEFT|RIGHT)\s+JOIN\s.+\sON\s+((\s*(?:\S+)\s* =\s*(?:\S+))(?:\s+|\s+AND\s+|$))+" m = re.compile(pat, re.DOTALL).match(s) if m: print m ('(qry_Scores_Lookup1.desc = CSS_Rpt1.desc) ', '(qry_Scores_Lookup1.desc = CSS_Rpt1.desc)') You should be able to tweak the regex to get the exact matches you want. An alternative to writing a lengthy regex might be to just grab everything after the ON and then string split on "AND". ***** The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential, proprietary, and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from all computers. GA623 -- http://mail.python.org/mailman/listinfo/python-list