On Tue, Oct 01, 2002 at 06:32:07PM -0400, Mike Lambert wrote:
> > guaranteeing that the subsqls have all text up to, but not including the string
> > "union".
> >
> > I suppose I could say:
> >
> > rule nonunion { (.*) :: { fail if ($1 =~ m"union$"); } }
>
> What's wrong with: ?
>
> rule getstuffbeforeunion { (.*?) union | (.*) }
>
> "a union" => "a "
> "b" => "b"
>
> Am I missing something here?
>
> Mike Lambert
>

hmm... well, it works, but it's not very efficient. It basically scans the
whole string to the end to see if there is a "union" string, and then
backtracks to take the alternative. Hence, it's not very scalable. It also
doesn't 'complexify' very well.

Suppose you had a long string of text, and you wanted to 'harden' your regex
against the substring "union" appearing in double-quoted strings,
single-quoted strings, etc., without writing a SQL parser. I just don't see
how to do this with .*? - I would do something like this (taking a page from
Mr. Friedl's book):

    rule regex_matching_sql
    {
        [
            <-[u()"']>+ :
          | <parens> :
          | <double_string> :
          | <single_string> :
          | <non_union>
        ]*
    }

    rule parens
    {
        \(
        [
            <-["'()]>+ :
          | <double_string> :
          | <single_string> :
          | <self>
        ]*
        \)
    }

    rule single_string
    {
        \'
        [ <-[\'\\]>+ : | \.\' ]*
        \'
    }

    rule double_string
    {
        \"
        [ <-[\"\\]>+ : | \.\" ]*
        \"
    }

    rule non_union
    {
        [
            u <-['"()n]>
          | un ...
          | uni ...
          | unio ...
          | u$
        ]*
    }

Of course I could also be missing something, but I just don't see how to do
this with .*?.

Ed

(ps: As for:

    /(.*) <commit> <!{ $1 =~ rx{union} }>/

I'm not sure how that works, or whether it's very 'complexifiable' (as per
above). If it does a match against every single substring (take all
characters, look for union; if it exists, roll back a character and do the
same thing, etc. etc. etc.) then this isn't good enough. The non_union rule
listed above is about as efficient as it can get: it does no backtracking,
and it keeps the common matches up front so they match first without
alternation.)
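
For comparison, here is a rough Python sketch of the two approaches discussed above. It is not part of the original message, and it is only an approximation of the rules, not a translation of them. The names lazy_before_union, skip_string and text_before_union are hypothetical. The scanner assumes backslash escapes inside quoted strings, treats "union" case-sensitively as a plain substring, and ignores any "union" that sits inside quotes or parentheses.

```python
import re

# Mike's suggestion, transcribed to Python's re syntax.
LAZY = re.compile(r"(.*?)union|(.*)", re.DOTALL)


def lazy_before_union(sql: str) -> str:
    """Text before the first 'union' anywhere, or the whole string."""
    m = LAZY.match(sql)
    return m.group(1) if m.group(1) is not None else m.group(2)


def skip_string(s: str, i: int) -> int:
    """Given that s[i] is a quote, return the index just past its close."""
    quote = s[i]
    i += 1
    while i < len(s):
        if s[i] == "\\":        # a backslash escapes the next character
            i += 2
        elif s[i] == quote:
            return i + 1
        else:
            i += 1
    return len(s)               # unterminated string: consume the rest


def text_before_union(sql: str) -> str:
    """Single forward pass; skips quoted strings and parenthesised groups."""
    i, depth = 0, 0
    while i < len(sql):
        ch = sql[i]
        if ch in "\"'":
            i = skip_string(sql, i)
        elif ch == "(":
            depth += 1
            i += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
            i += 1
        elif depth == 0 and sql.startswith("union", i):
            return sql[:i]
        else:
            i += 1
    return sql
```

Both functions agree on Mike's two examples ("a union" and "b"). They differ once "union" shows up inside a quoted string: the lazy pattern stops at the quoted occurrence, while the scanner walks past it, which is the 'hardening' the rules above are after.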