Good thing I read this explanation of regex Saturday morning ;-) ... I think?
Seems like matchText/regex would be useful for HTML scraping strategies too.

Thanks for the explanation, Jim!

John Patten
SUSD

Sent from my iPad

On Jun 10, 2011, at 9:20 PM, Jim Ault <jimaultw...@yahoo.com> wrote:

> On Jun 10, 2011, at 7:39 PM, J. Landman Gay wrote:
>> On 6/10/11 8:21 PM, Jim Ault wrote:
>>> The () parens are telling the engine to capture any chars that meet the
>>> conditions inside and assign them to the first variable specified. In
>>> this case, it is 'retVal'.
>>> If there were a second set of (), then those chars would be assigned to
>>> the second variable specified.
>>
>> Good explanation, I like when regex gets explained. But what I don't get is
>> how come the first set of parentheses aren't put into the variable:
>>
>> get matchText(tEthernetConfig,"(?s)inet (.*?) ",retVal)
>>
>> The LC engine ignores the "(?s)". That's good and as it should be, but I'm
>> not sure why.
>
> LC honors the (?s), but as a directive, not a capture.
>
> When a paren is read and is followed by a ?
> this signals an 'operation' rather than a 'capture'.
> --
> Additional regex conditions or qualifiers are
> Lookahead and Lookbehind, scanning operations designated by
>
>    (?<=  (?<!    lookbehind, positive and negative logic
>    (?=   (?!     lookahead, positive and negative logic
> --
> (?Usi) means shortest match, allow multiple lines, disregard case
> (?U) means shortest match, single line, case sensitive
> (?s) means longest match, allow multiple lines, case sensitive
> if it is missing then
> default = means longest match, single line, case sensitive
>
> What is meant by 'single line' is that the dot (.) does not match a return
> char, so a run like .* stops at the end of the line.
> Multi line means the return is seen as just another char in the text block so
> the match can keep going across lines to find the longest match.
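For anyone who wants to poke at the capture-versus-directive point outside LC, here is a minimal sketch in Python. It assumes Python's re module rather than LC's own regex engine, the sample text is made up to look like the ifconfig output, and first_capture is a hypothetical helper name. Python's re has no (?U) flag, so the ungreedy case is written with .*? as in the original pattern.

```python
import re

# Made-up sample standing in for the ifconfig-style text in tEthernetConfig.
SAMPLE = (
    "en0: flags=8863 mtu 1500\n"
    "\tinet 192.168.1.20 netmask 0xffffff00 broadcast 192.168.1.255\n"
    "\tmedia: autoselect status: active\n"
)


def first_capture(pattern, text):
    """Return the first captured group, or None if the pattern fails (hypothetical helper)."""
    found = re.search(pattern, text)
    return found.group(1) if found else None


# (?s) is a directive, so the pattern still has exactly one capturing group.
group_count = re.compile(r"(?s)inet (.*?) ").groups

# Ungreedy: stop at the first space after "inet ".
lazy = first_capture(r"(?s)inet (.*?) ", SAMPLE)

# Greedy without (?s): the dot stops at the return, so the match stays on one line.
greedy_one_line = first_capture(r"inet (.*) ", SAMPLE)

# Greedy with (?s): the dot crosses returns and runs on to the last space in the text.
greedy_all_lines = first_capture(r"(?s)inet (.*) ", SAMPLE)

# Lookbehind: "inet " must come just before the match but is not part of it.
lookbehind = re.search(r"(?<=inet )\S+", SAMPLE).group(0)

for label, value in [
    ("capturing groups", group_count),
    ("lazy", lazy),
    ("greedy, one line", greedy_one_line),
    ("greedy, all lines", greedy_all_lines),
    ("lookbehind", lookbehind),
]:
    print(f"{label}: {value!r}")
```

The group count is the part that answers Jacque's question: the (?s) paren never shows up as a capture, so the only thing handed back is what (.*?) matched.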
> ---
>
> Think of the regex engine as a complex series of nested repeat loops that are
> a combination of
>    repeat while
>    repeat until
> making many, many char by char scans, moving forward through a block and
> backing up to retry, to find the longest positive result, unless told to be
> ungreedy (shortest result).
>
> The repeat loops are designed to accept strings and operators in series
> such that a given block of text can be rescanned as often as needed in order
> to implement logic patterns.
>
> For example, a greedy .* runs forward to the last char it can reach, then
> backs up one char at a time until the rest of the pattern fits.
>
> Simple rules don't show you all the multiple scans (repeat loops) that are
> used to arrive at the parsed result.
> Large blocks of text can take several minutes to scan depending on conditions
> and conditionals.
>
>
> The paren as a directive works like this:
>    [a-zA-Z] means a to z lower and upper for a single char
> or
>    (?i)[a-z] means a to z lower and upper for a single char
>    --since the 'i' means case insensitive
>    (?i)([a-z]) -- will capture a single char if it is a-z either case.
> If the test fails there is no value assigned.
> LC will allow a test for empty, but Perl and others will report an error like
> 'undefined' since there was no match, no capture, and no assignment. Also,
> in Perl, etc, you must define variables ahead of the regex if you want to
> avoid 'undefined'.
>
> Hope this makes a little light reading for the weekend.
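The case-insensitive directive and the "no match, no assignment" point can be tried the same way. Again this is only a sketch in Python's re module, not LC's engine; capture_or_empty is a hypothetical helper that returns an empty string on failure, loosely imitating the LC habit of testing the variable for empty.

```python
import re


def capture_or_empty(pattern, text):
    """Return the first capture, or "" when nothing matches (hypothetical helper)."""
    found = re.search(pattern, text)
    # re.search returns None on failure; calling .group on None would raise an error.
    return found.group(1) if found else ""


upper_only = "42 Z"

# The (?i) directive lets [a-z] match the uppercase "Z".
with_flag = capture_or_empty(r"(?i)([a-z])", upper_only)

# Spelling out both ranges gives the same result without the directive.
explicit = capture_or_empty(r"([a-zA-Z])", upper_only)

# Without the directive there is no lowercase letter, so nothing is captured.
no_flag = capture_or_empty(r"([a-z])", upper_only)

print(repr(with_flag), repr(explicit), repr(no_flag))
```

In Python the failed case has to be checked before the capture is read, which is roughly the same trap as the 'undefined' Jim mentions for Perl.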
>
> Jim Ault
> Las Vegas
>
> _______________________________________________
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode