Good thing I read this explanation of regex Saturday morning ;-) ... I think?

Seems like matchText/regex would be useful for HTML scraping strategies too.

Thanks for the explanation Jim!

John Patten
SUSD

Sent from my iPad

On Jun 10, 2011, at 9:20 PM, Jim Ault <jimaultw...@yahoo.com> wrote:

> 
> On Jun 10, 2011, at 7:39 PM, J. Landman Gay wrote:
>> On 6/10/11 8:21 PM, Jim Ault wrote:
>>> The () parens are telling the engine to capture any chars that meet the
>>> conditions inside and assign them to the first variable specified. In
>>> this case, it is 'retVal'
>>> If there were a second set of (), then those chars would be assigned to
>>> the second variable specified.
>> Good explanation, I like when regex gets explained. But what I don't get is 
>> how come the first set of parentheses aren't put into the variable:
>> 
>> get matchText(tEthernetConfig,"(?s)inet (.*?) ",retVal)
>> 
>> The LC engine ignores the "(?s)". That's good and as it should be, but I'm 
>> not sure why.
> 
> LC honors the (?s), but as a directive, not a capture.
> 
> 
> When a paren is read and is followed by a ?
> this signals an 'operation' rather than a 'capture'
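
To make the capture-versus-directive point concrete, here is a small sketch. It uses Python's re module as a stand-in for LiveCode's matchText (an assumption, chosen only so the example can be run anywhere), and the sample ifconfig text and variable names are made up. It shows that (?s) is not counted as a capture, and that capturing parens are filled in left-to-right order.

```python
import re

# Made-up sample text standing in for tEthernetConfig.
ethernet_config = (
    "en0: flags=8863\n"
    "\tinet 192.168.1.5 netmask 0xffffff00 broadcast 192.168.1.255\n"
)

# (?s) is a directive, so the only capturing group here is (.*?).
one_group = re.compile(r"(?s)inet (.*?) ")
ret_val = one_group.search(ethernet_config).group(1)

# A second set of parens fills the second result, in left-to-right order.
two_groups = re.compile(r"(?s)inet (.*?) netmask (.*?) ")
address, netmask = two_groups.search(ethernet_config).groups()

print(one_group.groups)   # number of capturing groups in the first pattern
print(ret_val)
print(address, netmask)
```

In LiveCode terms, the two-group pattern would correspond to passing two variables after the pattern instead of one.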
> --
> Additional regex conditions or qualifiers are
> lookahead and lookbehind, scanning operations designated by
> 
>     (?<=   (?<!    lookbehind, positive and negative logic
>     (?=    (?!     lookahead, positive and negative logic
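
A short sketch of the four lookaround forms, again using Python's re module as a stand-in for the LiveCode engine (an assumption), with made-up sample strings. The lookaround parts must be present for the match to succeed, but they are not part of the matched text.

```python
import re

# Made-up sample line; not taken from the original thread.
line = "inet 192.168.1.5 netmask 0xffffff00"

# Positive lookbehind (?<= and positive lookahead (?= :
# "inet " must come before and " netmask" after, but only the address matches.
address = re.search(r"(?<=inet )[\d.]+(?= netmask)", line).group(0)

# Negative lookahead (?! : names followed by a colon, skipping any starting with "lo".
names = re.findall(r"\b(?!lo)\w+(?=:)", "lo0: en0: en1:")

# Negative lookbehind (?<! : numbers that are not preceded by a dollar sign.
plain_numbers = re.findall(r"(?<!\$)\b\d+", "$5 and 7")

print(address)
print(names)
print(plain_numbers)
```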
> --
> (?Usi)  means shortest match, allow multiple lines, disregard case
> (?U) means shortest match, single line, case sensitive
> (?s) means longest match, allow multiple lines, case sensitive
> If no directive is present, the default is
> longest match, single line, case sensitive
> 
> What is meant by 'single line' is that the dot (.) does not match a return 
> char, so a pattern like .* stops at the end of the line.
> Multi line means the return is seen as just another char in the text block, so 
> the match can keep going across lines to find the longest match.
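
Here is a sketch of longest versus shortest match, with and without (?s), on a tiny made-up HTML fragment. It uses Python's re module as a stand-in (an assumption). Python has no (?U) directive, so the lazy quantifier .*? is used to get the shortest match instead.

```python
import re

# Made-up two-line sample; "\n" plays the role of the return char.
text = "<b>first</b>\n<b>second</b>"

# Default: longest match, but the dot stops at the return char.
default_greedy = re.search(r"<b>(.*)</b>", text).group(1)

# (?s): longest match and the dot crosses lines, so it runs to the last </b>.
dotall_greedy = re.search(r"(?s)<b>(.*)</b>", text).group(1)

# (?s) with a lazy quantifier: shortest match, stops at the first </b>.
dotall_lazy = re.search(r"(?s)<b>(.*?)</b>", text).group(1)

# Without (?s), content that spans two lines does not match at all.
no_match = re.search(r"<b>(.*?)</b>", "<b>multi\nline</b>")

print(repr(default_greedy))
print(repr(dotall_greedy))
print(repr(dotall_lazy))
print(no_match)
```

For scraping, the (?s) plus shortest-match combination is usually the one wanted, since tags often span lines and a longest match swallows everything up to the last closing tag.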
> ---
> 
> Think of the regex engine as a complex series of nested repeat loops that are 
> a combination of
> repeat while
> repeat until
> making many, many char by char scans, moving forward and then backing up 
> (backtracking) when a path fails, to find the longest positive result, 
> unless told to be ungreedy (shortest result)
> 
> The repeat loops are designed to accept strings and operators in series
> such that a given block of text is rescanned as often as needed in order to 
> implement logic patterns.
> 
> A greedy .* runs forward to the end of the text first and then gives chars 
> back one at a time until the rest of the pattern fits.
> 
> Simple rules don't show you all the multiple scans (repeat loops) that are 
> used to arrive at the parsed result.
> Large blocks of text can take several minutes to scan depending on conditions 
> and conditionals.
> 
> 
> The paren as a directive also works for case sensitivity:
> [a-zA-Z] means a to z lower and upper for a single char
> or
> (?i)[a-z]  means a to z lower and upper for a single char
> --since the 'i' means case insensitive
> (?i)([a-z]) -- will capture a single char if it is a-z either case.
> If the test fails there is no value assigned.
> LC will allow a test for empty, but Perl and others will report an error like 
> 'undefined' since there was no match, no capture, and no assignment.  Also, 
> in Perl, etc, you must define variables ahead of the regex if you want to 
> avoid 'undefined'.
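
A sketch of the case directive and of what a failed match looks like, using Python's re module as a stand-in (an assumption). Here the empty string stands in for "no value assigned"; that mapping is chosen for illustration and is not how either LC or Perl is implemented.

```python
import re

# (?i) makes [a-z] accept either case, so the upper-case Q is captured.
hit = re.search(r"(?i)([a-z])", "Q7")
captured = hit.group(1) if hit else ""

# Without (?i), "Q7" has no lower-case letter, so there is no match object.
miss = re.search(r"([a-z])", "Q7")
missed = miss.group(1) if miss else ""   # guard first; miss.group(1) would raise

print(repr(captured))
print(miss)
print(repr(missed))
```

The guard before calling group() plays the same role as testing for empty in LC or checking for an undefined value elsewhere.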
> 
> Hope this makes a little light reading for the weekend.
> 
> Jim Ault
> Las Vegas
> 
> 
> 
> _______________________________________________
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription 
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
