Regex is wonderfully compact to write compared with equivalent routines using chunk expressions, but that compactness is sometimes paid for in execution time.

When I come across a good regex example like the one you provided, if I have a moment I like to test things out to see where regex is faster and where it isn't. It's really great for many things, but carries quite a bit of overhead.

Of course, this test is only relevant if most of the specifiers in the regex are there merely to identify the elements you're looking for, and if the data can be expected to fit the definition you provided.

Given that, it's possible to make the regex a bit simpler (see foo2 below), but only with a modest boost to performance. It can probably be simplified more, but the chunk-based alternative performed so well I didn't bother exploring the regex side any further.

A lengthier handler that uses chunk expressions (foo3 below) seems to yield the same results you reported, while running between 12 and 60 times faster, depending on the percentage of lines tested that match the criteria being looked for.

For one-offs like validating email addresses, regex can be an excellent fit, and even for some larger tasks, depending on the specifics.
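
As a rough illustration of that kind of one-off, a deliberately loose check might look like the sketch below. The handler name looksLikeEmail and the pattern itself are assumptions made for illustration only: it tests for the shape something@something.something with no spaces, and is not a full address validator.

function looksLikeEmail pAddress
  -- matchText returns true if the string matches the regex.
  -- Pattern: one or more non-@, non-space chars, an @, more such
  -- chars, a literal dot, then more such chars to the end.
  return matchText(pAddress, "^[^@\s]+@[^@\s]+\.[^@\s]+$")
end looksLikeEmail

A single call such as looksLikeEmail(fld "Email") runs once per user action, so the regex overhead is unlikely to matter there.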

But for iterating across lists I've often been delightfully surprised by LiveCode's gracefully efficient chunk handling.

With your original data replicated to 250 lines, and looking for page 1 among them, the script below (which calls each function 1000 times) yields:

Regex: 9261 ms
RegexLite: 7958 ms
Chunks: 197 ms
Chunks faster than orig regex by: 47.01 times
Chunks faster than lite regex by: 40.4 times
Same result? true


on mouseUp
  put fld 1 into tList
  put 1 into tPage --< change this for different tests
  put 1000 into n
  --
  -- Test 1: original regex
  put the millisecs into t
  repeat n
    put foo1(tPage, tList) into r1
  end repeat
  put the millisecs - t into t1
  --
  -- Test 2: lighter regex
  put the millisecs into t
  repeat n
    put foo2(tPage, tList) into r2
  end repeat
  put the millisecs - t into t2
  --
  -- Test 3: chunks
  put the millisecs into t
  repeat n
    put foo3(tPage, tList) into r3
  end repeat
  put the millisecs - t into t3
  --
  -- Display results:
  set the numberformat to "0.##"
  put "Regex: "&t1 &" ms"&cr \
        &"RegexLite: "&t2 &" ms"&cr \
        &"Chunks: "& t3 &" ms"&cr \
        &"Chunks faster than orig regex by: "&(t1 / t3)&" times" &cr \
        &"Chunks faster than lite regex by: "&(t2 / t3)&" times" &cr \
        &"Same result? "& (r1=r3) &cr&cr& r1 &cr&cr& r3
end mouseUp


function foo1 pPage, tList
  put "(.+\t"&pPage&",\d+,\d+,\d+)|(.+\t\d+,\d+,"&pPage&",\d+)|(.+\t"&pPage&",\d*\.?\d*,\d*\.?\d*,\d*\.?\d*,\d*\.?\d*)" into tMatchPattern
  filter lines of tList with regex pattern tMatchPattern
  return tList
end foo1


function foo2 pPage, tList
  put "(.+\t"&pPage&",*)|(.+\t\d+,\d+,"&pPage&",*)|(.+\t"&pPage&",*)" into tMatchPattern
  filter lines of tList with regex pattern tMatchPattern
  return tList
end foo2



function foo3 pPage, tList
  repeat for each line tLine in tList
    set the itemdel to tab
    put item 3 of tLine into t1
    put pPage &"," into tPageMarker
    if "." is in t1 then
      if (t1 begins with tPageMarker) then
        put tLine &cr after tNuList
      end if
    else
      if (t1 begins with tPageMarker) or (item 4 of tLine begins with tPageMarker) then
        put tLine &cr after tNuList
      end if
    end if
  end repeat
  delete last char of tNuList
  return tNuList
end foo3
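
Note that foo3 tests only the start of items 3 and 4 of each tab-delimited line, which is enough to match the regex results on this sample data. A variant that follows the rule as literally stated in the original question (item 1 or item 3 of <numberCol2> in the four-integer format, item 1 in the decimal format) might look like the sketch below. The names foo4, pList, tCol2 and tMatch are hypothetical, and this variant was not part of the timed test, so the figures above do not apply to it.

function foo4 pPage, pList
  -- Hypothetical variant, not timed: checks <numberCol2> only.
  repeat for each line tLine in pList
    -- Column 4 of the tab-delimited line is <numberCol2>
    set the itemDelimiter to tab
    put item 4 of tLine into tCol2
    -- Switch to commas to look inside that column
    set the itemDelimiter to comma
    if the number of items of tCol2 is 4 then
      -- Integer format: the page may be <integer1> or <integer3>
      put (item 1 of tCol2 is pPage) or (item 3 of tCol2 is pPage) into tMatch
    else
      -- Decimal format: only the leading <integer> is the page
      put (item 1 of tCol2 is pPage) into tMatch
    end if
    if tMatch then put tLine & cr after tNuList
  end repeat
  delete last char of tNuList -- drop the trailing return
  return tNuList
end foo4

Switching the itemDelimiter twice per line adds some work compared with foo3, so it would need its own run through the mouseUp test before drawing any conclusions about speed.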

--
 Richard Gaskin
 Fourth World Systems
 Software Design and Development for the Desktop, Mobile, and the Web
 ____________________________________________________________________
 ambassa...@fourthworld.com                http://www.FourthWorld.com


Paul Dupuis wrote:
Never mind. Solved it.

It was the pattern for the 2nd format. Fixed with
"(.+\t"&pPage&",\d+,\d+,\d+)|(.+\t\d+,\d+,"&pPage&",\d+)|(.+\t"&pPage&",\d*\.?\d*,\d*\.?\d*,\d*\.?\d*,\d*\.?\d*)"

On 1/30/2016 3:17 PM, Paul Dupuis wrote:
I need some regex help.

I have a list that is of the form:
<number><tab><text><tab><numberCol1><tab><numberCol2>
i.e.
1    Testing    1,747    1,1,1,747
2    Testing    752,1800    1,752,1,1800
3    Testing    5398,5846    2,320,2,768
4    Testing    3,111.951,683.915,302.268,385.751    3,111.951,683.915,302.268,385.751

<numberCol2> can have a list of numbers in 1 of 2 formats:
A comma separated list of 4 integers, i.e.
<integer1>,<integer2>,<integer3>,<integer4>
OR
A comma separated list of 1 integer, followed by 4 decimal numbers, i.e.
<integer>,<decimal>,<decimal>,<decimal>,<decimal>

I need to filter the lines of this list with a REGEX pattern to get lines
WHERE a value pPage matches certain places in <numberCol2>, specifically:
where pPage is equal to either <integer1> or <integer3> in the first
format(i.e. item 1 or item 3)
OR
where pPage is equal to <integer> in the second format(i.e. item 1)

So my code is:
put "((.+\t"&pPage&",\d+,\d+,\d+)|(.+\t\d+,\d+,"&pPage&",\d+)|(.+\t"&pPage&",?[0-9]*\.?[0-9]+,?[0-9]*\.?[0-9]+,?[0-9]*\.?[0-9]+,?[0-9]*\.?[0-9]+))" into tMatchPattern
filter lines of tList with regex pattern tMatchPattern

If pPage is 1 then I should get:
1    Testing    1,747    1,1,1,747
2    Testing    752,1800    1,752,1,1800
and I do. If pPage is 2 then I should get:
3    Testing    5398,5846    2,320,2,768
and I do. If pPage is 3 then I should get:
4    Testing    3,111.951,683.915,302.268,385.751    3,111.951,683.915,302.268,385.751
and I do. If pPage is 4 then I should get an empty list, and I do, but
when pPage is 5, I am expecting an empty list and I get
3    Testing    5398,5846    2,320,2,768

So something is wrong with my Regex, but I cannot figure out what. It
looks like it is matching against <numberCol1> in the last case
(pPage=5) but it should not since there are only 2 items in the list
rather than 4 or 5.

I am using LiveCode 6.7.6



_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
