When I come across a good regex example like the one you provided, if I have a moment I like to test things out to see where regex is faster and where it isn't. It's really great for many things, but carries quite a bit of overhead.
Of course, this test is only relevant if most of the specifiers in the regex are merely there to identify the elements you're looking for, and the data is expected to fit the definition you provided.
Given that, it's possible to make the regex a bit simpler (see foo2 below), but only with a modest boost to performance. It can probably be simplified more, but the chunk-based alternative performed so well I didn't bother exploring the regex side any further.
Writing a lengthier handler that uses chunk expressions seems to yield the same results you reported, running between 12 and 60 times faster (depending on the percentage of lines tested that match the criteria being looked for).
For one-offs like validating email addresses, regex can be an excellent fit, and it can suit some larger tasks too, depending on the specifics.
But for iterating across lists I've often been delightfully surprised by LiveCode's gracefully efficient chunk handling.
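As a rough illustration of what I mean by chunk handling, here is a minimal sketch that pulls the two page positions out of one line of your data using items instead of a pattern. The sample line is copied from your post; the variable names (tLine, tCol2) are just made up for this example, and I'm assuming the columns really are tab-delimited as your format description says.

```livecode
on mouseUp
   -- One sample line in the <number><tab><text><tab><numberCol1><tab><numberCol2> format
   put "2" & tab & "Testing" & tab & "752,1800" & tab & "1,752,1,1800" into tLine
   -- Split on tabs to reach <numberCol2>
   set the itemDelimiter to tab
   put item 4 of tLine into tCol2
   -- Split on commas to reach the individual numbers
   set the itemDelimiter to comma
   -- Items 1 and 3 are the positions a page number is compared against
   put item 1 of tCol2 & cr & item 3 of tCol2
end mouseUp
```

No pattern has to be compiled or backtracked here; the engine just walks the delimiters, which is a big part of why chunks tend to do well in loops.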
Testing your original data replicated to become 250 lines long, and looking for page 1 among them, the script below yields:
Regex: 9261 ms
RegexLite: 7958 ms
Chunks: 197 ms
Chunks faster than orig regex by: 47.01 times
Chunks faster than lite regex by: 40.4 times
Same result? true

on mouseUp
   put fld 1 into tList
   put 1 into tPage --< change this for different tests
   put 1000 into n
   --
   -- Test 1: original regex
   put the millisecs into t
   repeat n
      put foo1(tPage, tList) into r1
   end repeat
   put the millisecs - t into t1
   --
   -- Test 2: lighter regex
   put the millisecs into t
   repeat n
      put foo2(tPage, tList) into r2
   end repeat
   put the millisecs - t into t2
   --
   -- Test 3: chunks
   put the millisecs into t
   repeat n
      put foo3(tPage, tList) into r3
   end repeat
   put the millisecs - t into t3
   --
   -- Display results:
   set the numberformat to "0.##"
   put "Regex: "&t1 &" ms"&cr \
         &"RegexLite: "&t2 &" ms"&cr \
         &"Chunks: "& t3 &" ms"&cr \
         &"Chunks faster than orig regex by: "&(t1 / t3)&" times" &cr \
         &"Chunks faster than lite regex by: "&(t2 / t3)&" times" &cr \
         &"Same result? "& (r1=r3) &cr&cr& r1 &cr&cr& r3
end mouseUp

function foo1 pPage, tList
   put "(.+\t"&pPage&",\d+,\d+,\d+)|(.+\t\d+,\d+,"&pPage&",\d+)|(.+\t"&pPage&",\d*\.?\d*,\d*\.?\d*,\d*\.?\d*,\d*\.?\d*)" into tMatchPattern
   filter lines of tList with regex pattern tMatchPattern
   return tList
end foo1

function foo2 pPage, tList
   put "(.+\t"&pPage&",*)|(.+\t\d+,\d+,"&pPage&",*)|(.+\t"&pPage&",*)" into tMatchPattern
   filter lines of tList with regex pattern tMatchPattern
   return tList
end foo2

function foo3 pPage, tList
   repeat for each line tLine in tList
      set the itemdel to tab
      put item 3 of tLine into t1
      put pPage &"," into tPageMarker
      if "." is in t1 then
         if (t1 begins with tPageMarker) then
            put tLine &cr after tNuList
         end if
      else
         if ( t1 begins with tPageMarker) OR (item 4 of tLine begins with tPageMarker) then
            put tLine &cr after tNuList
         end if
      end if
   end repeat
   delete last char of tNuList
   return tNuList
end foo3

--
 Richard Gaskin
 Fourth World Systems
 Software Design and Development for the Desktop, Mobile, and the Web
 ____________________________________________________________________
 ambassa...@fourthworld.com                http://www.FourthWorld.com

Paul Dupuis wrote:
Never mind. Solved it. It was the pattern for the 2nd format. Fixed with

"(.+\t"&pPage&",\d+,\d+,\d+)|(.+\t\d+,\d+,"&pPage&",\d+)|(.+\t"&pPage&",\d*\.?\d*,\d*\.?\d*,\d*\.?\d*,\d*\.?\d*)"

On 1/30/2016 3:17 PM, Paul Dupuis wrote:

I need some regex help.

I have a list that is of the form:

<number><tab><text><tab><numberCol1><tab><numberCol2>

i.e.

1 Testing 1,747 1,1,1,747
2 Testing 752,1800 1,752,1,1800
3 Testing 5398,5846 2,320,2,768
4 Testing 3,111.951,683.915,302.268,385.751 3,111.951,683.915,302.268,385.751

<numberCol2> can have a list of numbers in 1 of 2 formats:

A comma separated list of 4 integers, i.e. <integer1>,<integer2>,<integer3>,<integer4>
OR
A comma separated list of 1 integer, followed by 4 decimal numbers, i.e. <integer>,<decimal>,<decimal>,<decimal>,<decimal>

I need to filter the lines of this list with a REGEX pattern to get lines WHERE a value pPage matches certain places in <numberCol2>, specifically:

where pPage is equal to either <integer1> or <integer3> in the first format (i.e. item 1 or item 3)
OR
where pPage is equal to <integer> in the second format (i.e. item 1)

So my code is:

put "((.+\t"&pPage&",\d+,\d+,\d+)|(.+\t\d+,\d+,"&pPage&",\d+)|(.+\t"&pPage&",?[0-9]*\.?[0-9]+,?[0-9]*\.?[0-9]+,?[0-9]*\.?[0-9]+,?[0-9]*\.?[0-9]+))" into tMatchPattern
filter lines of tList with regex pattern tMatchPattern

If pPage is 1 then I should get:

1 Testing 1,747 1,1,1,747
2 Testing 752,1800 1,752,1,1800

and I do.

If pPage is 2 then I should get:

3 Testing 5398,5846 2,320,2,768

and I do.

If pPage is 3 then I should get:

4 Testing 3,111.951,683.915,302.268,385.751 3,111.951,683.915,302.268,385.751

and I do.

If pPage is 4 then I should get an empty list, and I do, but when pPage is 5, I am expecting an empty list and I get

3 Testing 5398,5846 2,320,2,768

So something is wrong with my regex, but I cannot figure out what.

It looks like it is matching against <numberCol1> in the last case (pPage=5), but it should not, since there are only 2 items in the list rather than 4 or 5. I am using LiveCode 6.7.6
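
For anyone reading this thread later, the pPage=5 surprise can probably be reproduced in isolation. The sketch below is only an illustration: it tests the third alternative of the original pattern and of the corrected one against line 3 of the sample data, assuming the columns are separated by real tab characters. The variable names (tLine, tPage, tOld, tNew) are made up for the example. Because every comma in the original alternative is optional (",?"), the four number groups can all be satisfied inside "5398,5846" once "\t5" has matched, for instance as "3", "9", "8" and ",5846".

```livecode
on mouseUp
   -- Line 3 of the sample data, with tabs between the columns
   put "3" & tab & "Testing" & tab & "5398,5846" & tab & "2,320,2,768" into tLine
   put 5 into tPage
   -- Third alternative of the original pattern: every comma is optional
   put "(.+\t"&tPage&",?[0-9]*\.?[0-9]+,?[0-9]*\.?[0-9]+,?[0-9]*\.?[0-9]+,?[0-9]*\.?[0-9]+)" into tOld
   -- Third alternative of the corrected pattern: the page must be followed by a comma
   put "(.+\t"&tPage&",\d*\.?\d*,\d*\.?\d*,\d*\.?\d*,\d*\.?\d*)" into tNew
   -- Expect true for the original alternative and false for the corrected one
   put matchText(tLine, tOld) & cr & matchText(tLine, tNew)
end mouseUp
```

The corrected alternative requires a literal comma immediately after the page number, so "\t5" followed by "398" can no longer match.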