I just can't resist a good benchmark / coding challenge.
Summary - for small data (10s of thousand of lines), they're all fairly
quick; for large data sets, there are big performance differences and
how far through the data set the "start-with-numeric" lines begin.
I used 10 million lines of data, with 'numeric' staring at either line
10,000 or 9,999,000
simple repeat loop - clear, easy to understand, relatively slow but
times vary widely
filter without ... - fairly clear, generally a little bit quicker,
consistent times
binary search using character offset - hard to understand, MUCH faster,
consistent times
specifically
repeat between 4 and 3809 ms
filter between 2047 and 3602
binary between 0 and 230
For the binary search, the coding is slightly tricky. You can avoid some
of the edge cases by making an assumption that there is a max length for
any line (I assumed 5,000 characters, see kBrute) - but if you can't
make any such assumption, then you can handle it - just need to handle a
couple of special cases. I've left that as an exercise :-)
Here's the code I used (the data is in global gData)
global gData
on mouseup
local t1, t2, K
put the millisecs into t1
put f1(gData) into t2
put t2 && the millisecs - t1 & CR after msg
put the millisecs into t1
put f2(gData) into t2
put t2 && the millisecs - t1 & CR after msg
put the millisecs into t1
put f3(gData) into t2
put t2 && the millisecs - t1 & CR after msg
end mouseup
function f1 @pD
local r
repeat for each line L in pD
if char 1 of L is a number then exit repeat
add 1 to r
end repeat
return r + 1
end f1
function f2 pD
filter pD without "[0-9]*"
return the number of lines in pD + 1
end f2
function f3 @pD
local r, tLower, tUpper, tMid, c, t, temp
constant kBrute = 10000
put 1 into tLower
put (the number of chars in pD) + 1 into tUpper
repeat 10000 times -- more than enough :-)
if tUpper - tLower < kBrute then exit repeat
put (tUpper + tlower ) / 2 into tMid
put char 1 of line 2 of (char tMid to tUpper of pD) into c
if c is a number then
put tMid into tUpper
else
put tMid into tLower
end if
end repeat
put char tLower to tUpper of pD into temp
filter temp without "[0-9]*"
if temp is empty then
-- no lines start with numeric
return "No lines"
else
put the number of lines in (char 1 to tLower of pD) into t
add the number of lines in temp to t
return t
end if
end f3
-- Alex.
_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode