There may well be a bug in my code - I wrote it between midnight and 1am
last night, tested it on 50 random cases, and let it be free :-) I'll
look over it carefully this morning and see if I find anything.
But no, there isn't a problem with many short lines. We have a block of
data where the first instance of a leading digit (if there is one)
occurs; that block is (inclusively) between tLower and tUpper. We don't
care how many lines there are in that block -only that the first leading
digit is within it. There's no need to decrease kBrute to deal with
short lines, because we still do a full search (in this case using
filter) when we get down to a block of size kBrute.
It would actually be more efficient to decrease kBrute, because then
we'd do the efficient binary reduction further, until we switch to the
slower - but easier - brute force method. The purpose of kBrute is to
avoid a couple of corner cases that I didn't want to think through :-)
When we calculate a new tMid, we then need to find the start of a line
near to it, so that we can decide whether to adjust the lower or upper
limit. Easiest way to do that is to assume we are somewhere in the
middle of a line, and use the following line (hence the use of line *2*
of the remaining chunk). However, this means that we need to be sure the
new 1/2-block contains a CR - hence the need for kBrute.
[Aside - I said "assume we are in the middle of a line" - even if we are
actually at the start or end of it, it sill works OK to use the
following line, so it's not a critical assumption]
So - I'll go check the code again (and probably just handle this corner
case to eliminate kBrute - that should make it marginally faster as well).
-- Alex.
On 02/02/2013 04:13, J. Landman Gay wrote:
I'm glad you did this, it's interesting. I'm not sure about the binary
results. Doesn't the repeat loop exit early if the search block
contains a few short lines? I think you could get a block of text that
might contain the first instance of a leading digit, but they wouldn't
be checked because their total length would be less than kBrute.
Anyway, for lists with shorter lines the script would need to lower
the kBrute variable to a smaller number, and then the timing might be
different. 10,000 seems like a pretty large exit condition.
I could easily not be understanding what's going on though. These
things warp my brain.
On 2/1/13 6:41 PM, Alex Tweedly wrote:
I just can't resist a good benchmark / coding challenge.
...
function f3 @pD
local r, tLower, tUpper, tMid, c, t, temp
constant kBrute = 10000
put 1 into tLower
put (the number of chars in pD) + 1 into tUpper
repeat 10000 times -- more than enough :-)
if tUpper - tLower < kBrute then exit repeat
put (tUpper + tlower ) / 2 into tMid
put char 1 of line 2 of (char tMid to tUpper of pD) into c
if c is a number then
put tMid into tUpper
else
put tMid into tLower
end if
end repeat
put char tLower to tUpper of pD into temp
filter temp without "[0-9]*"
if temp is empty then
-- no lines start with numeric
return "No lines"
else
put the number of lines in (char 1 to tLower of pD) into t
add the number of lines in temp to t
return t
end if
end f3
_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode