There may well be a bug in my code - I wrote it between midnight and 1am last night, tested it on 50 random cases, and let it be free :-) I'll look over it carefully this morning and see if I find anything.

But no, there isn't a problem with many short lines. We have a block of data where the first instance of a leading digit (if there is one) occurs; that block is (inclusively) between tLower and tUpper. We don't care how many lines there are in that block -only that the first leading digit is within it. There's no need to decrease kBrute to deal with short lines, because we still do a full search (in this case using filter) when we get down to a block of size kBrute.

It would actually be more efficient to decrease kBrute, because then we'd do the efficient binary reduction further, until we switch to the slower - but easier - brute force method. The purpose of kBrute is to avoid a couple of corner cases that I didn't want to think through :-) When we calculate a new tMid, we then need to find the start of a line near to it, so that we can decide whether to adjust the lower or upper limit. Easiest way to do that is to assume we are somewhere in the middle of a line, and use the following line (hence the use of line *2* of the remaining chunk). However, this means that we need to be sure the new 1/2-block contains a CR - hence the need for kBrute.

[Aside - I said "assume we are in the middle of a line" - even if we are actually at the start or end of it, it sill works OK to use the following line, so it's not a critical assumption]

So - I'll go check the code again (and probably just handle this corner case to eliminate kBrute - that should make it marginally faster as well).

-- Alex.

On 02/02/2013 04:13, J. Landman Gay wrote:
I'm glad you did this, it's interesting. I'm not sure about the binary results. Doesn't the repeat loop exit early if the search block contains a few short lines? I think you could get a block of text that might contain the first instance of a leading digit, but they wouldn't be checked because their total length would be less than kBrute.

Anyway, for lists with shorter lines the script would need to lower the kBrute variable to a smaller number, and then the timing might be different. 10,000 seems like a pretty large exit condition.

I could easily not be understanding what's going on though. These things warp my brain.


On 2/1/13 6:41 PM, Alex Tweedly wrote:

I just can't resist a good benchmark / coding challenge.

...

function f3 @pD
   local r, tLower, tUpper, tMid, c, t, temp
   constant kBrute = 10000

   put 1 into tLower
   put (the number of chars in pD) + 1 into tUpper

   repeat 10000 times -- more than enough :-)
      if tUpper - tLower < kBrute then exit repeat
      put (tUpper + tlower ) / 2 into tMid
      put char 1 of line 2 of (char tMid to tUpper of pD) into c
      if c is a number then
         put tMid into tUpper
      else
         put tMid into tLower
      end if
   end repeat

   put char tLower to tUpper of pD into temp
   filter temp without "[0-9]*"
   if temp is empty then
      -- no lines start with numeric
      return "No lines"
   else
      put the number of lines in (char  1 to tLower of pD) into t
      add the number of lines in temp to t
      return t
   end if
end f3



_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Reply via email to