Re: to find the first numeric line?

Alex Tweedly Sun, 03 Feb 2013 12:30:16 -0800

On 02/02/2013 18:56, J. Landman Gay wrote:

On 2/2/13 3:08 AM, Alex Tweedly wrote:

There's no need to decrease kBrute to deal with
short lines, because we still do a full search (in this case using
filter) when we get down to a block of size kBrute.


Aha. The filter is the part I missed. I get it now. Clever.

Hmmm - kinda "clever", as in "cute clever".

But also kinda dumb, as in "why abandon an efficient method for aninefficient one, just to save a little bit of thinking" :-)

So I went back, did the thinking, eliminated kBrute (and thankfully theneed for an unnecessary assumption/requirement to limit line length). Ialso changed it so that when we are about to test, we always have acomplete line identified - and so it is much easier to adapt the codefor other purposes.


And - I changed the problem definition from
    "Return the line number of the first line with a leading numeric"
to

"return the character number of the first character of the firstline with a leading numeric"

This is much more efficient to use outside the function, as well as moreefficient inside the function. The caller can now do something like


    put getCharacterPosition(myData) into tPos
    repeat for each line L in (char tPos to -1 of myData)
       ...

instead of
   put getLinePosition(myData) into tLPos
   repeat for each line L in (line tLPos to -1 of myData)
       ...

or
   put line 1 of (char tPos to -1 of myData) into temp
rather than
  put line tLPos of Mydata into temp

Both of these are much more efficient (will save 10s or 100s ofmillisecs in large data sets).

With these changes, the binary search can now do 10M lines of input intypically <1 msecEven the worst case I can think of (all 300 Million characters are in asingle line, no CRs anywhere), it gets a result in 1800 msec.

So - here's the code for anyone who needs an efficient binary search oflarge string data.

-- Alex.

function f3 @pD
   local theResult
   local tLower, tUpper, tMid, c, t

   -- NB we are always going to maintain theResult as a feasible answer

-- if there is no 'leading numeric' line, we answer with a numberbeyond the input data

   put (the number of chars in pD) + 1 into theResult

   put 1 into tLower
   put theResult into tUpper
   put trunc( (tUpper + tlower ) / 2) into tMid

   repeat 10000 times
      -- check for a CR in the upper half
      put the offset(CR, char tMid to tUpper of pD) into t
      if t = 0 or tMid + t = tUpper then
         -- there is no CR in upper half
         -- move tUpper down to near tMid.

-- Can't just move it down to exactly tMid because of thecorner case

         -- where tMid to tUpper exactly spans the last line
         put tMid+1 into tUpper
      else
         -- there is a CR in this upper half, move tMid to just beyond it
         add t to tMid
         -- now tMid points to the start of a line
         -- do the test here - easy to adapt
         put char tMid of pD into c
         if c is a number then
            put tMid into tUpper
            put tMid into theResult
         else   --
            put tMid into tLower
         end if
      end if

      put trunc( (tUpper + tLower ) / 2) into tMid
      if tMid = tLower then

-- we're done, just decide if this single remaining line is acandidate for the answer

         if char tMid of pD is a number then  -- do the test again !!
            put tMid into theResult
         end if
         exit repeat
      end if

      if tMid = tLower+1 then

-- tLower must point to the first char of an empty lie, so wecan increment over it

         put tMid into tLower
      end if
   end repeat

   return theResult

end f3



_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: to find the first numeric line?

Reply via email to