Re: Slow stack problem

Mark Waddingham via use-livecode Fri, 28 Jun 2024 05:03:19 -0700

On 2024-06-28 11:15, Neville Smythe via use-livecode wrote:

In my last epistle I mentioned the repeat loop had only 32 iterations.Much more relevant of course is the inner loop on the number of linesof the data variable fff. In this case fff had 1760 lines. So the totalpossible number of iterations was around 30000 to 50000, getting upthere but still well within LC capability.
I tried operating the loop on just the first 100 lines of fff. Therepeats took 0.006 seconds (impressive).
Then just the first 300 lines. Timing was 0.016 seconds, approximatelylinear increase as expected


Are you sure a linear increase is expected?

Then the first 500 lines. Timing is now 6.43 seconds. Something veryodd there, that’s beyond exponential increase.
And on the last 500 lines, timing was 0.135 seconds (Aha !!!)

I take it you tested this by doing the loop where fff was just line 1 to300, then just line 1 to 500 and then just -500 to -1?

My guess here is that the first 300 lines do not have a unicodecharacter, there is somewhere in the next 200, and there are none in thelast 500.

This would seem to point to matchChunk having indigestion oversomething in the middle of the text data. The data is 98% plain ascii,but quite likely it has a few Unicode characters: I have taken a wholelot of time to convert my legacy app to accept Asian and EuropeanUnicode personal names.

All regex matches in LC are Unicode (under the hood) - so thinking thisis regex related is a red-herring. Running a regex on Unicode text, isno different from on native text - its just its dealing with 16-bitunits rather than 8-bit (regex is generally a linear operation - ittakes time proportional, roughly to length of pattern * length of string- well, as long as you don't use any backtracking type features).

The issue here is the assumption that your code is doing somethinglinear...

It *looks* linear because your code is only doing two nested repeatloops - so from the point of view of lines of script its iterating thecentral loop roughly 'the number of lines of indexList * the number oflines of fff' - however the engine is doing a lot more work.


The central loop is doing (paraphrased);

   repeat with x = 1 to N
     get line x of jjj
   end repeat

This means the engine is searching for a line delimiter not N times, butsum(1, ..., N) times (which is N*(N-1)/2 - i.e. N^2 roughly).

Processing 300 lines will take the engine about 45000 steps, processing500 lines will take the engine 125000 steps, processing 1760 lines willtake the engine > 1,500,000.

This means that a small change in how long it takes to search for a linedelimiter (which is the fundamental operation here) makes a bigdifference to how long doing 1000's of them will take - and searching astring containing unicode is a fair bit slower than searching a stringwhich contains only native characters.

Fortunately there is an easy solution here - use an array so you getrandom access to the lines of fff:


   split fff by return
   repeat with i=2 to the number of lines of indexList
      put line (i+1) of indexList into which

      put "(^[0-9-]*\t" & which & ")" into regX

      put false into found
      repeat with k=lastFound+1 to the number of elements of fff
         put matchChunk(fff[k],regX,pos1,pos2) into found

         if found then exit repeat
      end repeat

      if found then
         put k into lastFound
      end if

      put lastFound into item i of mylineNumbers
   end repeat

This should do exactly the same thing but SUBSTANTIALLY faster whetheryour fff variable contains native or unicode characters.


Warmest Regards,

Mark.

--
Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
LiveCode: Build Amazing Things

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: Slow stack problem

Reply via email to