Thanks Dick

I will try your idea - it may need a small mod, see below. I will probably post 
some timings later if people are still interested in this thread.

Mark’s method of converting to an array of lines to gain random access to the 
lines has given me at least a 50-fold increase in speed in many of my handlers 
(perhaps more - my analysis may have been somewhat flawed, but I still have the 
gut feeling that lineOffset, and finding line k of text, is 50 to 100 times 
slower on unicode than on ascii). So this is the method I have implemented; 
re-factoring took a few hours but was well worth it.
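
For anyone following along, the conversion can be sketched roughly as below. This is only an illustration under my own assumptions, not Mark's actual code; the handler name textToLineArray is made up for the example.

```livecode
-- Hypothetical helper: turn return-delimited text into an array keyed 1..n
function textToLineArray pText
   -- split converts the variable in place; element k holds the old line k
   split pText by return
   return pText
end textToLineArray
```

After `put textToLineArray(tText) into tLines`, a reference such as `line k of tText` becomes `tLines[k]`, which is a keyed lookup rather than a scan for line delimiters.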

However, there is a drawback in using arrays rather than the original text - I 
often need to search the text, to find the first (or next) line containing a 
string, and there is no built-in arrayOffset. Binary (logarithmic) search is an 
extremely fast algorithm for searching the elements of an array IF the array 
has already been sorted appropriately for the search item, but that doesn’t 
suit my use-case at all. Sorting the keys of an array according to the 
contents of elements is another story (combine by return, sort, split by 
return? Split is OK as a once-off, but it gets expensive if it needs to be 
done multiple times - splitting 1700 lines of sample text took about 0.1 
seconds).

There is, however:

filter elements of <array> with <str> into tLines; put line 1 of the keys of 
tLines into tFoundLine

And that is still very fast on unicode, even though it will find all the lines 
matching str, not just the first. This can be an advantage or not. [Note <str> 
does need to be set up to match a whole line]
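
A slightly fuller sketch of the filter idea, in case it helps. The handler name firstLineContaining is hypothetical, and two things here are assumptions on my part: the array is keyed 1..n as produced by split, and the search string contains no wildcard characters (*, ?, [ ]), which filter would otherwise interpret as pattern syntax. The keys are sorted numerically before taking the first, since I would not rely on the keys coming back in line order.

```livecode
-- Hypothetical helper: returns the lowest line number whose element
-- contains pStr, or 0 if no element matches
function firstLineContaining pLinesArray, pStr
   local tMatches, tKeys
   -- the leading and trailing * make the wildcard pattern cover the whole element
   filter elements of pLinesArray with ("*" & pStr & "*") into tMatches
   put the keys of tMatches into tKeys
   if tKeys is empty then return 0
   -- key order is not assumed, so sort before picking the first
   sort lines of tKeys ascending numeric
   return line 1 of tKeys
end firstLineContaining
```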

I have a LineSearch algorithm alternative for lineOffset which searches for the 
first occurrence of a targetString in unicode text which avoids finding line 
endings and which is faster than using the filter method on arrays 
(particularly if the overhead of converting the text to an array with split by 
return is added). It uses the fact that matchChunk is still exceedingly fast on 
unicode (did someone complain that matchChunk appeared to be slow on unicode?? 
Who was that foolish boy? Wasn’t me Sir!). Slight drawback is you have to 
escape all the special regex characters in the target string before using the 
regex 

"(?m)(?i)^(.*?" & pTargetStr & ".*?)$"

but replace is very fast, so this is messy but not a big deal. Big drawback is 
it gives the text of the line but not the line number, and so the algorithm is 
not adaptable to skipping lines; your version may answer that but looks like it 
needs the repeat loop which looks expensive. Hmm, wait, aren’t you looping on 
“line tLine in tLinesToSearch” which implicitly involves finding the line 
delimiters in the unicode text, which is back to the original problem?
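
For reference, the regex approach can be sketched as follows. The handler names escapeForRegex and findLineText are invented for this example, and the list of characters escaped is my assumption of what counts as special; treat it as a sketch rather than the exact code I am running.

```livecode
-- Hypothetical helper: backslash-escape regex metacharacters in pStr
function escapeForRegex pStr
   -- handle the backslash first so the escapes added below are not doubled
   replace "\" with "\\" in pStr
   repeat for each char tChar in ".^$*+?()[]{}|"
      replace tChar with ("\" & tChar) in pStr
   end repeat
   return pStr
end escapeForRegex

-- Hypothetical helper: returns the text of the first line containing
-- pTargetStr (case-insensitive), or empty if there is no such line
function findLineText pText, pTargetStr
   local tRegex, tStart, tEnd
   put "(?m)(?i)^(.*?" & escapeForRegex(pTargetStr) & ".*?)$" into tRegex
   -- matchChunk puts the start and end char positions of the captured group
   -- into tStart and tEnd
   if matchChunk(pText, tRegex, tStart, tEnd) then
      return char tStart to tEnd of pText
   end if
   return empty
end findLineText
```

The positions that come back are character offsets, not line numbers, which is exactly the limitation described above.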

Neville

> On 3 Jul 2024, at 2:00 am, use-livecode-requ...@lists.runrev.com wrote:
> 
> Today's Topics:
> 
>   1. Re: Slow stack problem (Dick Kriesel)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Tue, 2 Jul 2024 01:31:13 -0700
> From: Dick Kriesel <dick.krie...@mail.com>
> To: How to use LiveCode <use-livecode@lists.runrev.com>
> Subject: Re: Slow stack problem
> Message-ID: <07280e11-c947-41af-9e81-066845475...@mail.com>
> Content-Type: text/plain;     charset=utf-8
> 
> 
> 
>> On Jun 28, 2024, at 3:15 AM, Neville Smythe via use-livecode 
>> <use-livecode@lists.runrev.com> wrote:
>> 
>> I have a solution or at least a workaround
> 
> Hi, Neville. You may find a worthwhile improvement in speed if you avoid 
> referring to the Unicode lines by their line numbers (as in "line k of fff").
> 
> Here's a way:
> 
> function findLineNumbersInUnicode pLinesToFind, tLinesToSearch
>  -- returns a comma-delimited list of the line numbers of lines that
>  -- contain any of the lines to find
> 
>  local tRegExp, tLineNumber, tLineNumbers
> 
>  repeat for each line tLineToFind in pLinesToFind
> 
>    put "(^[0-9-]*\t" & tLineToFind & ")" into tRegExp
> 
>    put 0 into tLineNumber
> 
>    repeat for each line tLine in tLinesToSearch
> 
>      add 1 to tLineNumber
> 
>      if matchChunk(tLine, tRegExp) then
> 
>        put tLineNumber & comma after tLineNumbers
> 
>      end if
> 
>    end repeat
> 
>  end repeat
> 
>  return char 1 to -2 of tLineNumbers
> 
> end findLineNumbersInUnicode
> 
> 
> 
> If you try the idea, please share your test results.
> 
> -- Dick Kriesel
> 

Neville Smythe
neville.smy...@optusnet.com.au
0414517719



