Re: How to find offsets in Unicode Text fast

Niggemann, Bernd via use-livecode Mon, 12 Nov 2018 09:59:24 -0800

Thank you Brian for putting the test stack up. It makes it easier to test 
various non-ASCII texts.


As your testing shows, the UTF16 variant can be misleading.

Unfortunately I also found a case of UTF32 not working.

As the source (haystack) I copied some text from the Icelandic Wikipedia 
entry about the capital, Reykjavik, and put the Icelandic spelling of the 
name (Reykjavík) into the delimiter (needle).

Using UTF16 works, but alas, UTF32 does not find anything.

So now it seems that my attempt to fool the offset function into greater speed 
by using either UTF16 or UTF32 textEncoded versions of "needle" and "haystack" 
is not reliable.
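
To make the idea concrete, here is a minimal sketch in Python rather than 
LiveCode. It is only an analogy: encode both strings to a fixed-width 
encoding, search the raw bytes, and turn byte positions back into character 
offsets. The function name encoded_offsets and the choice of UTF-32 
little-endian are assumptions for illustration, and the sketch says nothing 
about how the LiveCode engine behaves internally. It does show two general 
limits of any byte-level search: a hit has to be checked for alignment on a 
character boundary, and bytes are compared exactly, so no case folding or 
normalization takes place.

```python
def encoded_offsets(needle: str, haystack: str) -> list[int]:
    """Return 1-based character offsets of needle in haystack,
    found by searching the UTF-32 encoded bytes (illustrative only)."""
    if not needle:
        return []
    width = 4  # UTF-32 uses exactly 4 bytes per code point
    n = needle.encode("utf-32-le")   # "-le" avoids a byte order mark
    h = haystack.encode("utf-32-le")
    offsets = []
    pos = h.find(n)
    while pos != -1:
        # Keep only hits that start on a code point boundary.
        if pos % width == 0:
            offsets.append(pos // width + 1)
        pos = h.find(n, pos + 1)
    return offsets


# Byte comparison is exact, so upper and lower case never match each other.
print(encoded_offsets("\u00e5\u00e5", "aa\u00e5\u00e5\u00c5\u00c5\u00e5\u00e5aa"))
```

In Python this variant does find "Reykjavík" when needle and haystack use the 
same composed form of "í", so it does not reproduce the failure described 
above.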

Probably there is an explanation for this which eludes me.

Sorry to have to retract my proposal as unreliable. I would have loved to use 
the speed gain for "offset", which is horribly slow for non-ASCII text.

Kind regards
Bernd



On 12.11.2018 at 12:00, use-livecode-requ...@lists.runrev.com wrote:

From: Brian Milby
To: How to use LiveCode <use-livecode@lists.runrev.com>
Subject: Re: How to find offsets in Unicode Text fast


I just tried one additional test.  Search for "åå" within "aaååÅÅååaa".
(On a Mac keyboard, the characters are made with A, Option-A, and
Shift-Option-A.)  The Offset UTF16 version does not return the correct
result if case sensitive is false (returns the same value as if it were
true: 3,7).  Every other version correctly performs the case folding
(3,4,5,6,7).
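
The expected results for this test can be checked independently of LiveCode. 
The following Python sketch is an assumption-laden stand-in, not the test 
stack itself: the function name all_offsets is hypothetical, and Python's 
str.casefold() stands in for whatever case folding the engine applies. It 
collects every (possibly overlapping) 1-based offset of the needle.

```python
def all_offsets(needle: str, haystack: str, case_sensitive: bool = True) -> list[int]:
    """Return every 1-based offset of needle in haystack, overlaps included."""
    if not needle:
        return []
    if not case_sensitive:
        # Fold case on both sides before comparing.
        needle, haystack = needle.casefold(), haystack.casefold()
    offsets = []
    pos = haystack.find(needle)
    while pos != -1:
        offsets.append(pos + 1)
        # Advance by one character so overlapping matches are found too.
        pos = haystack.find(needle, pos + 1)
    return offsets


needle = "\u00e5\u00e5"                                   # "åå"
haystack = "aa\u00e5\u00e5\u00c5\u00c5\u00e5\u00e5aa"     # "aaååÅÅååaa"
print(all_offsets(needle, haystack, case_sensitive=True))   # [3, 7]
print(all_offsets(needle, haystack, case_sensitive=False))  # [3, 4, 5, 6, 7]
```

Note that case folding can change the length of some strings (for example 
"ß" folds to "ss"), so offsets computed on folded text only map back 
directly when the folding is length-preserving, as it is here.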

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
