Re: How to find offsets in Unicode Text fast

Niggemann, Bernd via use-livecode Mon, 12 Nov 2018 12:10:25 -0800

Ben,

Please see my remarks about UTF-32 failing with some Icelandic characters. 
Currently I would not recommend offset(UTF-32 text) unless you know which 
character set is suitable and you are in control of that character set. The 
same goes for UTF-16.


I also thought that byteOffset would be faster for case-sensitive search in 
UTF-32 text. It turned out to be slower than offset(UTF-32 text).
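
To illustrate the byte-level idea outside LiveCode, here is a minimal Python 
sketch (not LiveCode, and not the code that was timed above). It assumes 
little-endian UTF-32 without a BOM, and the function name utf32_char_offset 
is made up for this example. It searches the raw bytes case-sensitively, 
accepts only hits that start on a 4-byte boundary, and converts the byte 
position into a 1-based character position, with 0 meaning "not found".

```python
def utf32_char_offset(needle: str, haystack: str) -> int:
    """Case-sensitive search on UTF-32 bytes (hypothetical helper).

    Returns the 1-based character position of needle in haystack,
    or 0 if needle does not occur.
    """
    # Assumption: little-endian UTF-32 without BOM, 4 bytes per code point.
    n = needle.encode("utf-32-le")
    h = haystack.encode("utf-32-le")

    pos = h.find(n)
    # A raw byte search can hit in the middle of a code point, so skip
    # any match that does not start on a 4-byte boundary.
    while pos != -1 and pos % 4 != 0:
        pos = h.find(n, pos + 1)

    return 0 if pos == -1 else pos // 4 + 1


print(utf32_char_offset("ð", "Það er"))
# U+0100 occurs as bytes at byte position 1 here, but not as a character.
print(utf32_char_offset("\u0100", "\U00010000\u0200"))
```

The alignment check matters because a plain byte search knows nothing about 
character boundaries: in the second call the needle's four bytes do occur in 
the data, but straddling two characters, so the match has to be rejected.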

>Ben Rubinstein via use-livecode, Mon, 12 Nov 2018 11:38:26 -0800

>Coming late to this discussion. Very excited by this approach of converting 
>everything to UTF-32 in order to do fast offsets.

>In the meantime I'd be suspicious about doing a case-insensitive search in 
>this way; but my guess would be that, if your use-case will accept 
>case-sensitivity, it would be safer (and faster?) to use byteOffset on the 
>UTF-32 data rather than offset.

Kind regards
Bernd
