A few things:

1. It seems codepointOffset can only find a single character? So it won't work for any search for a multi-character string?

2. codepointOffset seems to work differently for strings containing multi-byte characters than for strings of regular (single-byte) characters:
   put codepointoffset("e","↘ndatestest",6) -- puts 3
   put codepointoffset("e","andatestest",6) -- puts 9

3. It seems that when multi-byte characters are involved, codepointOffset suffers from the same sort of slowdown as offset does. For example, in a 145K string with about 20K hits for a single character, a simple codepointOffset routine (below) takes over 10 seconds, while the item-based routine takes about 0.3 seconds for the same results.

On Mon, Nov 12, 2018 at 4:21 PM Monte Goulding via use-livecode <use-livecode@lists.runrev.com> wrote:

> Hi Folks
>
> I was a bit perplexed by this so I had a quick look about the engine and I
> see the issue. The problem is you are using `offset` which works on
> characters. Characters in LiveCode are neither Unicode codepoints nor bytes.
> They are graphemes. This means that when you have chars to skip the entire
> string needs to be parsed to find the grapheme boundaries so that the index
> can be translated into graphemes to skip. Note that if the strings you were
> dealing with weren’t Unicode then the translation of chars to graphemes is
> 1 -> 1 so there’s no big cost which is why things are much faster when you
> textEncode and offset that.
>
> So! Change to using codepointOffset and hopefully it will be much speedier!
>
> Cheers
>
> Monte
> _______________________________________________
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
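
On point 2, a small sketch may help pin down the discrepancy. It assumes codepointOffset is meant to follow the convention documented for offset, where the result is counted from the first position after the skipped ones; nothing in this thread confirms that. The sketch is Python rather than LiveCode (Python strings index by code point), and codepoint_offset and all_offsets are hypothetical names, not engine functions.

```python
def codepoint_offset(needle, haystack, skip=0):
    """Return the 1-based position of needle, counted from the first code
    point after the skipped ones, or 0 if needle is not found.

    Assumption: this mirrors the relative-to-skip convention of offset."""
    index = haystack.find(needle, skip)  # str.find works in code points
    return 0 if index < 0 else index - skip + 1


def all_offsets(needle, haystack):
    """Collect every absolute 1-based position by accumulating the skip."""
    positions = []
    skip = 0
    while True:
        found = codepoint_offset(needle, haystack, skip)
        if found == 0:
            return positions
        skip += found            # relative result -> absolute position
        positions.append(skip)


print(codepoint_offset("e", "\u2198ndatestest", 6))  # string starting with U+2198
print(codepoint_offset("e", "andatestest", 6))       # plain ASCII string
print(all_offsets("e", "andatestest"))
print(all_offsets("test", "andatestest"))            # multi-character needle
```

Under that convention both test strings give 3 with a skip of 6, and 9 is the absolute position of the same "e". That suggests the ASCII case may be reporting an absolute position while the multi-byte case reports a relative one, though only the engine source can settle it.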