Re: What is LC's internal text format?

Mark Waddingham via use-livecode Mon, 12 Nov 2018 22:23:48 -0800

On 2018-11-13 07:15, Geoff Canyon via use-livecode wrote:

On Mon, Nov 12, 2018 at 3:50 PM Monte Goulding via use-livecode <
use-livecode@lists.runrev.com> wrote:
Unless I'm misunderstanding, this hasn't been my observation. Using
offset on a string that has been textEncode()ed to UTF-32 returns values
that are 4 * (the character offset - 1) + 1 -- if it were re-encoded,
wouldn't it return the actual offsets (except when it fails)? Also, 𐀁
encodes to 00010001, and routines that convert to UTF-32 and then use
offset will find five instances of that character in the UTF-32 encoding
because of improper boundaries. To see this, run this code:
on mouseUp
   put textencode("𐀁","UTF-32") into X
   put textencode("𐀁𐀁𐀁","UTF-32") into Y
   put offset(X,Y,1)
end mouseUp
That will return 2, meaning that it found the encoding for X starting at
character 2 + 1 = 3 of Y. In other words, it found X using the last half
of the first "𐀁" and the first half of the second "𐀁".
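
For anyone who wants to check the byte arithmetic outside LiveCode, here is a minimal Python sketch of the same search. It is a stand-in, not LiveCode code: the helper name byte_offsets is made up for illustration, and little-endian UTF-32 is an assumption (the repeating 00 01 / 01 00 pattern gives the same positions with either byte order).

```python
# Python stand-in for the LiveCode snippet above: search raw UTF-32 bytes.
ch = "\U00010001"                  # the character 𐀁 (U+10001)
x = ch.encode("utf-32-le")         # 4 bytes, no BOM (byte order is an assumption)
y = (ch * 3).encode("utf-32-le")   # 12 bytes

def byte_offsets(needle: bytes, haystack: bytes) -> list[int]:
    """Hypothetical helper: every 1-based byte position where needle occurs,
    including overlapping matches."""
    hits = []
    pos = haystack.find(needle)
    while pos != -1:
        hits.append(pos + 1)
        pos = haystack.find(needle, pos + 1)
    return hits

all_hits = byte_offsets(x, y)

# Only matches that start on a 4-byte boundary are real characters.
aligned_hits = [p for p in all_hits if (p - 1) % 4 == 0]

# Equivalent of offset(X, Y, 1): skip one unit, report position relative to the skip.
skip = 1
offset_after_skip = y.find(x, skip) + 1 - skip

print(all_hits, aligned_hits, offset_after_skip)
```

The unaligned search reports five matches where only three characters exist, and the skipped search lands on the misaligned match that straddles the first and second character.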

The textEncode function generates binary data, which is composed of
bytes. When you use binary data in a text function (which offset is),
the engine applies a compatibility conversion that treats the sequence
of bytes as a sequence of native characters. This preserves the pre-7.0
behaviour, when strings were only ever native and binary data and
strings were therefore essentially the same thing.

So if you textEncode a 1 (native) character string as UTF-32, you will
get a four byte string, which will then turn back into a 4 (native)
character string when passed to offset.
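
The effect can be sketched in Python as a rough model of that conversion. This is not how the engine is implemented: Latin-1 is used here only as a stand-in for "native" (the real native character set depends on the platform), and little-endian byte order is an assumption.

```python
# Rough model: bytes reinterpreted one-for-one as native characters.
encoded = "A".encode("utf-32-le")        # 1 character -> 4 bytes
as_native = encoded.decode("latin-1")    # 4 bytes -> 4 one-byte characters

# The same reinterpretation explains the 4 * (offset - 1) + 1 pattern.
text = "abcabc"
data = text.encode("utf-32-le").decode("latin-1")
needle = "c".encode("utf-32-le").decode("latin-1")

char_offset = text.find("c") + 1         # 1-based offset in the real text
byte_style_offset = data.find(needle) + 1  # 1-based offset in the reinterpreted data

print(len(encoded), len(as_native), char_offset, byte_style_offset)
```

Here the character offset of "c" is 3, while the offset in the reinterpreted data is 4 * (3 - 1) + 1 = 9, which matches the pattern reported in the quoted message.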


Warmest Regards,

Mark.

--
Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps

