So then why does

put textEncode("a","UTF-32") into X;put chartonum(byte 1 of X)

put 97? That implies that "byte" 1 is "a", not 1100001. Likewise,

put textEncode("㍁","UTF-32") into X;put chartonum(byte 1 of X)

puts 65.
I've looked in the dictionary and I don't see anything that comes close to describing this.

gc

On Mon, Nov 12, 2018 at 10:21 PM Mark Waddingham via use-livecode <use-livecode@lists.runrev.com> wrote:

> On 2018-11-13 07:15, Geoff Canyon via use-livecode wrote:
> > On Mon, Nov 12, 2018 at 3:50 PM Monte Goulding via use-livecode <use-livecode@lists.runrev.com> wrote:
> >
> > Unless I'm misunderstanding, this hasn't been my observation. Using offset
> > on a string that has been textEncode()ed to UTF-32 returns values that are
> > 4 * (the character offset - 1) + 1 -- if it were re-encoded, wouldn't it
> > return the actual offsets (except when it fails)? Also, 𐀁 encodes to
> > 00010001, and routines that convert to UTF-32 and then use offset will find
> > five instances of that character in the UTF-32 encoding because of improper
> > boundaries. To see this, run this code:
> >
> > on mouseUp
> >    put textencode("𐀁","UTF-32") into X
> >    put textencode("𐀁𐀁𐀁","UTF-32") into Y
> >    put offset(X,Y,1)
> > end mouseUp
> >
> > That will return 2, meaning that it found the encoding for X starting at
> > character 2 + 1 = 3 of Y. In other words, it found X using the last half of
> > the first "𐀁" and the first half of the second "𐀁"
>
> The textEncode function generates binary data which is composed of
> bytes. When you use binary data in a text function (which offset is),
> the engine uses a compatibility conversion which treats the sequence of
> bytes as a sequence of native characters (this preserves what happened
> pre-7.0 when strings were only ever native, and as such binary and
> string were essentially the same thing).
>
> So if you textEncode a 1 (native) character string as UTF-32, you will
> get a four byte string, which will then turn back into a 4 (native)
> character string when passed to offset.
>
> Warmest Regards,
>
> Mark.
>
> --
> Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
> LiveCode: Everyone can create apps
>
> _______________________________________________
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
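
The byte arithmetic discussed in this thread can be checked outside the engine. What follows is a rough sketch in Python, not LiveCode script, and it only models the behaviour described above. The helper names (utf32, byte_offset, all_positions, aligned_positions) are made up for illustration. It assumes UTF-32 little-endian with no BOM, which is consistent with the first-byte values 97 and 65 reported above, and it uses Latin-1 as a stand-in for the "native" encoding, which in practice depends on the platform.

```python
# Python model of the byte-level behaviour described in the thread (not LiveCode).
# Assumption: UTF-32 little-endian, no BOM, matching the reported first bytes.


def utf32(text):
    """Hypothetical stand-in for textEncode(text, "UTF-32")."""
    return text.encode("utf-32-le")


def byte_offset(needle, haystack, skip=0):
    """Hypothetical stand-in for offset(needle, haystack, skip) on raw bytes.

    Returns the 1-based position counted from just after the skipped bytes,
    or 0 when there is no match.
    """
    index = haystack.find(needle, skip)
    return 0 if index < 0 else index - skip + 1


def all_positions(needle, haystack):
    """Every 1-based byte position where needle occurs, overlaps included."""
    width = len(needle)
    return [i + 1 for i in range(len(haystack) - width + 1)
            if haystack[i:i + width] == needle]


def aligned_positions(needle, haystack):
    """Keep only matches starting on a 4-byte boundary, as character numbers."""
    return [(p - 1) // 4 + 1 for p in all_positions(needle, haystack)
            if (p - 1) % 4 == 0]


# One character becomes four bytes; the first byte of "a" is 97.
print(list(utf32("a")))                       # [97, 0, 0, 0]
print(utf32("㍁")[0])                          # 65

# Viewed as "native" characters (Latin-1 assumed), that is a 4-character string.
print(len(utf32("a").decode("latin-1")))      # 4

# Byte offset = 4 * (character offset - 1) + 1: "z" is character 3 of "xyz".
print(byte_offset(utf32("z"), utf32("xyz")))  # 9

# U+10001 encodes to bytes 01 00 01 00, so it also matches across boundaries.
x = utf32("\U00010001")
y = utf32("\U00010001" * 3)
print(byte_offset(x, y, 1))                   # 2
print(all_positions(x, y))                    # [1, 3, 5, 7, 9]
print(aligned_positions(x, y))                # [1, 2, 3]
```

In this model, the five raw matches correspond to the "five instances" mentioned above, and discarding any match that does not start on a 4-byte boundary leaves the three real characters. Whether that filter is the right fix inside LiveCode itself is not something this sketch can confirm.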