Brilliant! Thanks, Ron. Very educational for me.

By the way, Malte--there was a bug in my post from half an hour ago. To
convert from the UTF-16 in the field back to UTF-8 for storing or
manipulating, use this:

uniDecode(myVar, "UTF8")

I had "UTF16" there, and Dave C. pointed it out in a separate (related)
thread--which saved my day.

Slava

> -----Original Message-----
> From: use-livecode-boun...@lists.runrev.com
> [mailto:use-livecode-boun...@lists.runrev.com] On Behalf Of ron barber
> Sent: Wednesday, June 01, 2011 10:11 AM
> To: How to use LiveCode
> Subject: Re: Cyrillic input
>
> Hi malte,
> This is a modified function that Ken, Richard (and maybe Jacque) had a
> hand in some time ago.
> It does essentially the same thing that Slava suggested but I offer it
> as it has helped me.
>
> Thanks
> Ron
>
> function RawDataToUTF16 pData
>    -- Examine the data to determine encoding:
>    -- UTF8 has 0xEF 0xBB 0xBF
>    -- UTF16BE has 0xFE 0xFF
>    -- UTF16LE has 0xFF 0xFE
>
>    switch
>       case charToNum(byte 1 of pData) = 0
>          -- No BOM, but a leading null byte suggests big-endian UTF-16
>          put "UTF16BE" into tTextEncoding
>          break
>       case charToNum(byte 1 of pData) = 0xFE and charToNum(byte 2 of pData) = 0xFF
>          delete byte 1 to 2 of pData
>          put "UTF16BE" into tTextEncoding
>          break
>       case charToNum(byte 1 of pData) = 0xFF and charToNum(byte 2 of pData) = 0xFE
>          delete byte 1 to 2 of pData
>          put "UTF16LE" into tTextEncoding
>          break
>       case byte 1 to 3 of pData is numToChar(0xEF) & numToChar(0xBB) & numToChar(0xBF)
>          -- UTF-8 BOM, written as bytes so the test does not depend on
>          -- the encoding of the script itself; strip it like the others
>          delete byte 1 to 3 of pData
>          put "UTF8" into tTextEncoding
>          break
>       default
>          put "UTF8" into tTextEncoding
>          break
>    end switch
>    --
>    if tTextEncoding begins with "UTF16" then
>       -- Check byte order, swapping if needed:
>       if the processor is "x86" then
>          put "LE" into tHostByteOrder
>       else
>          put "BE" into tHostByteOrder
>       end if
>       if byte -2 to -1 of tTextEncoding <> tHostByteOrder then
>          put swapbytes(pData) into pData
>       end if
>       -- Already utf16, so nothing more needs to be done:
>       #put uniEncode(uniDecode(pData, utf16),16) into tFieldData
>       put pData into tFieldData
>    else
>       -- Convert from utf8 to Rev's native utf16:
>       put uniEncode(pData, "utf8") into tFieldData
>    end if
>    replace uniencode("Åv","Japanese") with "**" in tFieldData
>    replace CRLF with cr in tFieldData
>    replace numtochar(13) with cr in tFieldData --affects japanese ?
>    replace "**" with uniencode("Åv","Japanese") in tFieldData
>    return tFieldData
> end RawDataToUTF16
>
>
> On Wed, Jun 1, 2011 at 10:56 PM, Slava wrote:
> > Malte,
> >
> > As I said, I'm discovering these things as I go--I hadn't even heard
> > of LC until last month. I'm finding that work with Unicode in LC
> > involves a lot of jumping through hoops, but so far I have been able
> > to do everything I needed. So don't give up :)
> >
> > I am not sure why your stack doesn't "know" whether the text in your
> > field is UTF-16 or plain ANSI, but here is what I do:
> >
> > When I read some text from a file into a variable, I assume that it is
> > UTF-8. There is no harm in that. Even if it turns out to be plain
> > English, it can still be treated that way.
> >
> > When I assign that text to a field, I always use
> >
> > set the unicodeText of field MyField to uniEncode(myVar, "UTF8")
> >
> > Now the text in the field is UTF-16. I check to see if the first two
> > bytes are decimal 255 followed by decimal 254 (or the reverse, 254
> > followed by 255), and if they are, I delete them, because that's the
> > BOM.
> >
> > I can read and edit the field using the system's multilanguage input,
> > like the Russian keyboard in Windows. Russian and English can be typed
> > in any combination, but it is still all UTF-16. Each letter and each
> > punctuation mark is a two-byte sequence. If you call
> > length(the unicodeText of field MyField) it will report twice the
> > number of characters that you see in the field.
> >
> > So if I have to access character N in the field, I do this:
> >
> > set useUnicode to true
> > put char N to char N+1 of field MyField into myChar
> > answer charToNum(myChar)
> >
> > That will show you a decimal number, like 1072 if myChar is a lower
> > case Cyrillic a or an ASCII number if it is an English letter.
> >
> > Even plain English letters must be accessed like that, as two bytes.
> > For English, one byte is a null and the other is the ASCII code of
> > the letter (which of the two comes first depends on the byte order),
> > but you don't need to think of that. Just treat every letter as a
> > two-char sequence.
> >
> > If the user types in that field, what he types is in UTF-16.
> >
> > If I need to do anything with the text in the field, like store it to
> > a file, I read it into a variable:
> >
> > put the unicodeText of field MyField into myVar2
> >
> > and immediately convert it to UTF-8:
> >
> > put uniDecode(myVar2, "UTF16") into myVar2
> > ==> CORRECTION: should be uniDecode(myVar2, "UTF8")
> >
> > Now myVar2 is UTF-8 and can be stored in a file or processed by
> > scripts.
> >
> > There are apparently limitations to what you can do with Cyrillic in
> > LC, but the things that I have listed all work for me.
> >
> > Slava
> >
> >> -----Original Message-----
> >> From: use-livecode-boun...@lists.runrev.com
> >> [mailto:use-livecode-boun...@lists.runrev.com] On Behalf Of Malte Brill
> >> Sent: Wednesday, June 01, 2011 9:23 AM
> >> To: use-livecode@lists.runrev.com
> >> Subject: Re: Re: Cyrillic input
> >>
> >> Thanks Mark and Slava!
> >>
> >> well, this is getting me a bit further. Now if only I knew if I could
> >> reliably check if the text in my field is regular ASCII or UTF
> >> encoded, that would really make my day.
> >>
> >> Cheers,
> >>
> >> malte

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
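
The byte-level points made in this thread (two bytes per character in UTF-16, 1072 for a lower-case Cyrillic a, the BOM values 255/254 and 254/255) can be checked outside LiveCode. What follows is a minimal sketch in Python, not LiveCode. The function name strip_bom and the sample string are made up for illustration, and the fallback to UTF-8 when no BOM is present simply mirrors the assumption described in the quoted message; it is not a reliable way to detect an encoding.

```python
import codecs
from typing import Tuple


def strip_bom(data: bytes) -> Tuple[bytes, str]:
    """Drop a leading BOM, if any, and return (payload, encoding name)."""
    if data.startswith(codecs.BOM_UTF8):       # EF BB BF
        return data[3:], "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE (decimal 255, 254)
        return data[2:], "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF (decimal 254, 255)
        return data[2:], "utf-16-be"
    # No BOM: assume UTF-8 (an assumption; plain ASCII is also valid UTF-8).
    return data, "utf-8"


sample = "a\u0430"  # Latin "a" followed by Cyrillic lower-case "a" (U+0430)

# Hypothetical input: little-endian UTF-16 preceded by its BOM.
raw = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")

payload, encoding = strip_bom(raw)
text = payload.decode(encoding)

print(len(text), len(payload))    # 2 4  -> twice as many bytes as characters
print([ord(ch) for ch in text])   # [97, 1072]
print(list(payload))              # [97, 0, 48, 4]
print(len(text.encode("utf-8")))  # 3    -> size once converted to UTF-8
```

In this little-endian sample the ASCII byte (97) comes before the null; with big-endian UTF-16 the two bytes of each character would be swapped.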