Malte,

As I said, I'm discovering these things as I go; I hadn't even heard of LC until last month. I'm finding that working with Unicode in LC involves a lot of jumping through hoops, but so far I have been able to do everything I needed. So don't give up :)
I am not sure why your stack doesn't "know" whether the text in your field is UTF-16 or plain ANSI, but here is what I do.

When I read some text from a file into a variable, I assume that it is UTF-8. There is no harm in that: plain English (ASCII) text is already valid UTF-8, so it can still be treated that way. When I assign that text to a field, I always use

    set the unicodeText of field MyField to uniEncode(myVar, "UTF8")

Now the text in the field is UTF-16. I check to see if the first two bytes are decimal 255 followed by decimal 254 (or the reverse, 254 followed by 255), and if they are, I delete them, because that's the BOM (byte order mark).

I can read and edit the field using the system's multilanguage input, like the Russian keyboard in Windows. Russian and English can be typed in any combination, but it is still all UTF-16. Each letter and each punctuation mark is a two-byte sequence. If you call

    length(the unicodeText of field MyField)

it will report twice the number of characters that you see in the field. So if I have to access character N in the field, I do this:

    set the useUnicode to true
    put char N to char N+1 of field MyField into myChar
    answer charToNum(myChar)

Note that N here is the position of the character's first byte, not of the visible character: with two bytes per character, the K-th character you see starts at byte 2*K-1.

That will show you a decimal number, like 1072 if myChar is a lowercase Cyrillic a, or the ASCII code if it is an English letter. Even plain English letters must be accessed like that, as two bytes. For English, one byte of the pair is a null and the other is the ASCII code of the letter (which one comes first depends on the byte order), but you don't need to think about that. Just treat every letter as a two-char sequence.

If the user types in that field, what he types is in UTF-16. If I need to do anything with the text in the field, like store it to a file, I read it into a variable:

    put the unicodeText of field MyField into myVar2

and immediately convert it to UTF-8:

    put uniDecode(myVar2, "UTF8") into myVar2

Now myVar2 is UTF-8 and can be stored in a file or processed by scripts.
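If it helps to see the byte layout outside LC, here is a minimal Python sketch (not LiveCode) of the same round trip: UTF-8 in, UTF-16 in the "field", BOM stripped, one character read as a two-byte pair, UTF-8 out. The function names and the sample text are made up for illustration, and the sketch assumes little-endian UTF-16, which is what a 255, 254 BOM indicates. It is only meant to illustrate the byte counts described above, not how LC works internally.

```python
import codecs

def to_utf16(utf8_bytes: bytes) -> bytes:
    """Hypothetical helper: re-encode UTF-8 bytes as little-endian UTF-16."""
    return utf8_bytes.decode("utf-8").encode("utf-16-le")

def strip_bom(buf: bytes) -> bytes:
    """Drop a leading UTF-16 BOM (255, 254 or 254, 255) if there is one."""
    if buf[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        return buf[2:]
    return buf

def code_unit(buf: bytes, k: int) -> int:
    """Return the k-th (1-based) two-byte unit as a number (little-endian assumed)."""
    pair = buf[2 * (k - 1): 2 * k]
    return int.from_bytes(pair, "little")

def to_utf8(buf: bytes) -> bytes:
    """Convert little-endian UTF-16 bytes back to UTF-8 for storage."""
    return buf.decode("utf-16-le").encode("utf-8")

# Sample "file": a UTF-8 BOM, Latin Z, then Cyrillic lowercase a (U+0430).
sample = "Z\u0430"
file_bytes = codecs.BOM_UTF8 + sample.encode("utf-8")

with_bom = to_utf16(file_bytes)
print(list(with_bom[:2]))      # [255, 254]: the BOM survived the conversion

field = strip_bom(with_bom)
print(len(field))              # 4: twice the two visible characters
print(code_unit(field, 1))     # 90: ASCII code of Z
print(code_unit(field, 2))     # 1072: Cyrillic lowercase a

saved = to_utf8(field)         # UTF-8 again, ready to write to a file
```

Note that in this little-endian layout the ASCII byte of Z comes first and the null byte second; with big-endian UTF-16 it would be the other way around.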
There are apparently limitations to what you can do with Cyrillic in LC, but the things that I have listed all work for me.

Slava

> -----Original Message-----
> From: use-livecode-boun...@lists.runrev.com [mailto:use-livecode-boun...@lists.runrev.com] On Behalf Of Malte Brill
> Sent: Wednesday, June 01, 2011 9:23 AM
> To: use-livecode@lists.runrev.com
> Subject: Re: Re: Cyrillic input
>
> Thanks mark and Slava!
>
> well, this is getting me a bit further. Now if only I knew if I could reliably check if
> the text in my field regular ASCII or UTF encoded, that would really make my
> day.
>
> Cheers,
>
> malte