Re: double byte chars?

Pierre Sahores Sat, 11 Jun 2011 17:07:18 -0700

Thanks for this useful synthesis, Slava.

Best regards,


Pierre

On 11 June 2011 at 21:12, Slava Paperno wrote:

> The "set useUnicode to true" command is necessary only if you use the 
> charToNum() or numToChar() functions. Otherwise it has no effect.
> 
> The text in your fields is in UTF-16, and you should access it as the
> unicodeText of field "MyField".
> 
> Word chunks of unicodeText can be correctly retrieved if you use:
> 
> word 2 of unicodeText of field "MyField"
> 
> There is a tutorial by Devin Asay on the use of Unicode in LiveCode at 
> http://www.runrev.com/developers/lessons-and-tutorials/tutorials/unicode-in-revolution/
> It has examples of retrieving a specific word chunk.
> 
> If you start processing Russian text in your variables, you will often find 
> it better to convert it to UTF-8 first: put uniDecode(unicodeText of field 
> "MyField", "UTF8") into MyUTF8String. To put the result back into a field, 
> convert it back to UTF-16: set unicodeText of field "MyField" to 
> uniEncode(MyUTF8String, "UTF8")
> 
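
A minimal sketch of the round trip described above, assuming the pre-7.0 LiveCode engine discussed in this thread. The handler names utf8Of and setFromUTF8 are made up for illustration. uniDecode, uniEncode and the unicodeText property are the ones named in the text.

```livecode
-- Sketch only. utf8Of and setFromUTF8 are hypothetical helper names.

-- Return the text of a field as UTF-8 (the field itself holds UTF-16).
function utf8Of pFieldName
   return uniDecode(the unicodeText of field pFieldName, "UTF8")
end utf8Of

-- Write a UTF-8 string back into a field, converting it to UTF-16 first.
on setFromUTF8 pFieldName, pUTF8
   set the unicodeText of field pFieldName to uniEncode(pUTF8, "UTF8")
end setFromUTF8
```

With both strings in UTF-8, the prefix test from the original question might be written as: if word 2 of utf8Of("t2") begins with utf8Of("t1") then beep. This relies on a space being a single byte in UTF-8 that never occurs inside a multi-byte character. Setting the caseSensitive to true beforehand should keep the comparison byte-exact.
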
> A sure-fire way to do any sort of string comparison is to convert everything 
> to decimal code points and then work with the numbers. Some parts of LC are 
> not capable of handling Unicode strings, and in those situations using the 
> numbers solves the problem.
> 
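
The code point approach above could be sketched as follows, again assuming a pre-7.0 engine where a variable holding UTF-16 text is counted in bytes. codePointList is a hypothetical helper name. The values it returns are UTF-16 code units, which equal the code points for Russian and other text in the Basic Multilingual Plane.

```livecode
-- Sketch only: codePointList is a hypothetical helper name.
-- pUTF16 is expected to hold UTF-16 text, e.g. the unicodeText of a field.
function codePointList pUTF16
   local tList, i
   set the useUnicode to true -- charToNum now reads two bytes at a time
   repeat with i = 1 to the number of chars in pUTF16 step 2
      put charToNum(char i to i + 1 of pUTF16) & comma after tList
   end repeat
   delete the last char of tList -- drop the trailing comma
   return tList
end codePointList
```

Two fields can then be compared as plain comma-separated number lists, for example: if codePointList(the unicodeText of field "t1") is codePointList(the unicodeText of field "t3") then beep.
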
> If you are reading UTF-8 text from text files (e.g. saved from Notepad
> with the UTF-8 encoding option), you may have to take into account the
> first three bytes that you read in: they are the byte order mark (BOM).
> You'll want to delete them from your strings before trying to access a
> specific byte in the string.
> 
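
A possible way to drop the byte order mark when reading such a file, as a sketch under the same pre-7.0 assumptions. readUTF8File is a hypothetical helper name and pPath stands for whatever file path you use. The three bytes EF BB BF (239, 187, 191) are the standard UTF-8 BOM.

```livecode
-- Sketch only: readUTF8File is a hypothetical helper; pPath is a file path.
function readUTF8File pPath
   local tData, tBOM
   put URL ("binfile:" & pPath) into tData
   -- The UTF-8 byte order mark is the three bytes EF BB BF.
   put numToChar(239) & numToChar(187) & numToChar(191) into tBOM
   set the caseSensitive to true -- compare the bytes exactly
   if char 1 to 3 of tData is tBOM then delete char 1 to 3 of tData
   return tData
end readUTF8File
```

The result is still UTF-8, so it would need uniEncode(..., "UTF8") before being set as the unicodeText of a field.
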
> If you still get into trouble, feel free to ask me offlist (s...@cornell.edu) 
> for a sample application that shows these operations. I'm still working on 
> it, but when I'm done, I'll make it available online.
> 
> Best regards,
> 
> Slava
> 
> 
>> -----Original Message-----
>> From: use-livecode-boun...@lists.runrev.com [mailto:use-livecode-
>> boun...@lists.runrev.com] On Behalf Of Richmond Mathewson
>> Sent: Saturday, June 11, 2011 2:24 PM
>> To: How to use LiveCode
>> Subject: Re: double byte chars?
>> 
>> On 06/11/2011 09:14 PM, Lars Brehmer wrote:
>>> My project has Russian text fields (Arial,Russian). With one
>>> exception, everything works fine.
>>> 
>>> Problem: a filter-as-you-type script.
>>> 
>>> field "t1":     зо
>>> field "t2":     меня зовут Виктор  --underscoring shows the matches--
>>> field "t3":     зовут курить почему
>>> 
>>> What I want to do is find a word in fields t2 and t3 that begins with
>>> the 2 letters in field t1. Word 2 in field t2 and word 1 in field t3
>>> should be matches. But this only works if the matching word is the
>>> first word in the field!
>>> 
>>> Some simple message box scripts:
>> 
>> At the risk of insulting you, as you are using Unicode I have a funny
>> feeling you have to prefix this sort of thing with
>> 
>> set the useUnicode to true
>>> put fld "t1"&  cr&  fld "t2"&  cr&  fld "t3"
>>> 
>>> The result is a bunch of numbers, symbols and squares. You can
>>> clearly spot the matches.
>>> 
>>> Next in the message box:   --char 1 to 4 -- double byte chars--
>>> 
>>> put char 1 to 4 in fld "t1" into aText
>>> put char 1 to 4 in word 2 in fld "t2" into bText
>>> put char 1 to 4 in word 1 in fld "t3" into cText
>>> put aText&  cr&  bText&  cr&  cText
>>> 
>>> This should be 3 identical lines, right? But no. Line 2 is missing
>>> the final char.
>>> 
>>> 7(square)>(square)
>>> 7(square)>
>>> 7(square)>(square)
>>> 
>>> Next: comparing the strings
>>> 
>>> if cText = aText then beep - it beeps
>>> if cText is in aText then beep - it beeps
>>> if bText = aText then beep - no beep, obviously
>>> 
>>> BUT
>>> 
>>> if bText is in aText then beep - also no beep!
>>> 
>>> And then
>>> 
>>> put char 1 to 5 in word 2 in field "t2", it returns the same as the
>>> other two:
>>> 
>>> 7(square)>(square)
>>> 
>>> so then
>>> 
>>> put char 1 to 5 in word 2 into bText
>>> 
>>> but
>>> 
>>> if bText = (or is in) aText still returns nothing
>>> 
>>> Why is that last double byte char always missing when the word is not
>>> word 1 in its field? If I do char 1 to 3 I get this (again!)
>>> 
>>> 7(square)>
>>> 7(square)   --last char missing!
>>> 7(square)>
>>> 
>>> Using itemDEL = space and char 1 to x in item z behaves the same.
>>> 
>>> Anyone know the answer?
>>> 
>>> Cheers,
>>> 
>>> Lars
>>> _______________________________________________
>>> use-livecode mailing list
>>> use-livecode@lists.runrev.com
>>> Please visit this url to subscribe, unsubscribe and manage your
>>> subscription preferences:
>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>> 
>> 

--
Pierre Sahores
mobile : (33) 6 03 95 77 70

www.woooooooords.com
www.sahores-conseil.com




