Re: Livecode and UTF8

Peter W A Wood Wed, 29 Jul 2015 17:53:52 -0700

Simon

I did not reply sooner as I’m not such an experienced LiveCoder. Also, I am 
concentrating on learning LiveCode versions from 7 onwards. Handling Unicode is 
much, much easier in those versions.


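In the code you posted here, you appear to have used uniDecode where you needed 
to use uniEncode. As I understand it, uniEncode converts from the named encoding 
(UTF8 here) to LiveCode's two-byte Unicode, and uniDecode goes the other way.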

I ran this in the message box in LiveCode 8:

put numToChar(226) into tString -- E2 hex
put numToChar(128) after tString -- 80 hex
put numToChar(156) after tString -- 9C hex
put unicode uniEncode(tString, "UTF8")

It successfully displayed this:

“
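
If what you need is the number rather than the displayed character, something 
like the following may work in LiveCode 7 or later. It is only a sketch: it 
assumes the textDecode and codepointToNum functions added in version 7 (so it 
will not run in 6.7), and tBytes and tText are just names made up for 
illustration.

```livecode
-- Sketch for LiveCode 7 or later; assumes textDecode and codepointToNum
put numToByte(226) & numToByte(128) & numToByte(156) into tBytes -- E2 80 9C hex
put textDecode(tBytes, "UTF-8") into tText -- UTF-8 bytes to a LiveCode string
put codepointToNum(tText) -- numeric code point of the single character
```

For your three bytes the result should be 8220, i.e. 201C hex.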

I am not familiar with using Unicode in the older versions but noticed that you 
typed “useUniCode” instead of “useUnicode” in the code example in your message.
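
On your bit shifting question: the arithmetic for a three-byte sequence is 
small enough to write by hand, and it does not rely on any Unicode support in 
the engine. The following is an untested sketch; threeByteUTF8ToNum is a 
hypothetical helper name, it assumes the input is exactly one well-formed 
three-byte sequence, and it assumes byteToNum is available in your version.

```livecode
-- Hypothetical helper: decodes one three-byte UTF-8 sequence by hand.
-- Assumes pBytes holds exactly three bytes of the form 1110xxxx 10xxxxxx 10xxxxxx.
function threeByteUTF8ToNum pBytes
   put byteToNum(byte 1 of pBytes) into tB1 -- lead byte, e.g. 226 (E2 hex)
   put byteToNum(byte 2 of pBytes) into tB2 -- first continuation byte
   put byteToNum(byte 3 of pBytes) into tB3 -- second continuation byte
   -- the lead byte carries 4 payload bits, each continuation byte carries 6;
   -- multiplying by 4096 and 64 shifts left by 12 and 6 bits respectively
   return (tB1 bitAnd 15) * 4096 + (tB2 bitAnd 63) * 64 + (tB3 bitAnd 63)
end threeByteUTF8ToNum
```

With the three bytes from your example, the lead byte contributes 
2 * 4096 = 8192, the middle byte contributes 0 and the last byte contributes 28, 
which adds up to 8220.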

I hope this helps to get you started.

Peter


> On 29 Jul 2015, at 02:29, Simon Knight <si...@smknight.co.uk> wrote:
> 
> Hi,
> 
> I have an app that works with old fashioned text i.e. characters with a code 
> value of less than 128.  Recently I enabled cut and paste and the app gets 
> confused if text is pasted in with character values > 127.  I have done the 
> obvious and am filtering out all characters with an ASCII value of > 127 but 
> in the longer term I want to convert a few high bit characters to low bit 
> versions e.g. smart quotes to dumb quotes.
> 
> Some of the text that gets pasted from my email client is in UTF8. I have 
> done some web research and now know a little about UTF8.  I have written a 
> routine that captures any UTF8 code patterns and passes the UTF string to a 
> routine for conversion.
> 
> A UTF8 character may be between one and four bytes long, and in a multi-byte 
> sequence every byte has a value greater than 127.  I wish to extract the UTF 
> character value and use the value to do the conversion.  My question is: does 
> livecode have any method of converting a UTF8 character string to either a 
> UTF16 string or to the numeric value of the character, which I believe is the 
> same if leading zeros are ignored?  For instance a smart open quote appears in 
> my data as a series of three bytes: [E2-hex,80-hex,9C-hex].  The numeric value 
> of the character is encoded within the bits of the three bytes and will take 
> some bit shifting to extract: the UTF8 string decodes to 201C-hex or 8220 base 
> 10.
> 
> At present I am working with Livecode 6.7 and have read about and tried the 
> uniEncode and uniDecode functions.  The description of these functions does 
> not make any sense to me as they seem to be about adding or removing every 
> other byte which can't work with UTF8.
> 
> I have tried various versions of the following button code attempting to get 
> a result of 8220 base 10:
> 
> on mouseUp
>     put numToChar(226) into tString -- E2 hex
>     put numToChar(128) after tString -- 80 hex
>     put numToChar(156) after tString -- 9C hex
> 
>     Set the UseUniCode to true
> 
>     put "source string :" &  tstring
> 
>     put uniDecode(tString,"UTF8") into tResult
> 
>     put CharToNum(tResult) into tNumberResult  -- seeking value 8220 in base 10
> 
> end mouseUp
> 
> So do I have to knuckle down and start bit shifting?
> 
> thanks for reading
> Simon
> 
> _______________________________________________
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription 
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode 
> <http://lists.runrev.com/mailman/listinfo/use-livecode>

“
LEFT DOUBLE QUOTATION MARK
Unicode: U+201C, UTF-8: E2 80 9C

ƒ
LATIN SMALL LETTER F WITH HOOK
Unicode: U+0192, UTF-8: C6 92

