Hi Mark,

Thanks for the reply. The problem is that:

a) I want to do this purely in script; and

b) a character entered directly into a script on a Mac comes out differently on Windows. The scripts don't know what character set they're in: they're stored with no indication of encoding, and on every platform they're interpreted as being in that platform's supposedly 'native' character set.

Presumably in 7.0 I won't even need to use normaliseText, because the scripts will themselves be stored in Unicode or UTF8, and therefore I can use any Unicode character in a real script constant. But not in 6.x.

Ben

On 30/06/2014 16:09, Mark Schonewille wrote:
Hi Ben,

The apostrophe doesn't work because you convert to ASCII text, which looks 
different on different platforms. If you don't use uniDecode and just set the 
unicodeText of a field to your Unicode string, it should work. If that's not 
practical, you could use macToIso() to convert your string to Latin-1.

--
Kind regards,

Mark Schonewille
Economy-x-Talk
Http://economy-x-talk.com

Share the clipboard of your computer over a local network with Clipboard Link 
http://clipboardlink.economy-x-talk.com


On 30 Jun 2014, at 16:38, Ben Rubinstein <benr...@cogapp.com> wrote:

I think this problem should be solved in LC 7 (possibly using normaliseText); 
but I need a solution that I can ship now (and it's been threatened that LC 7 
will 'fix' a 'bug' which isn't, so I'm not sure I'll ever be able to use it).

My app processes some data from - and then, re-organised, to - UTF8 text files. 
Occasionally it needs to insert a constant string; and for various reasons (all 
of them excellent) I want to specify these constant strings in the script.  So 
far, so good.  Now however one of these constant strings needs to contain a 
character which is not in ASCII.  Actually two of them.  So I need to express a 
UTF8 string in my script.  And I'm searching for an elegant way to do this.

My constant string used to look something like this:

   constant kMyConstantString = "This is my ice cream"

but now it needs to read something like
   constant kMyConstantString = "This ice cream is (c) Ben and Jerry's Inc"

(only with a smart apostrophe and a proper copyright symbol).

I thought I could just about manage with this:

  put uniDecode(uniEncode("This ice cream is © Ben and Jerry’s Inc", "ANSI"), "UTF8") into kMyConstantString

(that is, encode from ANSI to Unicode, then from Unicode into UTF8).

I tested it on Mac and it seemed to work.  The UTF8 file was generated and this 
text came out just right.


However, it turned out that when the code was compiled and run on Windows, the 
copyright symbol came out OK, but the apostrophe came out as an O-tilde (Õ).

This is because uniEncode(..., "ANSI") is a lie; "ANSI" is meaningless. Instead, 
it interprets the source encoding as whatever is typical for the operating 
system. I wrote the script on a Mac; in MacRoman, © is 0xA9 and the smart 
apostrophe is 0xD5; in ISO-8859-1 (and in the first 256 Unicode code points), 
0xA9 is ©, but 0xD5 is O-tilde (Õ).

So... what's the most elegant way to do this (is there one)? Is there any 
alternative to just looking up the UTF8 encodings and writing:

  put format("This ice cream is \xC2\xA9 Ben and Jerry\xE2\x80\x99s Inc") into kMyConstantString

?
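One platform-independent variant of the same idea, sketched here as an assumption rather than a tested recipe: build the UTF-8 byte sequences by number, so nothing depends on how the script text itself is interpreted on Mac or Windows. It assumes LiveCode 6.x with the useUnicode property left false (the default), so that each numToChar() call yields a single byte. The function name myConstantStringUTF8 is hypothetical.

```livecode
-- Hypothetical helper: returns the ice-cream string as raw UTF-8 bytes.
-- Assumes LiveCode 6.x with useUnicode false (the default), so that
-- numToChar() returns one byte per call on every platform.
function myConstantStringUTF8
   local tCopyright, tApostrophe
   -- U+00A9 COPYRIGHT SIGN is encoded in UTF-8 as C2 A9 (194 169)
   put numToChar(194) & numToChar(169) into tCopyright
   -- U+2019 RIGHT SINGLE QUOTATION MARK is encoded as E2 80 99 (226 128 153)
   put numToChar(226) & numToChar(128) & numToChar(153) into tApostrophe
   -- && concatenates with a space; & concatenates directly
   return "This ice cream is" && tCopyright && "Ben and Jerry" & tApostrophe & "s Inc"
end myConstantStringUTF8
```

Because the result is already UTF-8, it could be written out without further conversion, e.g. put myConstantStringUTF8() into URL ("binfile:" & tPath), where tPath is a hypothetical output path; the binfile: scheme avoids any line-ending translation.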

TIA,

Ben

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
