Simon,

I did not reply sooner as I’m not a very experienced LiveCoder. Also, I am concentrating on learning LiveCode versions from 7 onwards. Handling Unicode is much, much easier in those versions.
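As a rough sketch of why it is easier there: in LiveCode 7 and later, a run of UTF-8 bytes can be decoded and its code point read back without any manual bit work. This assumes the textDecode and codepointToNum functions available from version 7 onwards; the function name utf8BytesToNumber is made up for illustration, and none of this applies to 6.7.

```livecode
-- Hypothetical helper, LiveCode 7 or later only (assumption).
-- pBytes holds the UTF-8 bytes of a single character.
function utf8BytesToNumber pBytes
   local tChar
   -- Turn the UTF-8 bytes into an ordinary LiveCode text string.
   put textDecode(pBytes, "UTF-8") into tChar
   -- Return the Unicode code point of that character as a number.
   return codepointToNum(tChar)
end utf8BytesToNumber

on mouseUp
   local tBytes
   put numToByte(226) into tBytes   -- E2 hex
   put numToByte(128) after tBytes  -- 80 hex
   put numToByte(156) after tBytes  -- 9C hex
   -- The value being sought is 8220 (201C hex).
   put utf8BytesToNumber(tBytes)
end mouseUp
```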
In the code you posted here, you appear to have used uniDecode where you needed to use uniEncode. I ran this in the message box in LiveCode 8:

   put numToChar(226) into tString -- E2 hex
   put numToChar(128) after tString -- 80 hex
   put numToChar(156) after tString -- 9C hex
   put unicode uniEncode(tString, "UTF8")

It successfully displayed this: “

I am not familiar with using Unicode in the older versions, but I noticed that you typed “useUniCode” instead of “useUnicode” in the code example in your message.

I hope this helps to get you started.

Peter

> On 29 Jul 2015, at 02:29, Simon Knight <si...@smknight.co.uk> wrote:
>
> Hi,
>
> I have an app that works with old fashioned text i.e. characters with a code
> value of less than 128. Recently I enabled cut and paste and the app gets
> confused if text is pasted in with character values > 127. I have done the
> obvious and am filtering out all characters with an Ascii value of >127 but
> in the longer term I want to convert a few high bit characters to low bit
> versions e.g. smart quotes to dumb quotes.
>
> Some of the text that gets pasted from my email client is in UTF8. I have
> done some web research and now know a little about UTF8. I have written a
> routine that captures any UTF8 code patterns and passes the UTF string to a
> routine for conversion.
>
> A UTF8 string may be between one and four bytes long and every byte has a
> value greater than 127. I wish to extract the UTF character value and use
> the value to do the conversion. My question is: does livecode have any
> method of converting a UTF8 character string to either a UTF16 string or to
> the numeric value of the character which I believe is the same if leading
> zeros are ignored?
> For instance a smart open quote appears in my data as a
> series of three bytes : [E2-hex,80-hex,9C-hex] the numeric value of the
> character is encoded within the bits of the three bytes and will take some
> bit shifting to extract : the UTF8 string decodes to 201C-hex or 8220 base 10.
>
> At present I am working with Livecode 6.7 and have read about and tried the
> uniEncode and uniDecode functions. The description of these functions does
> not make any sense to me as they seem to be about adding or removing every
> other byte which can't work with UTF8.
>
> I have tried various versions of the following button code attempting to get
> a result of 8220 base 10:
>
> on mouseUp
>    put numToChar(226) into tString -- E2 hex
>    put numToChar(128) after tString -- 80 hex
>    put numToChar(156) after tString -- 9C hex
>
>    Set the UseUniCode to true
>
>    put "source string :" & tstring
>
>    put uniDecode(tString,"UTF8") into tResult
>
>    put CharToNum(tResult) into tNumberResult -- seeking value 8220 in base 10
>
> end mouseUp
>
> So do I have to knuckle down and start bit shifting?
>
> thanks for reading
> Simon
>
> _______________________________________________
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
> <http://lists.runrev.com/mailman/listinfo/use-livecode>

“ LEFT DOUBLE QUOTATION MARK
Unicode: U+201C, UTF-8: E2 80 9C

ƒ LATIN SMALL LETTER F WITH HOOK
Unicode: U+0192, UTF-8: C6 92
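For anyone who does want to do the bit work by hand, the relationship between the UTF-8 bytes E2 80 9C and the code point U+201C can be sketched as below. This is only an illustration, not a tested library routine: the function name utf8ToCodepoint is hypothetical, it assumes the input is a single well-formed UTF-8 sequence, and it does no validation of continuation bytes. LiveCode has no shift operator, so multiplying by 64 stands in for a six-bit left shift.

```livecode
-- Hypothetical helper: returns the code point encoded by one UTF-8
-- sequence of 1 to 4 bytes, or empty if the lead byte is not valid.
function utf8ToCodepoint pBytes
   local tFirst, tLength, tValue
   put byteToNum(byte 1 of pBytes) into tFirst
   if tFirst < 128 then
      return tFirst -- 0xxxxxxx: plain ASCII, one byte
   else if (tFirst bitAnd 224) is 192 then
      put tFirst bitAnd 31 into tValue -- 110xxxxx: two-byte sequence
      put 2 into tLength
   else if (tFirst bitAnd 240) is 224 then
      put tFirst bitAnd 15 into tValue -- 1110xxxx: three-byte sequence
      put 3 into tLength
   else if (tFirst bitAnd 248) is 240 then
      put tFirst bitAnd 7 into tValue -- 11110xxx: four-byte sequence
      put 4 into tLength
   else
      return empty -- a continuation byte or invalid lead byte
   end if
   -- Each continuation byte (10xxxxxx) contributes its low six bits.
   repeat with i = 2 to tLength
      put (tValue * 64) + (byteToNum(byte i of pBytes) bitAnd 63) into tValue
   end repeat
   return tValue
end utf8ToCodepoint
```

Working the smart quote through by hand: E2 (226) keeps its low four bits, giving 2; 80 (128) adds six zero bits, giving 128; 9C (156) adds 28, giving 128 * 64 + 28 = 8220, which is 201C hex. Calling utf8ToCodepoint(numToByte(226) & numToByte(128) & numToByte(156)) should therefore return 8220.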