On Sun, Jan 22, 2017 at 1:26 AM, Peter TB Brett via use-livecode <use-livecode@lists.runrev.com> wrote:

> On 22/01/2017 03:22, J. Landman Gay via use-livecode wrote:
>
>> Here's a test sample of some UTF8 I get back from a server:
>>
>> {"UserID":48,"UserName":"Eduardo Ba\u00f1uls","UserLoginName":"ebanu"}
>
> This is valid JSON (and also valid ASCII). In JSON, any character in a
> string may be encoded in the form \uXXXX where XXXX is the 4-digit
> hexadecimal representation of a Unicode codepoint. No textDecode()
> operation is required.
>
> JSONImport() handles this correctly.

Let me add a warning about this. I was going to include it in my original response, but I was writing on my phone at the time.

While it is true that you don't need to run textDecode() on this JSON and that JSONImport() will handle it correctly, you may want to decode it anyway if the JSON is coming from a web application with a Content-Type header that says it is UTF-8 encoded. I say that based on the two days my team and I spent troubleshooting a related issue just two weeks ago.

We recently completed a major update of our web application, which involved upgrading the version of Ruby on Rails that we use. We switched from version 3 to version 4. The `#to_json` method in Ruby on Rails 3 returned JSON data that was encoded using escape sequences (\uXXXX) with the header `Content-Type: application/json; charset=utf-8`. As Peter points out, that data is actually valid ASCII.

When I originally wrote the LiveCode code that parsed the JSON responses, I used JSONImport() and everything worked as I expected. Based on the UTF-8 headers I was seeing and the fact that my code was handling Unicode characters properly, I assumed that libURL was transparently decoding the data returned by a url in LC >= 7.

After deploying our app upgrade two weeks ago, we started getting reports of weird characters showing up in the desktop software. What we found out is that the `#to_json` method in Ruby on Rails 4 stopped using escape sequences and started returning UTF-8 encoded JSON.
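
To make the difference between the two serializations concrete, here is a minimal Python sketch. It is not LiveCode and not the Rails code itself. It assumes that `json.dumps` with `ensure_ascii=True` / `ensure_ascii=False` is a fair stand-in for the escaped and unescaped output described above, and that decoding the bytes as Latin-1 roughly approximates what happens when UTF-8 bytes are used without a textDecode() step.

```python
import json

name = "Eduardo Bañuls"

# Assumed stand-in for the Rails 3 output: non-ASCII characters are
# written as \uXXXX escapes, so the payload is pure ASCII.
escaped_bytes = json.dumps({"UserName": name}, ensure_ascii=True).encode("utf-8")

# Assumed stand-in for the Rails 4 output: characters are written as
# raw UTF-8, so "ñ" becomes the two bytes 0xC3 0xB1.
raw_bytes = json.dumps({"UserName": name}, ensure_ascii=False).encode("utf-8")

escaped_is_ascii = all(b < 128 for b in escaped_bytes)
raw_is_ascii = all(b < 128 for b in raw_bytes)

# Decoding as UTF-8 before parsing gives the same string for both forms.
from_escaped = json.loads(escaped_bytes.decode("utf-8"))["UserName"]
from_raw = json.loads(raw_bytes.decode("utf-8"))["UserName"]

# Treating each byte as one character (Latin-1 here, as an approximation
# of a skipped decode step) is harmless for the escaped form...
escaped_undecoded = json.loads(escaped_bytes.decode("latin-1"))["UserName"]

# ...but garbles the raw UTF-8 form.
garbled = json.loads(raw_bytes.decode("latin-1"))["UserName"]

print(escaped_is_ascii, raw_is_ascii)
print(from_escaped == from_raw == escaped_undecoded)
print(garbled)
```

The escaped payload survives a missing decode step because every byte is ASCII, which is why the problem stayed hidden until the server began sending raw UTF-8.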

It turns out that libURL only decodes SOME of the data that passes through it [1]. Because my LiveCode app was not decoding the UTF-8 data returned by the server into the internal LiveCode text format, any character with a codepoint > 127 displayed as an odd character in our app. Tracking down the cause took a lot of time.

We added a monkey patch on our server that switched the JSON back to the escape sequence format. That way existing desktop software would continue to work. I've updated the custom version of libURL that we use so that it looks at the Content-Type header and runs textDecode() on data passing through it. We will remove the monkey patch on the server after we have pushed the updated desktop software out to everyone.

Moral of the story for people using LC >= 7? If the Content-Type header of your JSON responses says the data is UTF-8 encoded, then decode the data when it comes into LiveCode. That way you know it is in the correct encoding before you start working with it.

[1] There is a ulDecodeData() function in versions of libURL that ship with LC >= 7. It will decode HTML data with a <meta charset> header. The function ignores the Content-Type header. See the bug report with a proposed solution: http://quality.livecode.com/show_bug.cgi?id=19085.

-- 
Trevor DeVore
Outcome & ScreenSteps
www.outcomeapp.io - www.screensteps.com