Re: Putting Unicode characters in JSON

2018-03-24 Thread Peter J. Holzer
On 2018-03-25 06:30:54 +1100, Chris Angelico wrote: > On Sun, Mar 25, 2018 at 3:35 AM, Peter J. Holzer wrote: > > On 2018-03-24 11:21:09 +1100, Chris Angelico wrote: > >> If the database has been configured to use UTF-8 (as mentioned, that's > >> "utf8mb4" in MySQL), you won't get that byte sequen
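Some context for the "utf8mb4" remark. UTF-8 encodes a character in one to four bytes, but MySQL's older "utf8" charset stores at most three bytes per character. That MySQL detail is background knowledge and is not stated in the thread. Below is a minimal Python 3 sketch, using arbitrary sample characters, that shows which ones would need the four-byte form:

```python
def utf8_len(ch: str) -> int:
    """Number of bytes `ch` occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

# U+0041 'A', U+00E9 e-acute, U+20AC euro sign, U+1F600 grinning face
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
for ch in samples:
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s)")

# Only characters needing 4 bytes exceed MySQL's legacy 3-byte "utf8";
# storing them requires "utf8mb4".
needs_utf8mb4 = [ch for ch in samples if utf8_len(ch) == 4]
print([f"U+{ord(ch):04X}" for ch in needs_utf8mb4])
```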

Re: Putting Unicode characters in JSON

2018-03-24 Thread Chris Angelico
On Sun, Mar 25, 2018 at 3:35 AM, Peter J. Holzer wrote: > On 2018-03-24 11:21:09 +1100, Chris Angelico wrote: >> If the database has been configured to use UTF-8 (as mentioned, that's >> "utf8mb4" in MySQL), you won't get that byte sequence back. You'll get >> back valid UTF-8. > > Actually (with

Re: Putting Unicode characters in JSON

2018-03-24 Thread Peter J. Holzer
On 2018-03-24 11:21:09 +1100, Chris Angelico wrote: > On Sat, Mar 24, 2018 at 11:11 AM, Steven D'Aprano > wrote: > > On Fri, 23 Mar 2018 07:46:16 -0700, Tobiah wrote: > >> If I changed my database tables to all be UTF-8 would this work cleanly > >> without any decoding? > > > > Not reliably or saf

Re: Putting Unicode characters in JSON

2018-03-23 Thread Steven D'Aprano
On Sat, 24 Mar 2018 11:21:09 +1100, Chris Angelico wrote: >>> If I changed my database tables to all be UTF-8 would this work >>> cleanly without any decoding? >> >> Not reliably or safely. It will appear to work so long as you have only >> pure ASCII strings from the database, and then crash when

Re: Putting Unicode characters in JSON

2018-03-23 Thread Chris Angelico
On Sat, Mar 24, 2018 at 11:11 AM, Steven D'Aprano wrote: > On Fri, 23 Mar 2018 07:46:16 -0700, Tobiah wrote: > >> If I changed my database tables to all be UTF-8 would this work cleanly >> without any decoding? > > Not reliably or safely. It will appear to work so long as you have only > pure ASCI

Re: Putting Unicode characters in JSON

2018-03-23 Thread Steven D'Aprano
On Fri, 23 Mar 2018 07:46:16 -0700, Tobiah wrote: > If I changed my database tables to all be UTF-8 would this work cleanly > without any decoding? Not reliably or safely. It will appear to work so long as you have only pure ASCII strings from the database, and then crash when you don't: py> te
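Steven's point can be reproduced in Python 3 terms. The thread's own snippets use Python 2 style `str.decode`, so this is a translation rather than his exact demo. Bytes that happen to be pure ASCII decode correctly under almost any assumption, so a wrong assumption goes unnoticed until a non-ASCII byte arrives. The sample values and the helper `decode_as_utf8` are hypothetical:

```python
# Hypothetical column values as raw Latin-1 bytes from the database.
ascii_name = "Smith".encode("latin-1")      # b'Smith'
accented_name = "Müller".encode("latin-1")  # b'M\xfcller'

def decode_as_utf8(raw: bytes) -> str:
    # Wrong assumption: treat Latin-1 bytes as if they were UTF-8.
    return raw.decode("utf-8")

# Works only because ASCII bytes are identical in Latin-1 and UTF-8.
print(decode_as_utf8(ascii_name))

crashed = False
try:
    decode_as_utf8(accented_name)
except UnicodeDecodeError as exc:
    crashed = True
    print("crash:", exc.reason)
```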

Re: Putting Unicode characters in JSON

2018-03-23 Thread Chris Angelico
On Sat, Mar 24, 2018 at 1:46 AM, Tobiah wrote: > On 03/22/2018 12:46 PM, Tobiah wrote: >> >> I have some mailing information in a Mysql database that has >> characters from various other countries. The table says that >> it's using latin-1 encoding. I want to send this data out >> as JSON. >> >>

Re: Putting Unicode characters in JSON

2018-03-23 Thread Grant Edwards
On 2018-03-23, Richard Damon wrote: > One comment on this whole argument, the original poster asked how to get > data from a database that WAS using Latin-1 encoding into JSON (which > wants UTF-8 encoding) and was asking if something needed to be done > beyond using .decode('Latin-1'), and in

Re: Putting Unicode characters in JSON

2018-03-23 Thread Grant Edwards
On 2018-03-23, Chris Angelico wrote: > On Fri, Mar 23, 2018 at 10:47 AM, Steven D'Aprano > wrote: >> On Fri, 23 Mar 2018 07:09:50 +1100, Chris Angelico wrote: >> I was reading though, that JSON files must be encoded with UTF-8. So should I be doing string.decode('latin-1').encode('utf-8

Re: Putting Unicode characters in JSON

2018-03-23 Thread Tobiah
On 03/22/2018 12:46 PM, Tobiah wrote: > > I have some mailing information in a Mysql database that has > characters from various other countries. The table says that > it's using latin-1 encoding. I want to send this data out > as JSON. > > So I'm just taking each datum and doing 'name'.decode('latin-1') an

Re: Putting Unicode characters in JSON

2018-03-23 Thread Richard Damon
On 3/23/18 6:35 AM, Chris Angelico wrote: > On Fri, Mar 23, 2018 at 9:29 PM, Steven D'Aprano wrote: >> On Fri, 23 Mar 2018 18:35:20 +1100, Chris Angelico wrote: >> >>> That doesn't seem to be a strictly-correct Latin-1 decoder, then. There >>> are a number of unassigned byte values in ISO-8859-1. >> >> That's inc

Re: Putting Unicode characters in JSON

2018-03-23 Thread Chris Angelico
On Fri, Mar 23, 2018 at 9:29 PM, Steven D'Aprano wrote: > On Fri, 23 Mar 2018 18:35:20 +1100, Chris Angelico wrote: > >> That doesn't seem to be a strictly-correct Latin-1 decoder, then. There >> are a number of unassigned byte values in ISO-8859-1. > > That's incorrect, but I don't blame you for

Re: Putting Unicode characters in JSON

2018-03-23 Thread Steven D'Aprano
On Fri, 23 Mar 2018 18:35:20 +1100, Chris Angelico wrote: > That doesn't seem to be a strictly-correct Latin-1 decoder, then. There > are a number of unassigned byte values in ISO-8859-1. That's incorrect, but I don't blame you for getting it wrong. Who thought that it was a good idea to disting
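The disagreement turns on bytes 0x80 to 0x9F. In Python 3, the `latin-1` codec maps each of these bytes to the C1 control character with the same number. Windows-1252 (`cp1252`) is a different single-byte encoding that is frequently confused with Latin-1. It maps most of that range to printable characters and leaves a few bytes undefined. The small comparison below assumes only Python 3's built-in codecs:

```python
# Compare Python's "latin-1" and "cp1252" codecs on bytes 0x80-0x9F.
results = {}
for byte in (0x80, 0x81, 0x9F):
    raw = bytes([byte])
    latin1 = raw.decode("latin-1")   # never fails; code point equals byte value
    try:
        cp1252 = f"U+{ord(raw.decode('cp1252')):04X}"
    except UnicodeDecodeError:
        cp1252 = "undefined"         # Python's cp1252 leaves some bytes unmapped
    results[byte] = (f"U+{ord(latin1):04X}", cp1252)
    print(f"0x{byte:02X}: latin-1 -> {results[byte][0]}, cp1252 -> {cp1252}")
```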

Re: Putting Unicode characters in JSON

2018-03-23 Thread Paul Moore
On 23 March 2018 at 00:27, Thomas Jollans wrote: > On 22/03/18 20:46, Tobiah wrote: >> I was reading though, that JSON files must be encoded with UTF-8. So >> should I be doing string.decode('latin-1').encode('utf-8')? Or does >> the json module do that for me when I give it a unicode object? >

Re: Putting Unicode characters in JSON

2018-03-23 Thread Chris Angelico
On Fri, Mar 23, 2018 at 4:35 PM, Steven D'Aprano wrote: > On Fri, 23 Mar 2018 12:05:34 +1100, Chris Angelico wrote: > >> Latin-1 is not "arbitrary bytes". It is a very specific encoding that >> cannot decode every possible byte value. > > Yes it can. > > py> blob = bytes(range(256)) > py> len(blob
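Because the `latin-1` codec accepts every byte, it has a practical downside. If the stored bytes were really UTF-8, decoding them as Latin-1 does not raise an error. It silently produces mojibake instead. The sample string below is hypothetical:

```python
original = "Café Zoë"
utf8_bytes = original.encode("utf-8")   # what a UTF-8 client might have stored

wrong = utf8_bytes.decode("latin-1")    # no exception is raised
print(wrong)                            # each 2-byte UTF-8 sequence became 2 chars

# Reversible only if the mis-decoded text was not modified afterwards.
repaired = wrong.encode("latin-1").decode("utf-8")
print(repaired == original)
```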

Re: Putting Unicode characters in JSON

2018-03-22 Thread Steven D'Aprano
On Fri, 23 Mar 2018 12:05:34 +1100, Chris Angelico wrote: > Latin-1 is not "arbitrary bytes". It is a very specific encoding that > cannot decode every possible byte value. Yes it can. py> blob = bytes(range(256)) py> len(blob) 256 py> blob[45:55] b'-./0123456' py> s = blob.decode('latin1') py>
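The property Steven demonstrates can be checked exhaustively in Python 3. Every byte decodes under `latin-1` to the character with the same code point, so any binary data survives a decode/encode round trip unchanged. The random sample is arbitrary:

```python
import os

# Every possible byte value, plus some random binary data.
blobs = [bytes(range(256)), os.urandom(64)]

for blob in blobs:
    text = blob.decode("latin-1")          # never raises: one char per byte
    assert len(text) == len(blob)
    # The code point of each character equals the original byte value...
    assert [ord(c) for c in text] == list(blob)
    # ...so encoding back to latin-1 restores the exact bytes.
    assert text.encode("latin-1") == blob
print("latin-1 round trip is lossless")
```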

Re: Putting Unicode characters in JSON

2018-03-22 Thread Chris Angelico
On Fri, Mar 23, 2018 at 11:39 AM, Steven D'Aprano wrote: > On Fri, 23 Mar 2018 11:08:56 +1100, Chris Angelico wrote: >> Okay. Give me a good reason for the database itself to be locked to >> Latin-1. Make sure you explain how potentially saving the occasional >> byte of storage (compared to UTF-8)
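The storage argument can be quantified. Characters U+0080 to U+00FF take one byte in Latin-1 but two in UTF-8, while ASCII costs one byte in both. The sketch below uses a hypothetical mailing address and the hypothetical helper `storage_bytes`:

```python
def storage_bytes(text: str) -> dict:
    """Encoded size of `text` in Latin-1 vs UTF-8."""
    return {
        "latin-1": len(text.encode("latin-1")),
        "utf-8": len(text.encode("utf-8")),
    }

address = "Königstraße 12, München"   # hypothetical mailing address
sizes = storage_bytes(address)
# One extra UTF-8 byte per non-ASCII Latin-1 character (here ö, ß, ü).
print(sizes, "extra bytes:", sizes["utf-8"] - sizes["latin-1"])
```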

Re: Putting Unicode characters in JSON

2018-03-22 Thread Steven D'Aprano
On Fri, 23 Mar 2018 11:08:56 +1100, Chris Angelico wrote: > On Fri, Mar 23, 2018 at 10:47 AM, Steven D'Aprano > wrote: >> On Fri, 23 Mar 2018 07:09:50 +1100, Chris Angelico wrote: >> I was reading though, that JSON files must be encoded with UTF-8. So should I be doing string.decode('l

Re: Putting Unicode characters in JSON

2018-03-22 Thread Chris Angelico
On Fri, Mar 23, 2018 at 11:39 AM, Ben Finney wrote: > Chris Angelico writes: > >> There is NOT always a good reason for a suboptimal configuration. > > True. Did anyone claim otherwise? > > What I saw Steven responding to was your claim that there is *never* a > good reason to do it. > > To refut

Re: Putting Unicode characters in JSON

2018-03-22 Thread Thomas Jollans
On 22/03/18 20:46, Tobiah wrote: > I was reading though, that JSON files must be encoded with UTF-8.  So > should I be doing string.decode('latin-1').encode('utf-8')?  Or does > the json module do that for me when I give it a unicode object? Definitely not. In fact, that won't even work. >>> impo
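Thomas's "that won't even work" can be illustrated in Python 3. The thread's snippets use Python 2 style calls, so the exact failure there may differ. In Python 3, `json.dumps` rejects `bytes` but serializes a decoded `str` without trouble. With the default `ensure_ascii=True`, the output is pure ASCII and therefore already valid UTF-8. With `ensure_ascii=False`, the characters are kept as-is, and the text should be encoded as UTF-8 when it is written out. The sample value is hypothetical:

```python
import json

name = b"Jos\xe9".decode("latin-1")   # hypothetical Latin-1 value -> 'José'

# Python 3's json module refuses bytes, e.g. the result of .encode('utf-8').
try:
    json.dumps(name.encode("utf-8"))
    rejected = False
except TypeError:
    rejected = True

# Given the decoded str, json.dumps produces the JSON text itself.
escaped = json.dumps(name)                      # non-ASCII escaped as \u00e9
literal = json.dumps(name, ensure_ascii=False)  # keeps é; encode as UTF-8 on output
print(rejected, escaped, literal)
```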

Re: Putting Unicode characters in JSON

2018-03-22 Thread Ben Finney
Chris Angelico writes: > There is NOT always a good reason for a suboptimal configuration. True. Did anyone claim otherwise? What I saw Steven responding to was your claim that there is *never* a good reason to do it. To refute that, it's sufficient to show that good reason can exist in some c

Re: Putting Unicode characters in JSON

2018-03-22 Thread Chris Angelico
On Fri, Mar 23, 2018 at 11:25 AM, Ben Finney wrote: > Chris Angelico writes: > >> On Fri, Mar 23, 2018 at 10:47 AM, Steven D'Aprano >> wrote: >> > On Fri, 23 Mar 2018 07:09:50 +1100, Chris Angelico wrote: >> >> Reconfigure your MySQL database to use UTF-8. There is no reason to >> >> use Latin-1

Re: Putting Unicode characters in JSON

2018-03-22 Thread Ben Finney
Chris Angelico writes: > On Fri, Mar 23, 2018 at 10:47 AM, Steven D'Aprano > wrote: > > On Fri, 23 Mar 2018 07:09:50 +1100, Chris Angelico wrote: > >> Reconfigure your MySQL database to use UTF-8. There is no reason to > >> use Latin-1 in the database. > > > > You don't know that. You don't know

Re: Putting Unicode characters in JSON

2018-03-22 Thread Chris Angelico
On Fri, Mar 23, 2018 at 10:47 AM, Steven D'Aprano wrote: > On Fri, 23 Mar 2018 07:09:50 +1100, Chris Angelico wrote: > >>> I was reading though, that JSON files must be encoded with UTF-8. So >>> should I be doing string.decode('latin-1').encode('utf-8')? Or does >>> the json module do that for

Re: Putting Unicode characters in JSON

2018-03-22 Thread Steven D'Aprano
On Fri, 23 Mar 2018 07:09:50 +1100, Chris Angelico wrote: >> I was reading though, that JSON files must be encoded with UTF-8. So >> should I be doing string.decode('latin-1').encode('utf-8')? Or does >> the json module do that for me when I give it a unicode object? > > Reconfigure your MySQL

Re: Putting Unicode characters in JSON

2018-03-22 Thread Tobiah
On 03/22/2018 01:09 PM, Chris Angelico wrote: > On Fri, Mar 23, 2018 at 6:46 AM, Tobiah wrote: >> I have some mailing information in a Mysql database that has >> characters from various other countries. The table says that >> it's using latin-1 encoding. I want to send this data out >> as JSON. >> >> So I'm jus

Re: Putting Unicode characters in JSON

2018-03-22 Thread Chris Angelico
On Fri, Mar 23, 2018 at 6:46 AM, Tobiah wrote: > I have some mailing information in a Mysql database that has > characters from various other countries. The table says that > it's using latin-1 encoding. I want to send this data out > as JSON. > > So I'm just taking each datum and doing 'name'.d

Putting Unicode characters in JSON

2018-03-22 Thread Tobiah
I have some mailing information in a Mysql database that has characters from various other countries. The table says that it's using latin-1 encoding. I want to send this data out as JSON. So I'm just taking each datum and doing 'name'.decode('latin-1') and adding the resulting Unicode value ri
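The approach in the original question can be sketched end to end in Python 3: decode each Latin-1 value once, then hand `str` objects to the json module. The rows, field names, and output file are hypothetical stand-ins for what a MySQL driver might return. In this sketch, the only explicit encoding choices are `latin-1` on input and UTF-8 for the output file:

```python
import json
import os
import tempfile

# Hypothetical rows as raw bytes, as a driver might return from a latin-1 table.
rows = [
    {"name": b"Ren\xe9e", "city": b"Qu\xe9bec"},
    {"name": b"Bj\xf6rk", "city": b"Reykjav\xedk"},
]

# Decode once, at the boundary where bytes enter the program.
records = [{k: v.decode("latin-1") for k, v in row.items()} for row in rows]

# Let the json module serialize the str values; pick UTF-8 for the file itself.
path = os.path.join(tempfile.mkdtemp(), "mailing.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False)

with open(path, "rb") as f:
    data = f.read()
print(data.decode("utf-8"))
```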