On 2020-07-07, Stephen Rosen wrote:
> On Mon, Jul 6, 2020 at 6:37 AM Adam Funk <a24...@ducksburg.com> wrote:
>
>> Is there a "bulletproof" version of json.dump somewhere that will
>> convert bytes to str, any other iterables to list, etc., so you can
>> just get your data into a file & keep working?
>
> Is the data only being read by Python programs? If so, consider using
> pickle: https://docs.python.org/3/library/pickle.html
> Unlike JSON dumping, the goal of pickle is to represent objects as
> exactly as possible and *not* to be interoperable with other languages.
>
> If you're using JSON to pass data between Python and some other
> language, you don't want to silently convert bytes to strings.
> If you have a bytestring of UTF-8 data, you want to UTF-8 decode it
> before passing it to json.dumps.
> Likewise, if you have Latin-1 data, you want to Latin-1 decode it.
> There is no universal and correct bytes-to-string conversion.
>
> On Mon, Jul 6, 2020 at 9:45 AM Chris Angelico <ros...@gmail.com> wrote:
>
>> Maybe what we need is to fork out the default JSON encoder into two,
>> or have a "strict=True" or "strict=False" flag. In non-strict mode,
>> round-tripping is not guaranteed, and various types will be folded to
>> each other - mainly, many built-in and stdlib types will be
>> represented as strings. In strict mode, compliance with the RFC is
>> ensured (so ValueError will be raised on inf/nan), and everything
>> should round-trip safely.
>
> Wouldn't it be reasonable to represent this as an encoder which is
> provided by `json`? i.e.
>
>     from json import dumps, UnsafeJSONEncoder
>     ...
>     dumps(foo, cls=UnsafeJSONEncoder)
>
> Emphasizing the "Unsafe" part of this and introducing people to the
> idea of setting an encoder also seems nice.
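[No `UnsafeJSONEncoder` exists in the stdlib; the name above is Stephen's hypothetical. A minimal sketch of what such a lossy, opt-in encoder might look like, using the standard `json.JSONEncoder.default` hook - note the bytes branch hard-codes the UTF-8 guess that the message above warns against:]

```python
import datetime
import json
import uuid


class UnsafeJSONEncoder(json.JSONEncoder):
    """Hypothetical lossy encoder: folds common non-JSON types into
    strings and lists instead of raising TypeError."""

    def default(self, o):
        if isinstance(o, bytes):
            # Assumes UTF-8 - exactly the silent guess criticized above.
            return o.decode("utf-8")
        if isinstance(o, (set, frozenset)):
            # Sets have no JSON form; emit a sorted list.
            return sorted(o)
        if isinstance(o, (datetime.datetime, datetime.date, uuid.UUID)):
            return str(o)
        return super().default(o)


print(json.dumps(
    {"id": uuid.uuid4(), "raw": b"caf\xc3\xa9", "tags": {"b", "a"}},
    cls=UnsafeJSONEncoder,
))
```

[Because the conversions live in one named class that callers must opt into with `cls=`, the lossiness is at least visible at the call site.]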
> On Mon, Jul 6, 2020 at 9:12 AM Chris Angelico <ros...@gmail.com> wrote:
>
>> On Mon, Jul 6, 2020 at 11:06 PM Jon Ribbens via Python-list
>> <python-list@python.org> wrote:
>>
>>> The 'json' module already fails to provide round-trip functionality:
>>>
>>> >>> for data in ({True: 1}, {1: 2}, (1, 2)):
>>> ...     if json.loads(json.dumps(data)) != data:
>>> ...         print('oops', data, json.loads(json.dumps(data)))
>>> ...
>>> oops {True: 1} {'true': 1}
>>> oops {1: 2} {'1': 2}
>>> oops (1, 2) [1, 2]
>>
>> There's a fundamental limitation of JSON in that it requires string
>> keys, so this is an obvious transformation. I suppose you could call
>> that one a bug too, but it's very useful and not too dangerous. (And
>> then there's the tuple-to-list transformation, which I think probably
>> shouldn't happen, although I don't think that's likely to cause issues
>> either.)
>
> Ideally, all of these bits of support for non-JSON types should be
> opt-in, not opt-out.
> But it's not worth making a breaking change to the stdlib over this.
>
> Especially for new programmers, the notion that
>     deserialize(serialize(x)) != x
> just seems like a recipe for subtle bugs.
>
> You're never guaranteed that the deserialized object will match the
> original, but shouldn't one of the goals of a de/serialization library
> be to get it as close as is reasonable?
>
> I've seen people do things which boil down to
>     json.loads(x)["some_id"] == UUID(...)
> plenty of times. It's obviously wrong and the fix is easy, but isn't
> making the default JSON encoder less strict just encouraging this type
> of bug?
>
> Comparing JSON data against non-JSON types is part of the same category
> of errors: conflating JSON with dictionaries.
> It's very easy for people to make this mistake, especially since JSON
> syntax is a subset of Python dict syntax, so I don't think `json.dumps`
> should be encouraging it.
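[The UUID comparison bug described above, and its fix, in a runnable sketch (`some_id` and `record_id` are illustrative names): the deserialized value is always a plain `str`, so comparing it to a `UUID` is always False; the caller has to re-parse it into the domain type.]

```python
import json
import uuid

record_id = uuid.uuid4()
# JSON has no UUID type, so the serializer can only store a string.
payload = json.dumps({"some_id": str(record_id)})

data = json.loads(payload)

# The bug: a str never compares equal to a UUID, so this is always False.
assert (data["some_id"] == record_id) is False

# The fix: explicitly re-parse the string back into the domain type.
assert uuid.UUID(data["some_id"]) == record_id
```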
> On Tue, Jul 7, 2020 at 6:52 AM Adam Funk <a24...@ducksburg.com> wrote:
>
>> Here's another "I'd expect to have to deal with this sort of thing in
>> Java" example I just ran into:
>>
>> >>> r = requests.head(url, allow_redirects=True)
>> >>> print(json.dumps(r.headers, indent=2))
>> ...
>> TypeError: Object of type CaseInsensitiveDict is not JSON serializable
>> >>> print(json.dumps(dict(r.headers), indent=2))
>> {
>>   "Content-Type": "text/html; charset=utf-8",
>>   "Server": "openresty",
>>   ...
>> }
>
> Why should the JSON encoder know about an arbitrary dict-like type?
> It might implement Mapping, but there's no way for json.dumps to know
> that in the general case (because not everything which implements
> Mapping actually inherits from the Mapping ABC).
> Converting it to a type which json.dumps understands is a reasonable
> constraint.
>
> Also, wouldn't it be fair, if your object is "case insensitive", to
> serialize it as
>     { "CONTENT-TYPE": ... } or { "content-type": ... } or ...
> ?
>
> `r.headers["content-type"]` presumably gets a hit.
> `json.loads(json.dumps(dict(r.headers)))["content-type"]` will get a
> KeyError.
>
> This seems very much out of scope for the json package because it's not
> clear what it's supposed to do with this type.
> Libraries should ask users to specify what they mean and not make
> potentially harmful assumptions.
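[Besides `dict(r.headers)`, the stdlib already offers an explicit opt-in for this: the `default=` hook of `json.dumps`, which is called for any object the encoder doesn't recognize. A sketch, using a simplified stand-in class rather than requests' real `CaseInsensitiveDict`, that converts Mapping implementers to plain dicts - and shows that the round-tripped dict has indeed lost its case-insensitive lookup:]

```python
import json
from collections.abc import Mapping


class ToyCaseInsensitiveDict(Mapping):
    """Simplified stand-in for requests' CaseInsensitiveDict:
    case-insensitive lookup, original key case preserved on iteration."""

    def __init__(self, data):
        self._store = {k.lower(): (k, v) for k, v in data.items()}

    def __getitem__(self, key):
        return self._store[key.lower()][1]

    def __iter__(self):
        return (orig_key for orig_key, _ in self._store.values())

    def __len__(self):
        return len(self._store)


def mapping_default(o):
    # Explicit caller decision: Mapping implementers become plain dicts.
    if isinstance(o, Mapping):
        return dict(o)
    raise TypeError(f"Object of type {type(o).__name__} "
                    "is not JSON serializable")


headers = ToyCaseInsensitiveDict({"Content-Type": "text/html"})
print(json.dumps(headers, default=mapping_default, indent=2))

# Case-insensitivity survives on the original object...
assert headers["content-type"] == "text/html"

# ...but not after a round trip: the plain dict raises KeyError, exactly
# as described above.
roundtripped = json.loads(json.dumps(headers, default=mapping_default))
assert "content-type" not in roundtripped
assert roundtripped["Content-Type"] == "text/html"
```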
I see what you mean. I guess it just bugs me to have to do all this
explicit type conversion (when I'm not using Java!).

--
A drug is not bad. A drug is a chemical compound. The problem comes in
when people who take drugs treat them like a license to behave like an
asshole.                                               ---Frank Zappa
--
https://mail.python.org/mailman/listinfo/python-list