> True for yourself, I assume. But json.dumps is *not* why *the rest of
> us* do that. We do it because we've *always* done it. The Python
> objects we are serializing themselves lack units, precision, and pet's
> name! Until our Python programs become unit- and precision- aware,
> support for "lossless JSON" is necessarily going to be idiosyncratic,
> and mostly avoided.

As a more "casual" user of JSON, this part in particular resonated with me. For most use cases, I can't imagine that users need the degree of precision desired by the OP of the previous topic. As far as I'm aware, ``json.dumps()`` produces an adequate JSON stream for most users.

For context, here is a recent JSON file that I created using the json module: https://github.com/python/devguide/pull/517. It's quite simple and uses very few data types.

> 1. Of those who don't know, how many have need to know, and will
>    acknowledge that need? (If they don't admit it, good luck getting
>    them to change their programs!)

Prior to this discussion, I'll admit that I had no idea that decimals are converted to floats when JSON is deserialized. But as stated in the question above, I never really had a need to know. Pretty much any time I've used JSON, floats provided a "good enough" degree of precision.

I'm not saying that it shouldn't be added just because it isn't useful for me personally, but I wouldn't be surprised if the majority of users of the json module had no idea about this issue because it didn't affect them significantly.

> For almost all Python applications, a JSON-LD @context specific to
> Python's object model and standard builtin types would be enough.

I'm personally not knowledgeable enough about JSON-LD; I've only seen it mentioned a few times. But based on what I can tell from the examples on https://json-ld.org/playground/ (the "Place" example was particularly helpful), I could definitely imagine @context being useful.

Speaking of examples, I think it would be helpful to provide a brief example of ``object.__json__()`` being used, for some added clarity. Would it be a direct alias of ``json.dumps()`` with the same exact parameters and usage, or would there be some substantial differences?

Thanks,
Kyle Stanley

On Fri, Aug 16, 2019 at 4:00 AM Stephen J.
Turnbull < [email protected]> wrote:

> Wes Turner writes:
>
> > Data interchange with structured types is worthwhile.
>
> That's not what the main thread is about. It's about adding support
> for Decimal to the stdlib's json module. Even the OP has explicitly
> disclaimed pretty much everything else, although his preferred
> implementation is more general than that.
>
> I'm +1 on that. I think the outline of how to do it has become pretty
> obvious, and that it should be restricted to automatically converting
> Decimals to a JSON number, perhaps under control of a use_decimal flag
> for both encoding and decoding.
>
> The rest should go into a separate thread. First let's dispose of
> this:
>
> > Streaming JSON is not possible without JSON lines support.
>
> It is obvious to me that this should be handled in yet another thread
> from "lossless JSON", because it can and should be independently
> implemented, if it's done at all. Given (ob,n) = raw_decode(idx=n)
> support in the json module, the difficulty in implementing is all
> about buffering, and choosing where to do that buffering (in a
> separate module? in json.load? in a new json.load_stream generator?)
>
> I will now argue that the __json__ protocol is nowhere near so
> obviously stdlib-able as Decimal and streaming JSON.
>
> > An object.__json__(**kwargs) protocol would inconvenience no-one so
> > long as:
> > - decimal isn't imported unless used
> > - all existing code continues to work
>
> I also think that JSON is widely enough used, and deserves better
> semantic support, that a protocol (specifically, the __json__ dunder)
> for serializer support and some form of complementary deserializer
> support are quite justifiable. But the __json__ dunder is the *easy*
> part. The complexity here is in that complementary deserializer.
>
> Here's why.
> To your desiderata I would add
>
> - no complex type's module is imported unless used (easy)
>
> - the deserializer support for a type should be linked to its
>   serializer support (something like the codecs registry, but more
>   complicated because each entity will need to invoke support
>   separately, unlike codecs where there's one codec for a whole text)
>
> - such object support should be automatically linked in to both the
>   top level serializer and deserializer dispatching.
>
> The latter two desiderata look *hard* to me. Without them, you've got
> the inverse of the current Decimal problem. This is going to require
> that somebody or somebodies spend many person-hours on design,
> implementation, and testing. Also
>
> - the deserializer support may or may not want to be in json.loads()
>
> because it may be preferable to deserialize to the primitive Python
> objects that correspond to the JSON types, and then allow the Python
> program to flexibly handle those. Eg, what to do about variable
> annotations? Should our deserializer automatically deal with those?
> What if a variable's value conflicts with its annotation? While there
> may be a clear answer to this question after somebody has thought
> about it for a bit, it's not obvious to me.
>
> The fundamental problem with your overall argument is that the
> usefulness to the community at large is unclear:
>
> > It is unfortunate that we all just use JSON and throw away decimals
> > and float precision and datetimes because json.dumps is so easy.
>
> True for yourself, I assume. But json.dumps is *not* why *the rest of
> us* do that. We do it because we've *always* done it. The Python
> objects we are serializing themselves lack units, precision, and pet's
> name! Until our Python programs become unit- and precision- aware,
> support for "lossless JSON" is necessarily going to be idiosyncratic,
> and mostly avoided.
>
> > How many people know that:
> >
> > - You can or should use decimal to avoid float precision error, but
> >   then you have to annoyingly write a JSONEncoder to save that data,
> >   and then the type is lost when it's parsed and cast to a float when
> >   it's deserialized?
> >
> > - JSON-LD is the only non-ad-hoc solution to preserving precision,
> >   datetimes, and complex numbers and types with JSON
> >
> > - JSON5 supports IEEE 754 ±Infinity and NaN
> >
> > - Pickles do serialize arbitrary objects, but are not safe for data
> >   publishing because unmarshalling runs executable code in the pickle
> >   (this is in the docs now)
>
> Very few. But again, that's the wrong set of questions, for reasons
> similar to the above issue about "why we use json.dumps". The right
> questions are:
>
> 1. Of those who don't know, how many have need to know, and will
>    acknowledge that need? (If they don't admit it, good luck getting
>    them to change their programs!)
>
> 2. Of those who have need to know, how many would have "enough" of
>    their serialization problems solved by any particular packaged set
>    of features that might be added to the stdlib?
>
> 3. Is the number of programs in 2 "large enough" to justify the
>    additional maintenance burden and the risk that better but
>    conflicting solutions will be created in the future?
>
> > JSON-LD is the way to go for complex types in JSON.
>
> > It's worth specifying a JSON serialization protocol as a PEP that
> > third-party and stdlib JSON implementations would use.
>
> All of JSON-LD is way overkill for the examples of complex types
> you've given. We *do not need or want* a complete reimplementation
> of the Semantic Web in JSON in our stdlib. So what exactly are you
> talking about? Here's my idea:
>
> I suspect your "serialization protocol" above really means
> *deserialization* protocol.
> object.__json__ is all the serialization
> protocol we need, because it will produce a standard JSON stream that
> can be deserialized (perhaps with different semantics!) by any
> standard JSON deserializer. Also, we don't need a PEP to specify the
> protocol for providing a more accurate deserialization, JSON-LD
> already did that work, and the parts we need are pretty trivial
> (definitely @context, maybe @id). So I interpret your word "protocol"
> to mean "JSON-LD @context". Is that close?
>
> For almost all Python applications, a JSON-LD @context specific to
> Python's object model and standard builtin types would be enough.
> Since each type is itself a Python object, JSON-LD should be able to
> represent user-defined classes and their instances within that
> @context too. For those programs that provide more semantic
> information about their classes, they'd need additional, idiosyncratic
> @context anyway, and I have no idea what a "standard extended
> @context" would want to include. Each large external package (NumPy,
> Twisted) would want to implement its own @context, I think.
>
> We could imagine additional semantic information in this @context that
> would even tell you which modules you need to pip from PyPI to work
> with these data types, along with the developers' and auditors'[1]
> signatures you can authenticate the module and apply your trust model
> to whether you want to import them.
>
> Steve
>
> Footnotes:
> [1] Is this new? I know that frequently software modules are signed
> by their maintainers, and people decide to extend trust to particular
> maintainers. But in open source, anybody can audit, so a list of
> auditors with signatures, dates, and a comment field for the audit
> might also be useful for maintainers who aren't famous when the
> auditors are famous.
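
For reference, here is a rough sketch of how an ``object.__json__()`` hook and a paired deserializer might be emulated with today's json module. This is only an illustration under assumptions: the stdlib does not call ``__json__`` itself, and the ``Money`` class, the ``"@type"`` tag, and the ``DECODERS`` registry are hypothetical names made up for this example. The tag is loosely inspired by JSON-LD and is not a real @context.

```python
import json
from decimal import Decimal


class Money:
    """Hypothetical user type that opts in to a __json__ hook."""

    def __init__(self, amount, currency):
        self.amount = Decimal(amount)
        self.currency = currency

    def __json__(self):
        # Return only JSON-native types. "@type" is an ad hoc tag
        # (an assumption for this sketch), not a JSON-LD @context.
        return {"@type": "Money", "amount": str(self.amount),
                "currency": self.currency}


def default(obj):
    # json.dumps calls this only for objects it cannot serialize itself.
    hook = getattr(obj, "__json__", None)
    if hook is None:
        raise TypeError(
            f"Object of type {type(obj).__name__} is not JSON serializable")
    return hook()


# The complementary deserializer: a registry keyed by the "@type" tag.
DECODERS = {"Money": lambda d: Money(d["amount"], d["currency"])}


def object_hook(d):
    # Called for every decoded JSON object, innermost first.
    decoder = DECODERS.get(d.get("@type"))
    return decoder(d) if decoder else d


text = json.dumps({"price": Money("19.99", "USD")}, default=default)
restored = json.loads(text, object_hook=object_hook)
```

Note that the serializing half is a few lines, while the registry and ``object_hook`` are where the design questions raised above show up: who owns the registry, and how a tag gets linked to its type. A plain ``json.loads(text)`` without the hook still works and simply returns the tagged dict.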
>
> Steve
> _______________________________________________
> Python-ideas mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/[email protected]/message/XACTLM5TXCKM2MAZM4BKN677M2DU46QA/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
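
The Decimal-to-float conversion mentioned in this thread can be reproduced with the existing json module. The sketch below uses only documented stdlib features (``parse_float`` on ``json.loads`` and a ``json.JSONEncoder`` subclass); the ``DecimalEncoder`` name and the sample value are my own, chosen for illustration.

```python
import json
from decimal import Decimal

text = '{"price": 1.10000000000000000001}'

# Default behaviour: JSON numbers with a fraction become Python floats,
# so the trailing digits are silently dropped.
as_float = json.loads(text)["price"]

# parse_float lets the caller keep every digit by decoding to Decimal.
as_decimal = json.loads(text, parse_float=Decimal)["price"]


class DecimalEncoder(json.JSONEncoder):
    """Illustrative encoder: writes Decimal values as JSON strings."""

    def default(self, obj):
        if isinstance(obj, Decimal):
            # The value survives, but it is now a JSON string, so a
            # plain json.loads will not give back a number.
            return str(obj)
        return super().default(obj)


encoded = json.dumps({"price": as_decimal}, cls=DecimalEncoder)
```

Decoding already has a lossless path, but encoding through ``default`` can only emit a string here, which is the asymmetry the ``use_decimal`` proposal is about.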
_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/[email protected]/message/A4KH7SYKG2IXGGRDWGVNBYP2BXXO7XIN/
Code of Conduct: http://python.org/psf/codeofconduct/
