[Python-ideas] Re: JSON encoder protocol

Kyle Stanley Fri, 16 Aug 2019 20:38:40 -0700

> True for yourself, I assume.  But json.dumps is *not* why *the rest of
> us* do that.  We do it because we've *always* done it.  The Python
> objects we are serializing themselves lack units, precision, and pet's
> name!  Until our Python programs become unit- and precision- aware,
> support for "lossless JSON" is necessarily going to be idiosyncratic,
> and mostly avoided.


As a more "casual" user of JSON, this part in particular resonated with me.
I can't imagine that most users need the degree of precision desired by the
OP of the previous topic. As far as I'm aware, ``json.dumps()`` serves
adequately as a JSON serializer for most users. For context, here is a
recent JSON file that I created using the json module:
https://github.com/python/devguide/pull/517. It's quite simple and uses
only a few data types.

> 1.  Of those who don't know, how many have need to know, and will
>     acknowledge that need?  (If they don't admit it, good luck getting
>     them to change their programs!)

Prior to this discussion, I'll admit that I had no idea that decimals come
back as floats when JSON is deserialized. But as stated in the question
above, I never really had a need to know. Pretty much every time I've used
JSON, floats have provided a "good enough" degree of precision.
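
To make that behaviour concrete, here is a minimal sketch that uses only
documented stdlib calls (``json.dumps`` with ``default``, ``json.loads`` with
``parse_float``). The ``price`` value is made up for illustration, and
converting through ``float`` is just one common workaround, not a
recommendation:

```python
import json
from decimal import Decimal

price = Decimal("0.10000000000000000001")  # illustrative value only

# The stock encoder refuses Decimal outright.
try:
    json.dumps({"price": price})
    rejected = False
except TypeError:
    rejected = True

# A common workaround converts to float on the way out, which drops digits.
text = json.dumps({"price": price}, default=float)

# By default the number comes back as a float...
as_float = json.loads(text)["price"]

# ...while parse_float=Decimal keeps whatever digits are present in the text.
as_decimal = json.loads('{"price": 0.10000000000000000001}',
                        parse_float=Decimal)["price"]

print(rejected, text, type(as_float).__name__, as_decimal)
```

Note that ``parse_float`` applies to every JSON number with a fractional part
or exponent in the document, so it cannot tell which values were Decimals
before encoding.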

I'm not saying that just because it isn't useful for me personally that it
shouldn't be added, but I wouldn't be surprised if the majority of users of
the json module had no idea about this issue because it didn't affect them
significantly.

> For almost all Python applications, a JSON-LD @context specific to
> Python's object model and standard builtin types would be enough.

I'm personally not very knowledgeable about JSON-LD; I've only seen it
mentioned a few times. But based on what I can tell from the examples at
https://json-ld.org/playground/ (the "Place" example was particularly
helpful), I could definitely imagine @context being useful.
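
As a rough illustration of the idea (not a JSON-LD processor, and it does not
interpret @context at all), the sketch below decodes a JSON-LD style typed
value object, ``{"@value": ..., "@type": ...}``, back into a Decimal using the
stdlib ``object_hook``. The ``price`` key and the helper name
``typed_value_hook`` are assumptions made up for this example:

```python
import json
from decimal import Decimal

XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# A typed value written out in full; "price" is a made-up key.
doc = '{"price": {"@value": "19.99", "@type": "' + XSD_DECIMAL + '"}}'


def typed_value_hook(obj):
    """Turn a typed value object into a Decimal; pass other dicts through."""
    if obj.get("@type") == XSD_DECIMAL and "@value" in obj:
        return Decimal(obj["@value"])
    return obj


data = json.loads(doc, object_hook=typed_value_hook)
print(data)
```

As far as I understand it, an @context would let a document state the type
once per term instead of repeating it on every value, which is what a real
JSON-LD library would handle.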

Speaking of examples, I think it would be helpful to provide a brief
example of ``object.__json__()`` being used for some added clarity. Would
it be a direct alias of ``json.dumps()`` with the same exact parameters and
usage, or would there be some substantial differences?
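
To show the kind of thing I'm picturing, here is a guess at how it might
look, emulated with the existing ``default`` hook of ``json.dumps()``. The
``__json__`` method, the ``Money`` class, and ``json_default`` are all
hypothetical; nothing in the stdlib looks for ``__json__`` today, and the
actual proposal may differ:

```python
import json
from decimal import Decimal


class Money:
    """Toy class; __json__ is a hypothetical hook, not a stdlib protocol."""

    def __init__(self, amount, currency):
        self.amount = Decimal(amount)
        self.currency = currency

    def __json__(self):
        # Return something the stock encoder already understands.
        return {"amount": str(self.amount), "currency": self.currency}


def json_default(obj):
    """Emulate the proposed protocol through the existing `default` hook."""
    hook = getattr(type(obj), "__json__", None)
    if hook is None:
        raise TypeError(
            f"Object of type {type(obj).__name__} is not JSON serializable"
        )
    return hook(obj)


encoded = json.dumps({"total": Money("19.99", "USD")}, default=json_default)
print(encoded)
```

Under this reading the method would not be an alias of ``json.dumps()``: it
would return a JSON-compatible Python value and leave the actual encoding to
the json module. Whether it should instead accept keyword arguments or return
a string is exactly what I'm unsure about.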

Thanks,
Kyle Stanley

On Fri, Aug 16, 2019 at 4:00 AM Stephen J. Turnbull <
[email protected]> wrote:

> Wes Turner writes:
>
>  > Data interchange with structured types is worthwhile.
>
> That's not what the main thread is about.  It's about adding support
> for Decimal to the stdlib's json module.  Even the OP has explicitly
> disclaimed pretty much everything else, although his preferred
> implementation is more general than that.
>
> I'm +1 on that.  I think the outline of how to do it has become pretty
> obvious, and that it should be restricted to automatically converting
> Decimals to a JSON number, perhaps under control of a use_decimal flag
> for both encoding and decoding.
>
> The rest should go into a separate thread.  First let's dispose of
> this:
>
>  > Streaming JSON is not possible without JSON lines support.
>
> It is obvious to me that this should be handled in yet another thread
> from "lossless JSON", because it can and should be independently
> implemented, if it's done at all.  Given (ob,n) = raw_decode(idx=n)
> support in the json module, the difficulty in implementing is all
> about buffering, and choosing where to do that buffering (in a
> separate module? in json.load? in a new json.load_stream generator?)
>
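
For reference, a minimal sketch of the ``raw_decode`` approach mentioned
above, assuming the whole input is already in memory as one string (so it
sidesteps the buffering question entirely). ``iter_json_values`` is a made-up
helper name, not an existing or proposed stdlib function:

```python
import json


def iter_json_values(text):
    """Yield each JSON value from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    idx = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while idx < end and text[idx].isspace():
            idx += 1
        if idx >= end:
            return
        # raw_decode returns the decoded value and the index where it ended.
        obj, idx = decoder.raw_decode(text, idx)
        yield obj


values = list(iter_json_values('{"a": 1}\n[2, 3]\n"four" 5'))
print(values)
```
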
> I will now argue that the __json__ protocol is nowhere near so
> obviously stdlib-able as Decimal and streaming JSON.
>
>  > An object.__json__(**kwargs) protocol would inconvenience no-one so
>  > long as:
>  > - decimal isn't imported unless used
>  > - all existing code continues to work
>
> I also think that JSON is widely enough used, and deserves better
> semantic support, that a protocol (specifically, the __json__ dunder)
> for serializer support and some form of complementary deserializer
> support are quite justifiable.  But the __json__ dunder is the *easy*
> part.  The complexity here is in that complementary deserializer.
>
> Here's why.  To your desiderata I would add
>
> - no complex type's module is imported unless used (easy)
>
> - the deserializer support for a type should be linked to its
>   serializer support (something like the codecs registry, but more
>   complicated because each entity will need to invoke support
>   separately, unlike codecs where there's one codec for a whole text)
>
> - such object support should be automatically linked in to both the
>   top level serializer and deserializer dispatching.
>
> The latter two desiderata look *hard* to me.  Without them, you've got
> the inverse of the current Decimal problem.  This is going to require
> that somebody or somebodies spend many person-hours on design,
> implementation, and testing.  Also
>
> - the deserializer support may or may not want to be in json.loads()
>
> because it may be preferable to deserialize to the primitive Python
> objects that correspond to the JSON types, and then allow the Python
> program to flexibly handle those.  Eg, what to do about variable
> annotations?  Should our deserializer automatically deal with those?
> What if a variable's value conflicts with its annotation?  While there
> may be a clear answer to this question after somebody has thought
> about it for a bit, it's not obvious to me.
>
> The fundamental problem with your overall argument is that the
> usefulness to the community at large is unclear:
>
>  > It is unfortunate that we all just use JSON and throw away decimals
>  > and float precision and datetimes because json.dumps is so easy.
>
> True for yourself, I assume.  But json.dumps is *not* why *the rest of
> us* do that.  We do it because we've *always* done it.  The Python
> objects we are serializing themselves lack units, precision, and pet's
> name!  Until our Python programs become unit- and precision- aware,
> support for "lossless JSON" is necessarily going to be idiosyncratic,
> and mostly avoided.
>
>  > How many people know that:
>  >
>  > - You can or should use decimal to avoid float precision error, but then
>  > you have to annoyingly write a JSONEncoder to save that data, and then
>  > the type is lost when it's parsed and cast to a float when it's
>  > deserialized?
>  >
>  > - JSON-LD is the only non-ad-hoc solution to preserving precision,
>  > datetimes, and complex numbers and types with JSON
>  >
>  > - JSON5 supports IEEE 754 ±Infinity and NaN
>  >
>  > - Pickles do serialize arbitrary objects, but are not safe for data
>  > publishing because unmarshalling runs executable code in the pickle
>  > (this is in the docs now)
>
> Very few.  But again, that's the wrong set of questions, for reasons
> similar to the above issue about "why we use json.dumps".  The right
> questions are:
>
> 1.  Of those who don't know, how many have need to know, and will
>     acknowledge that need?  (If they don't admit it, good luck getting
>     them to change their programs!)
>
> 2.  Of those who have need to know, how many would have "enough" of
>     their serialization problems solved by any particular packaged set
>     of features that might be added to the stdlib?
>
> 3.  Is the number of programs in 2 "large enough" to justify the
>     additional maintenance burden and the risk that better but
>     conflicting solutions will be created in the future?
>
>  > JSON-LD is the way to go for complex types in JSON.
>
>  > It's worth specifying a JSON serialization protocol as a PEP that
>  > third-party and stdlib JSON implementations would use.
>
> All of JSON-LD is way overkill for the examples of complex types
> you've given.  We *do not need or want* a complete reimplementation
> of the Semantic Web in JSON in our stdlib.  So what exactly are you
> talking about?  Here's my idea:
>
> I suspect your "serialization protocol" above really means
> *deserialization* protocol.  object.__json__ is all the serialization
> protocol we need, because it will produce a standard JSON stream that
> can be deserialized (perhaps with different semantics!) by any
> standard JSON deserializer.  Also, we don't need a PEP to specify the
> protocol for providing a more accurate deserialization, JSON-LD
> already did that work, and the parts we need are pretty trivial
> (definitely @context, maybe @id).  So I interpret your word "protocol"
> to mean "JSON-LD @context".  Is that close?
>
> For almost all Python applications, a JSON-LD @context specific to
> Python's object model and standard builtin types would be enough.
> Since each type is itself a Python object, JSON-LD should be able to
> represent user-defined classes and their instances within that
> @context too.  For those programs that provide more semantic
> information about their classes, they'd need additional, idiosyncratic
> @context anyway, and I have no idea what a "standard extended
> @context" would want to include.  Each large external package (NumPy,
> Twisted) would want to implement its own @context, I think.
>
> We could imagine additional semantic information in this @context that
> would even tell you which modules you need to pip from PyPI to work
> with these data types, along with the developers' and auditors'[1]
> signatures, so that you can authenticate the modules and apply your
> trust model to decide whether you want to import them.
>
> Steve
>
> Footnotes:
> [1]  Is this new?  I know that frequently software modules are signed
> by their maintainers, and people decide to extend trust to particular
> maintainers.  But in open source, anybody can audit, so a list of
> auditors with signatures, dates, and a comment field for the audit
> might also be useful for maintainers who aren't famous when the
> auditors are famous.
>
> Steve
> _______________________________________________
> Python-ideas mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/[email protected]/message/XACTLM5TXCKM2MAZM4BKN677M2DU46QA/
> Code of Conduct: http://python.org/psf/codeofconduct/
>

Message archived at
https://mail.python.org/archives/list/[email protected]/message/A4KH7SYKG2IXGGRDWGVNBYP2BXXO7XIN/