[Python-ideas] Re: adding support for a "raw output" in JSON serializer

Richard Musil Thu, 08 Aug 2019 22:58:45 -0700

After some thinking, it seems that the float is the only case where this "loss 
of precision" happens.


With the way `JSONEncoder.default` currently works in the standard module, I 
believe I can serialize pretty much everything else. The "composite" types 
(i.e. the custom types in the sense of the JSON spec) can be serialized either 
as a special string or as a map, with the logic implemented on both ends at the 
application level.

The only possible suspects to suffer from the current `JSONEncoder.default` are 
the "integral" (primitive) types in JSON, i.e. integer, float and boolean (null 
is in this sense "equivalent" to boolean). The boolean should not need any 
custom encoder, as it can only be represented by one of a well-defined set of 
representations.

The integer might suffer a similar fate to the float if Python had no native 
support for big ints. Since it does, I can do this and it works as expected:
```
json:orig = {"val": 10000000000000000000000000000000000000000000000000000000000001}
json:pyth = {'val': 10000000000000000000000000000000000000000000000000000000000001}
json:seri = {"val": 10000000000000000000000000000000000000000000000000000000000001}
```
This is an integer value far exceeding what fits in a native 64-bit machine 
word.
```
In [12]: hex(10000000000000000000000000000000000000000000000000000000000001)
Out[12]: '0x63917877cec0556b21269d695bdcbf7a87aa000000000000001'
```
So with an integer, Python, thanks to its internal handling, parses the int 
correctly and "silently" upgrades it to big int, so it does not lose the 
precision (or bit-to-bit/byte-to-byte accuracy).
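
To make the integer case concrete, here is a minimal sketch of the round trip using only `json.loads` and `json.dumps`. The variable names are mine, and I build the value as `10**61 + 1`, which is the number from the example above:

```python
import json

# The 62-digit value from the example: 1, sixty zeros, 1.
big = 10**61 + 1
text = '{"val": %d}' % big

obj = json.loads(text)   # parsed into Python's arbitrary-precision int
out = json.dumps(obj)    # serialized back without touching the digits

print(type(obj["val"]))  # <class 'int'>
print(out == text)       # True
```

No custom encoder or decoder is involved, so the text survives the round trip character for character.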

The only type "left in the dark" is the float. It does have an equivalent of 
integer's big int (decimal.Decimal) but it is not automatically applied, which 
is perfectly reasonable, because it would involve a custom type, and probably 
not many would want/need that.
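
For contrast, a minimal sketch of the same round trip with a float carrying more digits than a double can hold (the input value is just an illustration I made up):

```python
import json

# A decimal fraction with 62 digits after the point.
digits = "0.1" + "0" * 60 + "1"
text = '{"val": %s}' % digits

obj = json.loads(text)   # parsed into a binary double
out = json.dumps(obj)    # serialized from the double's repr

print(out)               # {"val": 0.1}
print(out == text)       # False
```

The trailing digits are gone after the load, so nothing on the encoder side can bring them back.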

On the other hand, being aware of the problem, the standard module offers the 
famous `parse_float` keyword argument, which can conveniently be set to 
`decimal.Decimal` if the user needs an equivalent of the big int for the float. 
So far this also seems well thought out, because it shows that the decoder (or 
rather its implementer) was well aware of the float properties in JSON input 
and wanted to give users a way to handle them on their own.

It looks better than simplejson's `use_decimal`, because that option implies 
that only one particular type can be used, while the standard module leaves the 
choice to the user. On the other hand, in order to implement it efficiently, 
both (the standard module and simplejson) made this option an explicit argument 
which only concerns the float type, so it does not need any "generic raw" 
decoder infrastructure to support it.

So far the way the standard module handles that makes perfect sense.
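
As a minimal sketch of the decoder side (the input value is again my own illustration), `parse_float` receives the float's text exactly as it appears in the input:

```python
import decimal
import json

digits = "0.1" + "0" * 60 + "1"
text = '{"val": %s}' % digits

# parse_float is called with the string of every JSON float in the input.
obj = json.loads(text, parse_float=decimal.Decimal)

print(type(obj["val"]))           # <class 'decimal.Decimal'>
print(str(obj["val"]) == digits)  # True
```

Any callable that accepts a string works here, which is what leaves the choice of type to the user.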

Now for the encoder part. simplejson got away with `use_decimal` again, because 
it allowed `Decimal` as the only option. The standard module would need a way 
to identify the custom type used for the float in order to serialize it 
"properly".
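
To show where the encoder currently stands, here is a minimal sketch of the options available through `JSONEncoder.default` today. The class names `StrEncoder` and `FloatEncoder` are hypothetical, picked only for this illustration:

```python
import decimal
import json

digits = "0.1" + "0" * 60 + "1"
value = {"val": decimal.Decimal(digits)}

# Without help, the encoder rejects Decimal outright.
try:
    json.dumps(value)
    failed = False
except TypeError:
    failed = True

class StrEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, decimal.Decimal):
            return str(o)    # keeps the digits, but becomes a JSON string
        return super().default(o)

class FloatEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, decimal.Decimal):
            return float(o)  # stays a JSON number, but loses the digits
        return super().default(o)

quoted = json.dumps(value, cls=StrEncoder)
lossy = json.dumps(value, cls=FloatEncoder)

print(failed)  # True
print(quoted)  # the value wrapped in double quotes
print(lossy)   # {"val": 0.1}
```

Neither variant writes the original digits as a bare JSON number, which is the gap the two options below try to close.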

I can see two ways out of it:

1) The standard module could implement something like a `dump_float` keyword 
argument in its `dump`, which would allow users to specify which custom type 
they used for the float in the load. The standard encoder would then take note 
of that and honor the string representation of this object/type as the _raw_ 
output, either when it converts the object internally (possibly by doing 
something like `str(o)`), or when the custom implementation of 
`JSONEncoder.default` returns the string.

2) The standard module could implement some specific semantics in the handling 
of the `JSONEncoder.default` output, which would allow the user to signal to 
the underlying layer that the custom encoder needs to write "raw" data to the 
output stream, without the need for a keyword argument. Using a `bytes` object 
could be that trigger:
```
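import decimal
import json

class DecimalEncoder(json.JSONEncoder):
    def default(self, o):
        print(o)  # debug trace of objects reaching default()
        if isinstance(o, decimal.Decimal):
            # Proposed semantics: bytes returned here would be written raw.
            return str(o).encode()
        return super().default(o)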
```
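
For reference, a minimal sketch of what returning `bytes` does with the json module as it stands, i.e. without the proposed semantics. The class name `BytesEncoder` is hypothetical:

```python
import decimal
import json

class BytesEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, decimal.Decimal):
            return str(o).encode()
        # bytes is not serializable either, so it ends up here and raises.
        return super().default(o)

try:
    json.dumps({"val": decimal.Decimal("0.1")}, cls=BytesEncoder)
    raised = False
except TypeError:
    raised = True

print(raised)  # True
```

Since returning `bytes` from `default` is an error today, giving it the "raw" meaning should not collide with output that existing encoders already produce.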

Any thoughts on this?
