*I debated posting here, or straight to GitHub. If that's the better place, 
I can move the thread there. * I have long wanted proper streaming support 
in the `encoding/json` library. Lately I’ve been doing some digging to 
understand the current state of things, and I think I’ve come to grips with 
most of it.

A number of previous issues relate to this topic: 
https://github.com/golang/go/issues/7872, 
https://github.com/golang/go/issues/11046, 
https://github.com/golang/go/issues/12001, 
https://github.com/golang/go/issues/14140

I have read through each of these issues, and believe I have a fair 
understanding of the problems associated with streaming JSON input/output. 
If I'm overlooking something please enlighten me.

In a nutshell: The library implicitly guarantees that marshaling will never 
write an incomplete JSON object due to an error, and that during 
unmarshaling, it will never pass an incomplete JSON message to 
`UnmarshalJSON`.

Work toward this was done about 3 years ago, on this CL: 
https://go-review.googlesource.com/c/go/+/13818/

Workt was eventually abandoned, apparently when the author was unsure how 
to make the new behavior opt-in. I believe this proposal will solve that 
issue.

*The problem to be solved*

Dealing with large JSON structures is inefficient, due to the internal 
buffering done by `encoding/json`. `json.NewEncoder` and `json.NewDecoder` 
appear to offer streaming benefits, but this is mostly an idiomatic 
advantage, not a performance one, as internal buffering still takes place. 
Particular aspects of the broader problem are already addressed on the 
above mentioned links, and benefits of streaming should be easy to imagine, 
so I won't bore people with details unless someone asks. 

*A naïve solution*

I believe a simple solution (simple from the perspective of a consumer of 
the library--the internal changes are not so simple) would be to add two 
interfaces:

    type StreamMarshaler interface {

        MarshalJSONStream(io.Writer) error

    }

    type StreamUnmarshaler interface {

        UnmarshalJSONStream(io.Reader) error

    }

During (un)marshaling, where `encoding/json` looks for `json.Marshaler` and 
`json.Unmarshaler` respectively, it will now look for (and possibly prefer) 
the new interfaces instead. Wrapping either the old or new interfaces to 
work as the other is a trivial matter.

With this change, and the requisite internal changes, it would be possible 
to begin streaming large JSON data to a server immediately, from within a 
`MarshalJSONStream()` implementation, for instance.

The drawback is that it violates the above mentioned promise of complete 
reads and writes, even with errors.

*Opt-in*

To accommodate this requirement, I believe it would be possible to expose 
the streaming functionality _only_ with the `json.Encoder` and 
`json.Decoder` implementations, and only when `EnablePartial*` (name TBD) 
is enabled.  So further, the following two functions would be added to the 
public API:

    func (*Encoder) EnablePartialWrites(on bool)

    func (*Decoder) EnablePartialReads(on bool)

The default behavior, even when a type implements one of the new `Stream*` 
interfaces, will be to operate on an entire JSON object at once. That is to 
say, the Encoder will internally buffer `MarshalJSONStream`'s output, and 
process any error before continuing, and a decoder will read an entire JSON 
object into a buffer, then pass it to `UnmarshalJSONStream` only if there 
are no errors.

However, when `EnablePartial*` is enabled, the library will bypass this 
internal buffering, allowing for immediate streaming to/from the 
source/destination.

Enabling streaming with the `EnablePartial*` toggle could be enough to 
already experience a benefit for many users, even without the use of the 
additional interfaces above.

Toggling `EnablePartial*` on will, of course, enable streaming for all 
types, not just those which implement the new interface above, so this 
could be considered a separate part of the proposal. In my opinion, this 
alone would be worth implementing, even if the new interface types above 
are done later or never.

*Internals*

CL 13818 can serve as very informative for this part of the discussion. 
I've also done some digging in the `encoding/json` package (as of 1.12) 
recently, for more current context. A large number of internal changes will 
be necessary to allow for this. I started playing around with a few 
internals, and I believe this is doable, but will mean a lot of code churn, 
so will need to be done carefully, in small steps with good code review.

As an exercise, I have successfully rewritten`indent()` to work with 
streams, rather than on byte slices, and began doing the same with 
`compact()`. The `encodeState` type would need to work with a standard 
`io.Writer` rather than specifically a `bytes.Buffer`. This seems to be a 
bigger change, but not technically difficult. I know there are other 
changes needed--I haven't done a complete audit of the code.

An open question is how these changes might impact performance. My 
benchmarks after changing `indent()` showed no change in performance, but 
it wasn't a particularly rigorous test.

With the internals rewritten to support streams, then it's just a matter of 
doing the internal buffering at the appropriate place, such as at API 
boundaries (i.e. in `Marshal()` and `Unmarshal()`), rather than as a 
bulit-in fundamental concept.  Then, as described above, turning off that 
buffering when properly configured above.

*Final comments*

To be clear, I am interested in working on this. I’m not just trying to 
throw out a “nice to have, now would somebody do this for me?” type of 
proposal. But I want to make sure I fully understand the history and 
context of this situation before I start too far down this rabbit hole.

I'm curious to hear the opinions of others who have been around longer. 
Perhaps such a proposal was already discussed (and possibly rejected?) in 
greater length than I can find in the above linked tickets. If so, please 
point me to the relevant conversation(s).

I am aware of several third-party libraries that offer some support like 
this, but most have various drawbacks (relying on code generation, or 
over-complex APIs). I would love to see this kind of support in the 
standard library. And one last aside: CL 13818 also added support for 
marshaling channels. That may or may not be a good idea (my personal 
feeling: probably not), but that can be addressed separately.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/9b081989-0ffa-40a1-8d6b-f56833e7b749%40googlegroups.com.

Reply via email to