On Fri, Aug 9, 2019 at 8:33 AM Jonathan Hall <fli...@flimzy.com> wrote: > > I debated posting here, or straight to GitHub. If that's the better place, I > can move the thread there. I have long wanted proper streaming support in the > `encoding/json` library. Lately I’ve been doing some digging to understand > the current state of things, and I think I’ve come to grips with most of it. > > > A number of previous issues relate to this topic: > https://github.com/golang/go/issues/7872, > https://github.com/golang/go/issues/11046, > https://github.com/golang/go/issues/12001, > https://github.com/golang/go/issues/14140 > > > I have read through each of these issues, and believe I have a fair > understanding of the problems associated with streaming JSON input/output. If > I'm overlooking something please enlighten me. > > > In a nutshell: The library implicitly guarantees that marshaling will never > write an incomplete JSON object due to an error, and that during > unmarshaling, it will never pass an incomplete JSON message to > `UnmarshalJSON`. > > > Work toward this was done about 3 years ago, on this CL: > https://go-review.googlesource.com/c/go/+/13818/ > > Workt was eventually abandoned, apparently when the author was unsure how to > make the new behavior opt-in. I believe this proposal will solve that issue. > > > The problem to be solved > > > Dealing with large JSON structures is inefficient, due to the internal > buffering done by `encoding/json`. `json.NewEncoder` and `json.NewDecoder` > appear to offer streaming benefits, but this is mostly an idiomatic > advantage, not a performance one, as internal buffering still takes place. > Particular aspects of the broader problem are already addressed on the above > mentioned links, and benefits of streaming should be easy to imagine, so I > won't bore people with details unless someone asks. > > A naïve solution > > > I believe a simple solution (simple from the perspective of a consumer of the > library--the internal changes are not so simple) would be to add two > interfaces: > > > type StreamMarshaler interface { > > MarshalJSONStream(io.Writer) error > > } > > > type StreamUnmarshaler interface { > > UnmarshalJSONStream(io.Reader) error > > } > > > During (un)marshaling, where `encoding/json` looks for `json.Marshaler` and > `json.Unmarshaler` respectively, it will now look for (and possibly prefer) > the new interfaces instead. Wrapping either the old or new interfaces to work > as the other is a trivial matter. > > > With this change, and the requisite internal changes, it would be possible to > begin streaming large JSON data to a server immediately, from within a > `MarshalJSONStream()` implementation, for instance. > > > The drawback is that it violates the above mentioned promise of complete > reads and writes, even with errors. > > > Opt-in > > > To accommodate this requirement, I believe it would be possible to expose the > streaming functionality _only_ with the `json.Encoder` and `json.Decoder` > implementations, and only when `EnablePartial*` (name TBD) is enabled. So > further, the following two functions would be added to the public API: > > > func (*Encoder) EnablePartialWrites(on bool) > > > func (*Decoder) EnablePartialReads(on bool) > > > The default behavior, even when a type implements one of the new `Stream*` > interfaces, will be to operate on an entire JSON object at once. That is to > say, the Encoder will internally buffer `MarshalJSONStream`'s output, and > process any error before continuing, and a decoder will read an entire JSON > object into a buffer, then pass it to `UnmarshalJSONStream` only if there are > no errors. > > > However, when `EnablePartial*` is enabled, the library will bypass this > internal buffering, allowing for immediate streaming to/from the > source/destination. > > > Enabling streaming with the `EnablePartial*` toggle could be enough to > already experience a benefit for many users, even without the use of the > additional interfaces above. > > > Toggling `EnablePartial*` on will, of course, enable streaming for all types, > not just those which implement the new interface above, so this could be > considered a separate part of the proposal. In my opinion, this alone would > be worth implementing, even if the new interface types above are done later > or never. > > > Internals > > > CL 13818 can serve as very informative for this part of the discussion. I've > also done some digging in the `encoding/json` package (as of 1.12) recently, > for more current context. A large number of internal changes will be > necessary to allow for this. I started playing around with a few internals, > and I believe this is doable, but will mean a lot of code churn, so will need > to be done carefully, in small steps with good code review. > > > As an exercise, I have successfully rewritten`indent()` to work with streams, > rather than on byte slices, and began doing the same with `compact()`. The > `encodeState` type would need to work with a standard `io.Writer` rather than > specifically a `bytes.Buffer`. This seems to be a bigger change, but not > technically difficult. I know there are other changes needed--I haven't done > a complete audit of the code. > > > An open question is how these changes might impact performance. My benchmarks > after changing `indent()` showed no change in performance, but it wasn't a > particularly rigorous test. > > > With the internals rewritten to support streams, then it's just a matter of > doing the internal buffering at the appropriate place, such as at API > boundaries (i.e. in `Marshal()` and `Unmarshal()`), rather than as a bulit-in > fundamental concept. Then, as described above, turning off that buffering > when properly configured above. > > > Final comments > > > To be clear, I am interested in working on this. I’m not just trying to throw > out a “nice to have, now would somebody do this for me?” type of proposal. > But I want to make sure I fully understand the history and context of this > situation before I start too far down this rabbit hole. > > > I'm curious to hear the opinions of others who have been around longer. > Perhaps such a proposal was already discussed (and possibly rejected?) in > greater length than I can find in the above linked tickets. If so, please > point me to the relevant conversation(s). > > > I am aware of several third-party libraries that offer some support like > this, but most have various drawbacks (relying on code generation, or > over-complex APIs). I would love to see this kind of support in the standard > library. And one last aside: CL 13818 also added support for marshaling > channels. That may or may not be a good idea (my personal feeling: probably > not), but that can be addressed separately.
Instead of modifying the existing Encoder/Decoder, wouldn't it be better to do this as a separate encoder/decoder? > > > -- > You received this message because you are subscribed to the Google Groups > "golang-nuts" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to golang-nuts+unsubscr...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/golang-nuts/9b081989-0ffa-40a1-8d6b-f56833e7b749%40googlegroups.com. -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAMV2RqpgkWYQkTFy43LwRO-Xz10N9m_Ye4JB1XB9-%2BioMbFugA%40mail.gmail.com.