On Fri, Aug 9, 2019 at 8:33 AM Jonathan Hall <fli...@flimzy.com> wrote:
>
> I debated posting here, or straight to GitHub. If that's the better place, I 
> can move the thread there. I have long wanted proper streaming support in the 
> `encoding/json` library. Lately I’ve been doing some digging to understand 
> the current state of things, and I think I’ve come to grips with most of it.
>
>
> A number of previous issues relate to this topic: 
> https://github.com/golang/go/issues/7872, 
> https://github.com/golang/go/issues/11046, 
> https://github.com/golang/go/issues/12001, 
> https://github.com/golang/go/issues/14140
>
>
> I have read through each of these issues, and believe I have a fair 
> understanding of the problems associated with streaming JSON input/output. If 
> I'm overlooking something please enlighten me.
>
>
> In a nutshell: The library implicitly guarantees that marshaling will never 
> write an incomplete JSON object due to an error, and that during 
> unmarshaling, it will never pass an incomplete JSON message to 
> `UnmarshalJSON`.
>
>
> Work toward this was done about 3 years ago, on this CL: 
> https://go-review.googlesource.com/c/go/+/13818/
>
> Workt was eventually abandoned, apparently when the author was unsure how to 
> make the new behavior opt-in. I believe this proposal will solve that issue.
>
>
> The problem to be solved
>
>
> Dealing with large JSON structures is inefficient, due to the internal 
> buffering done by `encoding/json`. `json.NewEncoder` and `json.NewDecoder` 
> appear to offer streaming benefits, but this is mostly an idiomatic 
> advantage, not a performance one, as internal buffering still takes place. 
> Particular aspects of the broader problem are already addressed on the above 
> mentioned links, and benefits of streaming should be easy to imagine, so I 
> won't bore people with details unless someone asks.
>
> A naïve solution
>
>
> I believe a simple solution (simple from the perspective of a consumer of the 
> library--the internal changes are not so simple) would be to add two 
> interfaces:
>
>
>     type StreamMarshaler interface {
>
>         MarshalJSONStream(io.Writer) error
>
>     }
>
>
>     type StreamUnmarshaler interface {
>
>         UnmarshalJSONStream(io.Reader) error
>
>     }
>
>
> During (un)marshaling, where `encoding/json` looks for `json.Marshaler` and 
> `json.Unmarshaler` respectively, it will now look for (and possibly prefer) 
> the new interfaces instead. Wrapping either the old or new interfaces to work 
> as the other is a trivial matter.
>
>
> With this change, and the requisite internal changes, it would be possible to 
> begin streaming large JSON data to a server immediately, from within a 
> `MarshalJSONStream()` implementation, for instance.
>
>
> The drawback is that it violates the above mentioned promise of complete 
> reads and writes, even with errors.
>
>
> Opt-in
>
>
> To accommodate this requirement, I believe it would be possible to expose the 
> streaming functionality _only_ with the `json.Encoder` and `json.Decoder` 
> implementations, and only when `EnablePartial*` (name TBD) is enabled.  So 
> further, the following two functions would be added to the public API:
>
>
>     func (*Encoder) EnablePartialWrites(on bool)
>
>
>     func (*Decoder) EnablePartialReads(on bool)
>
>
> The default behavior, even when a type implements one of the new `Stream*` 
> interfaces, will be to operate on an entire JSON object at once. That is to 
> say, the Encoder will internally buffer `MarshalJSONStream`'s output, and 
> process any error before continuing, and a decoder will read an entire JSON 
> object into a buffer, then pass it to `UnmarshalJSONStream` only if there are 
> no errors.
>
>
> However, when `EnablePartial*` is enabled, the library will bypass this 
> internal buffering, allowing for immediate streaming to/from the 
> source/destination.
>
>
> Enabling streaming with the `EnablePartial*` toggle could be enough to 
> already experience a benefit for many users, even without the use of the 
> additional interfaces above.
>
>
> Toggling `EnablePartial*` on will, of course, enable streaming for all types, 
> not just those which implement the new interface above, so this could be 
> considered a separate part of the proposal. In my opinion, this alone would 
> be worth implementing, even if the new interface types above are done later 
> or never.
>
>
> Internals
>
>
> CL 13818 can serve as very informative for this part of the discussion. I've 
> also done some digging in the `encoding/json` package (as of 1.12) recently, 
> for more current context. A large number of internal changes will be 
> necessary to allow for this. I started playing around with a few internals, 
> and I believe this is doable, but will mean a lot of code churn, so will need 
> to be done carefully, in small steps with good code review.
>
>
> As an exercise, I have successfully rewritten`indent()` to work with streams, 
> rather than on byte slices, and began doing the same with `compact()`. The 
> `encodeState` type would need to work with a standard `io.Writer` rather than 
> specifically a `bytes.Buffer`. This seems to be a bigger change, but not 
> technically difficult. I know there are other changes needed--I haven't done 
> a complete audit of the code.
>
>
> An open question is how these changes might impact performance. My benchmarks 
> after changing `indent()` showed no change in performance, but it wasn't a 
> particularly rigorous test.
>
>
> With the internals rewritten to support streams, then it's just a matter of 
> doing the internal buffering at the appropriate place, such as at API 
> boundaries (i.e. in `Marshal()` and `Unmarshal()`), rather than as a bulit-in 
> fundamental concept.  Then, as described above, turning off that buffering 
> when properly configured above.
>
>
> Final comments
>
>
> To be clear, I am interested in working on this. I’m not just trying to throw 
> out a “nice to have, now would somebody do this for me?” type of proposal. 
> But I want to make sure I fully understand the history and context of this 
> situation before I start too far down this rabbit hole.
>
>
> I'm curious to hear the opinions of others who have been around longer. 
> Perhaps such a proposal was already discussed (and possibly rejected?) in 
> greater length than I can find in the above linked tickets. If so, please 
> point me to the relevant conversation(s).
>
>
> I am aware of several third-party libraries that offer some support like 
> this, but most have various drawbacks (relying on code generation, or 
> over-complex APIs). I would love to see this kind of support in the standard 
> library. And one last aside: CL 13818 also added support for marshaling 
> channels. That may or may not be a good idea (my personal feeling: probably 
> not), but that can be addressed separately.

Instead of modifying the existing Encoder/Decoder, wouldn't it be
better to do this as a separate encoder/decoder?




>
>
> --
> You received this message because you are subscribed to the Google Groups 
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to golang-nuts+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/golang-nuts/9b081989-0ffa-40a1-8d6b-f56833e7b749%40googlegroups.com.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAMV2RqpgkWYQkTFy43LwRO-Xz10N9m_Ye4JB1XB9-%2BioMbFugA%40mail.gmail.com.

Reply via email to