> This works, but the downside is that each {...} of bytes has to be pulled 
> into memory.  And the function that is called is already designed to 
> receive an io.Reader and parse the VERY large inner blob in an efficient 
> manner.

Is the inner blob decoder actually using a json.Decoder, as shown in your 
example func secondDecoder()?  In that case, the simplest and most 
efficient answer is to create a persistent json.Decoder which wraps the 
underlying io.Reader directly, and just keep calling w2.Decode(&v) on each 
call.  It will happily consume the stream, one object at a time.
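A minimal sketch of that pattern (the sample stream and variable names here are invented, not from your code):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

func main() {
	// Two concatenated top-level JSON objects, as in your input.
	stream := `{"a": 1}
{"a": 2}`

	// One persistent Decoder wraps the underlying reader; each
	// Decode call consumes exactly one top-level object.
	dec := json.NewDecoder(strings.NewReader(stream))
	for dec.More() {
		var v map[string]int
		if err := dec.Decode(&v); err != nil {
			panic(err)
		}
		fmt.Println(v["a"])
	}
}
```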

If that's not possible for some reason, then it sounds like you want to 
break the outer stream at outer object boundaries, i.e. { ... }, without 
fully parsing it.  You can do that with json.RawMessage:
https://play.golang.org/p/BitE6l27160
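For example, a sketch along those lines (not the exact playground contents; the stream here is made up):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

func main() {
	stream := `{"x": 1} {"x": 2} {"x": 3}`

	dec := json.NewDecoder(strings.NewReader(stream))
	for dec.More() {
		// RawMessage captures each object's raw bytes without
		// fully unmarshalling its contents.
		var raw json.RawMessage
		if err := dec.Decode(&raw); err != nil {
			panic(err)
		}
		fmt.Println(string(raw))
	}
}
```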

However, you've still read each object as a stream of bytes into memory, 
and you've still done some of the work of parsing the JSON to find the 
start and end of each object.  You can turn it back into an io.Reader by 
creating a bytes.NewBuffer around it, if that's what the inner parser 
requires.  However, if each object is large, and you really need to avoid 
reading it into memory at all, then you'd need some sort of rewindable 
stream.
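As a sketch, assuming the inner parser takes an io.Reader (consumeInner below is a hypothetical stand-in for your real reader-based parser):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// consumeInner stands in for your existing reader-based parser.
func consumeInner(r io.Reader) {
	var v map[string]int
	if err := json.NewDecoder(r).Decode(&v); err != nil {
		panic(err)
	}
	fmt.Println(v["x"])
}

func main() {
	stream := `{"x": 10} {"x": 20}`

	dec := json.NewDecoder(strings.NewReader(stream))
	for dec.More() {
		var raw json.RawMessage
		if err := dec.Decode(&raw); err != nil {
			panic(err)
		}
		// The captured bytes become an io.Reader again.
		consumeInner(bytes.NewBuffer(raw))
	}
}
```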

Another approach is to stop the source generating pretty-printed JSON, and 
make it generate JSON-Lines <https://jsonlines.org/> format instead.  It 
sounds like you're unable to change the source, but you might be able to 
un-prettyprint the JSON using an external tool (perhaps jq can do this).  
Then you could write a custom io.Reader which returns data up to a 
newline, then returns EOF, and hands you a fresh io.Reader for the next 
line.
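A rough sketch of such a reader (the lineReader type and its usage are invented for illustration, not a standard API):

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// lineReader exposes one newline-terminated line of the underlying
// bufio.Reader as an io.Reader, returning io.EOF at the newline
// without buffering the whole line in memory.
type lineReader struct {
	br   *bufio.Reader
	done bool
}

func (lr *lineReader) Read(p []byte) (int, error) {
	if lr.done || len(p) == 0 {
		return 0, io.EOF
	}
	for i := range p {
		b, err := lr.br.ReadByte()
		if err != nil {
			lr.done = true
			return i, err
		}
		if b == '\n' {
			lr.done = true
			return i, io.EOF
		}
		p[i] = b
	}
	return len(p), nil
}

func main() {
	src := bufio.NewReader(strings.NewReader("{\"n\": 1}\n{\"n\": 2}\n"))
	for {
		// Peek to see whether another line remains.
		if _, err := src.Peek(1); err == io.EOF {
			break
		}
		lr := &lineReader{br: src}
		var v map[string]int
		if err := json.NewDecoder(lr).Decode(&v); err != nil {
			panic(err)
		}
		fmt.Println(v["n"])
		// Drain any unread bytes on this line before moving on.
		io.Copy(io.Discard, lr)
	}
}
```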

But this is all very complicated, when keeping the inner Decoder around 
from object to object is a simple solution to the problem that you 
described.  Is there some other constraint which prevents you from doing 
this?

On Saturday, 27 March 2021 at 19:42:40 UTC greg.sa...@gmail.com wrote:

> Good afternoon,
>
> I have a file containing a sequence of hashes (it could be arrays too, as 
> the underlying object type seems irrelevant) as per RFC-7464.  I cannot 
> figure out how to handle this in a memory-efficient way that doesn't 
> involve pulling each blob into memory.
>
> I've tried to express this on Go playground here: 
> https://play.golang.org/p/Aqx0gnc39rn
> Note that I'm using exponent-io/jsonpath as the JSON decoder, but 
> certainly that could be swapped for something else.
>
> In essence here is an example of the input bytes:
>
> {
>    "elements" : [
>       {
>          "Space" : "YCbCr",
>          "Point" : {
>             "Cb" : 0,
>             "Y" : 255,
>             "Cr" : -10
>          }
>       },
>       {
>          "Point" : {
>             "B" : 255,
>             "R" : 98,
>             "G" : 218
>          },
>          "Space" : "RGB"
>       }
>    ]
> }
> {
>    "elements" : [
>       {
>          "Space" : "YCbCr",
>          "Point" : {
>             "Cb" : 3000,
>             "Y" : 355,
>             "Cr" : -310
>          }
>       },
>       {
>          "Space" : "RGB",
>          "Point" : {
>             "B" : 355,
>             "G" : 318,
>             "R" : 108
>          }
>       }
>    ]
> }
> {
>    "elements" : [
>       {
>          "Space" : "YCbCr",
>          "Point" : {
>             "Cr" : -410,
>             "Cb" : 400,
>             "Y" : 455
>          }
>       },
>       {
>          "Space" : "RGB",
>          "Point" : {
>             "B" : 455,
>             "R" : 118,
>             "G" : 418
>          }
>       }
>    ]
> }
>
> I can iterate through that with this code:
>
> w := json.NewDecoder(bytes.NewReader(j))
> for w.More() {
>     var v interface{}
>     w.Decode(&v)
>     fmt.Printf("%+v\n", v)
> }
>
> This works, but the downside is that each {...} of bytes has to be pulled 
> into memory.  And the function that is called is already designed to 
> receive an io.Reader and parse the VERY large inner blob in an efficient 
> manner.
>
> So in principle, this is kind of what I want to do, but maybe I'm looking 
> at it all wrong:
>
>
> w := json.NewDecoder(bytes.NewReader(j))
> for w.More() {
>     reader2 := ???? // Some io.Reader that represents each of the 3 json-seq blocks
>     secondDecoder(reader2)
> }
>
> func secondDecoder(reader io.Reader) {
>     w2 := json.NewDecoder(reader)
>     var v interface{}
>     w2.Decode(&v)
>     fmt.Printf("%+v\n", v)
> }
>
> Any ideas on how to solve this problem?
>
> I should note that it is not possible for the input to change in this case 
> as the system that consumes it is not the same one that has been generating 
> it for the past 5 years.
>
> Thanks!
>
> - Greg
>
>
