Hi, I added some print statements to illustrate the flow of parsing the stream in the example you gave:

$ python test.py
File is at offset: 0
Message length: 140
About to read body, file at offset: 144
Read message body, file at offset: 144
Opening a Message flatbuffer with size 140
File is at offset: 144
Message length: 140
About to read body, file at offset: 288
Read message body, file at offset: 320
Opening a Message flatbuffer with size 140
File is at offset: 320

So it seems the Flatbuffers library recognizes bytes 4 through 144 as a Message

I put my branch here: https://github.com/wesm/arrow/tree/ipc-debug-print-20190318

The test.py is here https://gist.github.com/wesm/dd40aa3196cd138e883d94c574d154f9

BTW can you comment on https://github.com/ExpandingMan/Arrow.jl/issues/28? I
would like to see a Julia implementation inside the Apache Arrow project.

Thanks
Wes

On Mon, Mar 18, 2019 at 7:58 PM Expanding Man <expanding...@protonmail.com.invalid> wrote:
>
> Hello all, I am working on a pure Julia implementation of the arrow standard.
> Currently I am working on ingesting the metadata, and it seems to me that
> the output I'm creating with `pyarrow` is not matching the format, so I'm
> trying to figure out where I've misunderstood it.
>
> I've written some arrow data to disk with the code you can find in [this
> gist](https://gist.github.com/ExpandingMan/4ef3cadab6f3e6d65e672a32b821654f).
>
> Reading the format, I expect each message to start with an `Int32` giving the
> size of the metadata flatbuffers, followed by the metadata flatbuffers
> themselves. The `Int32`'s indeed seem to be there, however the `Message`
> flatbuffers do not start where I expect. On the output from above, I find
> the first flatbuffers containing the `Message` with the `Schema` at byte 20.
> I am successfully able to construct all flatbuffer objects in Julia from byte
> 20, but I was expecting to find this flatbuffer at byte 4 immediately
> following the `Int32`. What is contained in bytes 4 to 19?
>
> Similarly, I can find the next `Int32` at byte 144 as expected, however I
> can't find the flatbuffers after that until byte 168. Again, I can
> successfully construct the metadata flatbuffers (in this case a `Message`
> containing a `RecordBatch`) in Julia, but I was expecting to do this from
> byte 148, not byte 168. What is contained in bytes 144 to 168? Note that
> this is now a 24 byte boundary, whereas for the first `Message` it was only
> 16.
>
> What am I missing here? I have a suspicion that there is a small flatbuffer
> of some sort being contained in the mysterious extra bytes, but the format
> description makes no mention of that.
>
> Thanks!
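
For anyone following along, here is a minimal Python sketch of one possible reading of these offsets. A FlatBuffer begins with a 4-byte unsigned offset to its root table, so the root table need not sit at the first byte of the buffer. Assuming that accounts for the gap (nothing in this thread confirms it), a root offset of 16 would put the root table at byte 20 even though the flatbuffer itself starts at byte 4. The `read_frame` helper and the synthetic frame are hypothetical, made up for illustration. No real Arrow data or pyarrow calls are involved.

```python
import struct


def read_frame(buf, offset=0):
    """Inspect one length-prefixed metadata frame starting at `offset`.

    Hypothetical helper for illustration only.
    Returns (metadata_start, metadata_length, root_table_pos).
    """
    # 4-byte little-endian signed length prefix (the Int32 from the question).
    (length,) = struct.unpack_from("<i", buf, offset)
    metadata_start = offset + 4
    # A FlatBuffer starts with a 4-byte unsigned offset to its root table,
    # measured from the first byte of the FlatBuffer itself.
    (root_offset,) = struct.unpack_from("<I", buf, metadata_start)
    return metadata_start, length, metadata_start + root_offset


# Synthetic 144-byte frame, NOT real Arrow output: length prefix of 140,
# an assumed root offset of 16, then zero padding standing in for the rest.
frame = struct.pack("<i", 140) + struct.pack("<I", 16) + bytes(136)

start, length, root = read_frame(frame)
print(start, length, root)  # 4 140 20
```

Under the same assumption, a root offset of 20 in the second message would place its root table at byte 168, with the flatbuffer starting at byte 148.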