Hi,

I added some print statements to illustrate the flow of parsing the
stream in the example you gave:

$ python test.py
File is at offset: 0
Message length: 140
About to read body, file at offset: 144
Read message body, file at offset: 144
Opening a Message flatbuffer with size 140
File is at offset: 144
Message length: 140
About to read body, file at offset: 288
Read message body, file at offset: 320
Opening a Message flatbuffer with size 140
File is at offset: 320

So it seems the Flatbuffers library recognizes bytes 4 through 144 as a
Message, i.e. the flatbuffer starts right after the 4-byte length prefix.
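
A minimal sketch of why the table can still appear to start at byte 20,
assuming the framing discussed in this thread (a 4-byte little-endian
length prefix followed by the metadata flatbuffer). A flatbuffer begins
with a 4-byte unsigned offset to its root table, measured from the start
of the buffer, so the table itself usually sits some bytes in. The name
`locate_root_table` and the synthetic bytes below are hypothetical; the
root offset of 16 is inferred from the byte-20 observation, not read from
the real file.

```python
import io
import struct


def locate_root_table(stream):
    """Read one length-prefixed metadata block from `stream`.

    Returns (metadata_start, metadata_length, root_table_position),
    or None if the stream is exhausted or the length prefix is 0.
    Assumes a plain int32 length prefix, as described in this thread.
    """
    prefix = stream.read(4)
    if len(prefix) < 4:
        return None
    (length,) = struct.unpack("<i", prefix)
    if length == 0:
        return None

    metadata_start = stream.tell()
    metadata = stream.read(length)
    if len(metadata) < max(length, 4):
        raise ValueError("truncated metadata block")

    # First 4 bytes of a flatbuffer: unsigned offset to the root table,
    # relative to the start of the flatbuffer itself.
    (root_offset,) = struct.unpack_from("<I", metadata, 0)
    return metadata_start, length, metadata_start + root_offset


# Synthetic stand-in for the first message: length 140, root offset 16.
fake = struct.pack("<i", 140) + struct.pack("<I", 16) + bytes(136)
print(locate_root_table(io.BytesIO(fake)))  # (4, 140, 20)
```

Under the same assumption, a root offset of 20 in the second message
(metadata starting at byte 148) would put its table at byte 168.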

I put my branch here:
https://github.com/wesm/arrow/tree/ipc-debug-print-20190318

The test.py is here:
https://gist.github.com/wesm/dd40aa3196cd138e883d94c574d154f9

BTW can you comment on
https://github.com/ExpandingMan/Arrow.jl/issues/28? I would like to
see a Julia implementation inside the Apache Arrow project.

Thanks


Wes

On Mon, Mar 18, 2019 at 7:58 PM Expanding Man
<expanding...@protonmail.com.invalid> wrote:
>
> Hello all, I am working on a pure Julia implementation of the arrow standard. 
>  Currently I am working on ingesting the metadata, and it seems to me that 
> the output I'm creating with `pyarrow` is not matching the format, so I'm 
> trying to figure out where I've misunderstood it.
>
> I've written some arrow data to disk with the code you can find in [this 
> gist](https://gist.github.com/ExpandingMan/4ef3cadab6f3e6d65e672a32b821654f).
>
> Reading the format, I expect each message to start with an `Int32` giving the 
> size of the metadata flatbuffers, followed by the metadata flatbuffers 
> themselves.  The `Int32`'s indeed seem to be there, however the `Message` 
> flatbuffers do not start where I expect.  On the output from above, I find 
> the first flatbuffers containing the `Message` with the `Schema` at byte 20.  
> I am successfully able to construct all flatbuffer objects in Julia from byte 
> 20, but I was expecting to find this flatbuffer at byte 4 immediately 
> following the `Int32`.  What is contained in bytes 4 to 19?
>
> Similarly, I can find the next `Int32` at byte 144 as expected, however I 
> can't find the flatbuffers after that until byte 168.  Again, I can 
> successfully construct the metadata flatbuffers (in this case a `Message` 
> containing a `RecordBatch`) in Julia, but I was expecting to do this from 
> byte 148, not byte 168.  What is contained in bytes 144 to 168?  Note that 
> this is now a 24 byte boundary, whereas for the first `Message` it was only
> 16.
>
> What am I missing here?  I have a suspicion that there is a small flatbuffer 
> of some sort being contained in the mysterious extra bytes, but the format 
> description makes no mention of that.
>
> Thanks!
