I’m trying to understand how to parse a Buffer into a Schema, but
using pdb with Python and reading the TS/Python/C++ Arrow source hasn’t
really cleared much up for me. Nor has studying
https://arrow.apache.org/docs/ipc.html
Here are the steps I’ve tried (the code is Julia, but only
because I’m trying to do this natively, rather than wrap the Arrow C code):
# Thrift API method returning a struct (sm_handle, sm_size, df_handle, df_size)
# (works as expected)
julia> tdf = sql_execute_df(conn, "select * from flights_2008_7m limit 1000", 0, 0, 1000)
MapD.TDataFrame(UInt8[0xba, 0x58, 0x1b, 0x3d], 93856, UInt8[0xab, 0xd7, 0x7e, 0x50], 188880)
# Wrap shared memory into a Julia array, based on handle and size (works as expected)
julia> sm_buf = MapD.load_buffer(tdf.sm_handle, tdf.sm_size) # wrapper using shmget/shmat
93856-element Array{UInt8,1}:
0x2c
0x16
0x00
0x00
0x14
0x00
0x00
0x00
0x00
0x00
⋮
0x20
0x74
0x6f
0x20
0x4d
0x66
0x72
0x00
0x00
At this point, walking through a similar Python process, I know that
sm_buf represents:
- type: Schema
- metadata length: 5676
- body_length: 0
Where I’m confused is how to proceed.
I am getting metadata_length by reinterpreting the first 4 bytes as Int32.
julia> mlen = reinterpret(Int32, sm_buf[1:4])[1]
5676
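For comparison, here is a minimal Python sketch of the same step: reading the first four bytes as a little-endian Int32. The function name `read_length_prefix` is made up for illustration, and the byte order is an assumption on my part (it matches what `reinterpret` gives on my little-endian machine).

```python
import struct


def read_length_prefix(buf: bytes) -> int:
    """Return the little-endian int32 stored in the first four bytes of buf.

    Hypothetical helper: it only mirrors the reinterpret(Int32, ...) call above.
    """
    if len(buf) < 4:
        raise ValueError("need at least 4 bytes for the length prefix")
    (length,) = struct.unpack_from("<i", buf, 0)
    return length


# First four bytes of sm_buf as printed above: 0x2c 0x16 0x00 0x00
prefix = bytes([0x2C, 0x16, 0x00, 0x00])
metadata_length = read_length_prefix(prefix)
print(metadata_length)  # 5676, i.e. 0x162c
```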
I then assumed that I could start at byte 5 and take the next `mlen`
bytes (that is, through index `5+mlen-1`):
julia> metadata = sm_buf[5:5+mlen-1]
5676-element Array{UInt8,1}:
0x14
0x00
0x00
0x00
0x00
0x00
0x00
0x00
0x0c
0x00
⋮
0x79
0x65
0x61
0x72
0x00
0x00
0x00
0x00
0x00
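To make the slicing concrete, here is a small Python sketch of the same split, plus one extra step I believe applies if `metadata` really is a FlatBuffer: a FlatBuffer starts with a 4-byte little-endian unsigned offset to its root table. The helper names are hypothetical, and the demo frame is synthetic, built only from the first ten metadata bytes printed above rather than the real 5676-byte payload. As far as I can tell, the layout of that root table would come from the `Message` table in Arrow's `format/Message.fbs`, but that is my reading rather than something I have confirmed.

```python
import struct


def split_metadata(buf: bytes) -> bytes:
    """Return the metadata bytes that follow the 4-byte int32 length prefix."""
    (mlen,) = struct.unpack_from("<i", buf, 0)
    if mlen < 0 or 4 + mlen > len(buf):
        raise ValueError("length prefix does not fit inside the buffer")
    return buf[4:4 + mlen]


def flatbuffer_root_offset(metadata: bytes) -> int:
    """Return the uint32 at the start of a FlatBuffer: the offset to its root table."""
    (root,) = struct.unpack_from("<I", metadata, 0)
    return root


# Synthetic frame: only the first ten metadata bytes shown above, not the full payload.
head = bytes([0x14, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0C, 0x00])
frame = struct.pack("<i", len(head)) + head + b"unread trailing bytes"

metadata = split_metadata(frame)
root = flatbuffer_root_offset(metadata)
print(len(metadata), root)  # 10 20
```

If that reading is right, the leading `0x14` means the root table sits 20 bytes into `metadata`, and the fields would be located from there through the table's vtable.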
Am I on the right track here? I *think* that my `metadata` variable above
is a FlatBuffer, but how do I know what its structure is? Additionally,
what am I supposed to do with all of the bytes that haven’t been read from
`sm_buf` yet? `sm_buf` is 93856 bytes and I’ve only read the first 4 bytes
+ metadata length, leaving some 88,000 bytes not processed yet.
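The arithmetic behind that estimate, as a short Python sketch (the function name `unread_span` is hypothetical). It only locates the unread region; it says nothing about what those bytes contain, which is the open question. Since the Python walkthrough reports body_length as 0, my assumption is that they are not the body of this Schema message.

```python
def unread_span(total_size: int, metadata_length: int, prefix_size: int = 4):
    """Return (zero-based start offset, byte count) of the region after the metadata."""
    start = prefix_size + metadata_length
    if start > total_size:
        raise ValueError("prefix plus metadata exceeds the buffer size")
    return start, total_size - start


# Sizes taken from the session above: sm_buf is 93856 bytes, metadata is 5676 bytes.
start, remaining = unread_span(93856, 5676)
print(start, remaining)  # 5680 88176
```

In Julia's 1-based indexing, the unread region therefore begins at `sm_buf[5681]`.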
Any help would be greatly appreciated here. Please note that I’m not asking
for Julia coding help, but rather what the Arrow bytes actually mean, how
they are structured, and how to process them further.
Thanks,
Randy Zwitch