Hi Randy,

In Julia I think this is complicated by the lack of a Flatbuffers compiler for the language. In the case of Feather files, Feather.jl implements the Flatbuffers schema by hand in Julia code:
https://github.com/JuliaData/Feather.jl/blob/master/src/metadata.jl#L3

So you need to do one of:

a) Make a Julia compiler for Flatbuffers files
b) Write a native implementation of the Arrow schemas by hand
c) Wrap a C or C++ version of the compiled Flatbuffers schema

Here is some C++ code where we read a generic Message:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/message.cc#L139

Here's where we read the message protocol from a generic InputStream (and then call Message::ReadFrom):

https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/message.cc#L236

In the case of a Schema, the body length will be 0.

- Wes

On Thu, Jul 12, 2018 at 3:58 PM, Paul Taylor <ptaylor.apa...@gmail.com> wrote:
> Hi Randy,
>
> The first four bytes are the int32 length of the flatbuffers Message
> metadata
> <https://github.com/apache/arrow/blob/e14705745bb8d625b3c7dda2857e93cdfe848178/format/Message.fbs#L93>
> plus 4 bytes of padding between the length and the Message metadata itself.
> The Message metadata starts on the 8th byte.
>
> So to read an entire Message, read and store the first four bytes (the
> metadata length). Then advance past the 4 padding bytes, and use the
> flatbuffers API to read the Message table.
>
> The Message table has a bodyLength field, which is the byte length of all
> the buffers (data, validity, offsets, and typeIds) for all the Arrays in
> the Message (since Schema messages don't contain any data, their
> bodyLength is always 0).
>
> Once you've read the Message table via flatbuffers, advance `metadata
> length` number of bytes to position yourself to read the Array buffers.
>
> After reading the buffers, advance another `bodyLength` number of bytes to
> read the next message. Repeat this process to read all Messages from an
> Arrow stream.
>
> If you're familiar with JavaScript/TypeScript, you can reference the
> implementation here
> <https://github.com/apache/arrow/blob/e14705745bb8d625b3c7dda2857e93cdfe848178/js/src/ipc/reader/binary.ts#L145>.
>
> Hope this clears things up,
>
> Paul
>
> On 07/12/2018 11:30 AM, Randy Zwitch wrote:
>>
>> I’m trying to understand how to parse a Buffer into a Schema, but using
>> pdb with Python and reading the TS/Python/C++ Arrow source hasn’t
>> really cleared much up for me. Nor has studying
>> https://arrow.apache.org/docs/ipc.html
>>
>> Here are the steps of what I’ve tried (the code is Julia, but only
>> because I’m trying to do this natively, rather than wrap the Arrow C
>> code):
>>
>> # Thrift API method returning a struct (sm_buf, sm_size, df_buf, df_size) (works as expected)
>> julia> tdf = sql_execute_df(conn, "select * from flights_2008_7m limit 1000", 0, 0, 1000)
>>
>> MapD.TDataFrame(UInt8[0xba, 0x58, 0x1b, 0x3d], 93856, UInt8[0xab, 0xd7, 0x7e, 0x50], 188880)
>>
>> # Wrap shared memory into julia array, based on handle and size (works as expected)
>> julia> sm_buf = MapD.load_buffer(tdf.sm_handle, tdf.sm_size) # wrapper using shmget/shmat
>> 93856-element Array{UInt8,1}:
>> 0x2c
>> 0x16
>> 0x00
>> 0x00
>> 0x14
>> 0x00
>> 0x00
>> 0x00
>> 0x00
>> 0x00
>> ⋮
>> 0x20
>> 0x74
>> 0x6f
>> 0x20
>> 0x4d
>> 0x66
>> 0x72
>> 0x00
>> 0x00
>>
>> At this point, walking through a similar Python process, I know that
>> sm_buf represents
>> - type: Schema
>> - metadata length: 5676
>> - body_length: 0
>>
>> Where I’m confused is how to proceed.
>>
>> I am getting metadata_length by reinterpreting the first 4 bytes as Int32.
>>
>> julia> mlen = reinterpret(Int32, sm_buf[1:4])[1]
>> 5676
>>
>> I then assumed that I could start at byte 5 and take the next `mlen`
>> bytes:
>>
>> julia> metadata = sm_buf[5:5+mlen-1]
>> 5676-element Array{UInt8,1}:
>> 0x14
>> 0x00
>> 0x00
>> 0x00
>> 0x00
>> 0x00
>> 0x00
>> 0x00
>> 0x0c
>> 0x00
>> ⋮
>> 0x79
>> 0x65
>> 0x61
>> 0x72
>> 0x00
>> 0x00
>> 0x00
>> 0x00
>> 0x00
>>
>> Am I on the right track here? I *think* that my `metadata` variable above
>> is a FlatBuffer, but how do I know what its structure is? Additionally,
>> what am I supposed to do with all of the bytes that haven’t been read from
>> `sm_buf` yet? `sm_buf` is 93856 bytes and I’ve only read the first 4 bytes
>> + metadata length, leaving some 88,000 bytes not processed yet.
>>
>> Any help would be greatly appreciated here. Please note that I’m not
>> asking for julia coding help, but rather what the Arrow bytes actually
>> mean/their structure and how to process them further.
>>
>> Thanks,
>> Randy Zwitch
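
The read loop discussed in this thread (length prefix, then metadata, then `bodyLength` bytes of buffers, repeat) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Arrow implementation. It is written in Python rather than Julia because the framing is language-independent. The names `read_metadata_length`, `iter_messages` and `body_length_of` are hypothetical. The metadata slice is assumed to start directly after the 4-byte prefix, as in the `sm_buf[5:5+mlen-1]` slice above; adjust `PREFIX_LEN` if your layout differs. Decoding the Message flatbuffer itself is left to a caller-supplied hook, since that needs one of the options (a), (b) or (c).

```python
import struct
from typing import Callable, Iterator, Tuple

# Assumption: the metadata begins right after a 4-byte length prefix.
PREFIX_LEN = 4


def read_metadata_length(buf: bytes, pos: int = 0) -> int:
    """Decode the little-endian int32 length prefix found at `pos`."""
    (length,) = struct.unpack_from("<i", buf, pos)
    return length


def iter_messages(
    buf: bytes,
    body_length_of: Callable[[bytes], int],
) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (metadata, body) byte slices for each message in `buf`.

    `body_length_of` is a hypothetical hook: it must decode the Message
    flatbuffer held in `metadata` and return its bodyLength field.
    """
    pos = 0
    while pos + PREFIX_LEN <= len(buf):
        mlen = read_metadata_length(buf, pos)
        if mlen <= 0:
            # Assumption: a zero length prefix marks the end of the stream.
            break
        pos += PREFIX_LEN
        metadata = buf[pos:pos + mlen]   # the flatbuffer-encoded Message
        pos += mlen
        body_len = body_length_of(metadata)
        body = buf[pos:pos + body_len]   # empty for a Schema message
        pos += body_len
        yield metadata, body
```

With the first four bytes of `sm_buf` shown above, `read_metadata_length(bytes([0x2c, 0x16, 0x00, 0x00]))` gives 5676, matching the value of `mlen`. For a Schema message the hook would return 0, so the next length prefix (if any) follows the metadata immediately.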