Re: storing per record batch metadata in arrow IPC file

2022-04-06 Thread Yue Ni
Hi Weston,
> The C++ implementation does not expose this today that I can tell. So if you want to use this then some C++ changes will be needed. There is already a JIRA ticket for this at [2].
Thanks for pointing this out. It seems the ticket ARROW-16131 I logged above duplicates ARROW-6940

Re: storing per record batch metadata in arrow IPC file

2022-04-05 Thread Weston Pace
Actually, if you are doing streaming processing, you would have to store it with the record batch since there is no footer :)
On Tue, Apr 5, 2022 at 8:40 PM Weston Pace wrote:
> Correct, the "ground truth" so to speak for these things is probably the flatbuffers files[1] (Message.fbs, Schema.

Re: storing per record batch metadata in arrow IPC file

2022-04-05 Thread Weston Pace
Correct, the "ground truth" so to speak for these things is probably the flatbuffers files[1] (Message.fbs, Schema.fbs, and File.fbs in this case). There is a per-message custom metadata field that could be used as you describe. The C++ implementation does not expose this today that I can tell.

Re: storing per record batch metadata in arrow IPC file

2022-04-05 Thread Yue Ni
Hi Aldrin, Thanks for the pointers. I checked the C++ source code for this part, and I think record-batch-specific metadata is currently not written into the IPC file, probably due to a bug in the code. I logged a bug to track this issue (https://issues.apache.org/jira/browse/ARROW-16131), thank

Re: storing per record batch metadata in arrow IPC file

2022-04-05 Thread Aldrin
Hm, I didn't think it was possible, but it looks like there may be some things you can try? My understanding was that you create a writer for an IPC stream or file and you pass a schema on construction which is used as "the schema" for the IPC stream/file. So, RecordBatches written using that writ