storing per record batch metadata in arrow IPC file

2022-04-05 Thread Yue Ni
Hi there, I am investigating analyzing time series data using Apache Arrow. I would like to store some record-batch-specific metadata, for example, some statistics/tags about the data in a particular record batch. More specifically, I may use a single record batch to store metric samples for a certain

Re: [Question] Is it possible to write to IPC without an intermediary buffer?

2022-04-05 Thread Jorge Cardoso Leitão
Hi Micah, Thank you for your reply. That is also my understanding - not possible in streaming IPC, possible in file IPC with random access. The pseudo-code could be something like: start = writer.seek_current(); empty_locations = create_empty_header(schema) write_header(writer, empty_locations) l
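
The reserve-then-patch idea in the pseudo-code above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the header layout (a block count followed by offset/length pairs) is a made-up toy, not the Arrow IPC file format, and the names `write_with_backpatched_header` and `read_blocks` are hypothetical. It only shows why a seekable writer makes an intermediary buffer unnecessary.

```python
import io
import struct

# Hypothetical toy layout, NOT the Arrow IPC format:
# header = block count (u32) followed by one (offset u64, length u64) pair per block.
HEADER_COUNT = struct.Struct("<I")
HEADER_ENTRY = struct.Struct("<QQ")


def write_with_backpatched_header(writer, blocks):
    """Write blocks directly to a seekable writer, then patch the header in place."""
    start = writer.tell()
    # Reserve header space with zeroed locations; the real values are not known yet.
    writer.write(HEADER_COUNT.pack(len(blocks)))
    writer.write(HEADER_ENTRY.pack(0, 0) * len(blocks))

    locations = []
    for block in blocks:
        offset = writer.tell()
        writer.write(block)
        locations.append((offset, len(block)))

    end = writer.tell()
    # Seek back and overwrite the placeholder entries with the real locations.
    writer.seek(start + HEADER_COUNT.size)
    for offset, length in locations:
        writer.write(HEADER_ENTRY.pack(offset, length))
    writer.seek(end)  # leave the cursor at the end of the data
    return locations


def read_blocks(reader, start=0):
    """Read the header at `start`, then fetch each block by its recorded location."""
    reader.seek(start)
    (count,) = HEADER_COUNT.unpack(reader.read(HEADER_COUNT.size))
    entries = [HEADER_ENTRY.unpack(reader.read(HEADER_ENTRY.size)) for _ in range(count)]
    blocks = []
    for offset, length in entries:
        reader.seek(offset)
        blocks.append(reader.read(length))
    return blocks
```

A non-seekable stream cannot do the `seek` back to `start`, which matches the observation that this only works for the file format with random access.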

Re: storing per record batch metadata in arrow IPC file

2022-04-05 Thread Aldrin
Hm, I didn't think it was possible, but it looks like there may be some things you can try? My understanding was that you create a writer for an IPC stream or file and you pass a schema on construction which is used as "the schema" for the IPC stream/file. So, RecordBatches written using that writ

Re: [JAVA] JDK Support Policy?

2022-04-05 Thread Bryan Cutler
Thanks for bringing this up Micah. Given that we have finite resources for CI, I think the oldest active LTS version sounds pretty reasonable. Ultimately it should be community driven and balance between the available resources we have and people's time to patch any issues that come up. On Tue, Mar

Re: storing per record batch metadata in arrow IPC file

2022-04-05 Thread Yue Ni
Hi Aldrin, Thanks for the pointers. I checked out the C++ source code for this part, and I think record-batch-specific metadata is currently not written into the IPC file, probably due to a bug in the code. I filed a bug to track this issue (https://issues.apache.org/jira/browse/ARROW-16131), thank

Re: storing per record batch metadata in arrow IPC file

2022-04-05 Thread Weston Pace
Correct, the "ground truth" so to speak for these things is probably the flatbuffers files[1] (Message.fbs and Schema.fbs in this case). There is a per-message custom metadata field that could be used as you describe. As far as I can tell, the C++ implementation does not expose this today.

Re: storing per record batch metadata in arrow IPC file

2022-04-05 Thread Weston Pace
Actually, if you are doing streaming processing, you would have to store it with the record batch since there is no footer :)
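
To illustrate the point about streams having no footer, here is a rough sketch in which every message carries its own metadata next to its body. The framing (two length prefixes, JSON metadata) is purely hypothetical and is not how Arrow encodes its flatbuffers messages; `write_message` and `read_messages` are illustrative names only.

```python
import io
import json
import struct

# Hypothetical framing, NOT Arrow's wire format:
# [metadata length u32][body length u32][metadata as JSON][body bytes]
PREFIX = struct.Struct("<II")


def write_message(stream, body, custom_metadata):
    """Append one message whose metadata travels with the body."""
    meta = json.dumps(custom_metadata, sort_keys=True).encode("utf-8")
    stream.write(PREFIX.pack(len(meta), len(body)))
    stream.write(meta)
    stream.write(body)


def read_messages(stream):
    """Yield (metadata, body) pairs in order; no footer or seeking is needed."""
    while True:
        prefix = stream.read(PREFIX.size)
        if not prefix:
            return  # clean end of stream
        if len(prefix) < PREFIX.size:
            raise EOFError("truncated message prefix")
        meta_len, body_len = PREFIX.unpack(prefix)
        metadata = json.loads(stream.read(meta_len).decode("utf-8"))
        body = stream.read(body_len)
        yield metadata, body
```

Because the reader only ever moves forward, per-batch statistics or tags have to sit beside each batch; a file with a footer could instead collect them in one place at the end.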