Re: row counts in footer of IPC file format

2023-04-01 Thread Micah Kornfield
IIRC, Structs are immutable once defined; if you want to evolve, then Tables are necessary. On Mon, Mar 20, 2023 at 8:22 AM Weston Pace wrote: > +1, I'm generally in favor of the idea. I would prefer > `recordBatchNumRows` (or, less favorably, `recordBatchSize`). I don't > think `recordBatchLe

Re: [External] Re: row counts in footer of IPC file format

2023-03-31 Thread David Dali Susanibar Arce
on the right lines? > > Kind regards, > > Martin Traverse > Technical Architect > UKI Risk > Tel: +44 7305 120 791 > Email: martin.trave...@accenture.com > > My regular office hours are 10:00 - 18:30 UK time, Monday - Thursday

RE: [External] Re: row counts in footer of IPC file format

2023-03-28 Thread Traverse, Martin
From: Weston Pace Sent: 28 March 2023 17:35 To: dev@arrow.apache.org Subject: [External] Re: row counts in footer of IPC file format I suspect the next step will be to create two implementat

Re: row counts in footer of IPC file format

2023-03-28 Thread Weston Pace
I suspect the next step will be to create two implementations and create test files for the integration test suite. These will be required before we can vote on this. Are either of you interested in contributing an implementation (C++, Rust, Java, and Go have been the usual suspects in the past b

RE: row counts in footer of IPC file format

2023-03-28 Thread Martin Traverse
+1 from me, we would definitely like this feature. Any name like recordBatchNumRows, recordBatchRowCounts etc. makes it clear that it refers to rows, not bytes. The RecordBatchStatistics idea would also be fine for us, although we don't have an immediate need for other statistics. O

Re: row counts in footer of IPC file format

2023-03-20 Thread Weston Pace
+1, I'm generally in favor of the idea. I would prefer `recordBatchNumRows` (or, less favorably, `recordBatchSize`). I don't think `recordBatchLengths` works because there are already places in the footer where "length" is interpreted as "number of bytes". I'm not an expert on flatbuffers evolut

row counts in footer of IPC file format

2023-03-18 Thread Steve Kim
Hello everyone, I would like to be able to quickly seek to an arbitrary row in an Arrow file. With the current file format, reading the file footer alone is not enough to determine the record batch that contains a given row index. The row counts of the record batches are only found in the metadat
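
To illustrate the lookup this proposal would enable, here is a minimal sketch in Python. It assumes the footer exposes a list of per-batch row counts (what the thread tentatively calls `recordBatchNumRows`; no such field is confirmed here). The function name `locate_row` and the plain-list input are hypothetical and are not part of any Arrow API.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_row(batch_row_counts, row_index):
    """Map a global row index to (batch_index, offset_within_batch).

    batch_row_counts is a hypothetical list of per-batch row counts,
    standing in for what a footer field such as recordBatchNumRows
    might provide.
    """
    if row_index < 0:
        raise IndexError("row index must be non-negative")
    # Cumulative end offsets: ends[i] is the number of rows in batches 0..i.
    ends = list(accumulate(batch_row_counts))
    if not ends or row_index >= ends[-1]:
        raise IndexError("row index out of range")
    # First batch whose cumulative end is strictly greater than row_index.
    # Empty batches are skipped because their end equals the previous end.
    batch = bisect_right(ends, row_index)
    start = ends[batch] - batch_row_counts[batch]
    return batch, row_index - start
```

With counts available up front, the lookup is a binary search over the cumulative offsets, so only the one record batch that contains the requested row needs to be read.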