Re: Feather v2 random access

2020-06-24 Yue Ni
Hi François, Thanks so much for the very detailed explanation; it makes sense to me. I will check out the links for more information. @Wes, ARROW-8250 is very useful to me as well, and I will keep an eye on it. Thanks.

2020-06-24 Wes McKinney
See also this JIRA issue about adding random-access read APIs for IPC files (and thus Feather): https://issues.apache.org/jira/browse/ARROW-8250. I hope to see it implemented someday.

2020-06-24 Francois Saint-Jacques
I forgot to mention that you can see how this is glued together in `feather::reader::Read` [1]. It makes clear that nothing is cached and everything is loaded into memory. François [1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/feather.cc#L715-L723
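
As a rough illustration of what an eager read means for random access, here is a toy Python model. It is not Arrow code: `ToyBatchFile`, `read_all`, and `read_one` are hypothetical names, and "bytes read" stands in for I/O cost. A whole-table read pays for every batch, and without a cache it pays again on every call; a random-access API (the subject of ARROW-8250) could, in principle, touch only the batch requested.

```python
class ToyBatchFile:
    """Hypothetical stand-in for an IPC file: a list of encoded record batches.

    It counts how many bytes have been "read" so the cost of each access
    pattern is visible. Nothing here is Arrow API.
    """

    def __init__(self, batches):
        self._batches = [bytes(b) for b in batches]
        self.bytes_read = 0

    @property
    def num_batches(self):
        return len(self._batches)

    def read_batch(self, i):
        data = self._batches[i]
        # No cache: every access adds the batch size to the running total.
        self.bytes_read += len(data)
        return data


def read_all(f):
    """Eager read: load every batch into memory, as a whole-table read does."""
    return [f.read_batch(i) for i in range(f.num_batches)]


def read_one(f, i):
    """Random access: touch only batch i."""
    return f.read_batch(i)
```

For three batches of 4, 2, and 6 bytes, `read_all` reports 12 bytes read, while `read_one(f, 1)` reports 2; calling it again adds another 2, since nothing is cached.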

2020-06-24 Francois Saint-Jacques
Hello Yue, FeatherV2 is just a facade over the Arrow IPC file format. You can find the implementation here [1]. I will try to answer your questions with inline comments. At a high level, the file format writes a schema and then multiple "chunks" called RecordBatches. Your lowest level of granularity…
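
To make the "schema, then RecordBatches" layout concrete, here is a minimal Python sketch of a file that stores a schema, a sequence of batches, and a trailing footer indexing each batch by (offset, length). It is a simplified stand-in, not the real encoding: the `TOY1` magic, the JSON encoding, and the helpers `write_file`, `read_schema`, `read_footer`, and `read_batch` are all hypothetical. The real Arrow IPC file uses Flatbuffers metadata, 8-byte alignment, and `ARROW1` magic bytes, but it does end with a footer listing the location of every record batch, which is what makes seeking to a single batch possible.

```python
import io
import json
import struct

MAGIC = b"TOY1"  # hypothetical; the real Arrow IPC file uses b"ARROW1"


def write_file(buf, schema, batches):
    """Toy layout: magic, schema, batches, footer, footer length, magic."""
    buf.write(MAGIC)
    schema_bytes = json.dumps(schema).encode()
    buf.write(struct.pack("<i", len(schema_bytes)))
    buf.write(schema_bytes)

    blocks = []  # one (offset, length) entry per batch, like the footer's block list
    for batch in batches:
        body = json.dumps(batch).encode()
        blocks.append((buf.tell(), len(body)))
        buf.write(body)

    footer = json.dumps(blocks).encode()
    buf.write(footer)
    buf.write(struct.pack("<i", len(footer)))
    buf.write(MAGIC)


def read_schema(buf):
    """The schema sits right after the leading magic bytes."""
    buf.seek(len(MAGIC))
    (n,) = struct.unpack("<i", buf.read(4))
    return json.loads(buf.read(n))


def read_footer(buf):
    """Locate the footer from the end of the file without scanning the batches."""
    end = buf.seek(0, io.SEEK_END)
    buf.seek(end - len(MAGIC) - 4)
    (footer_len,) = struct.unpack("<i", buf.read(4))
    buf.seek(end - len(MAGIC) - 4 - footer_len)
    return [tuple(block) for block in json.loads(buf.read(footer_len))]


def read_batch(buf, i):
    """Random access: seek straight to batch i and decode only its bytes."""
    offset, length = read_footer(buf)[i]
    buf.seek(offset)
    return json.loads(buf.read(length))
```

For example, after `write_file(buf, {"x": "int64"}, [{"x": [1, 2]}, {"x": [3]}])` on a fresh `io.BytesIO()`, `read_batch(buf, 1)` returns `{"x": [3]}` without decoding the first batch. Putting the index at the end lets a writer append batches without knowing in advance how many there will be.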