RE: Reading data from Iceberg table into Apache Arrow in Java

2021-02-12 Thread Mayur Srivastava
Thank you Ryan. I’ll dig into the file scan plan and Spark codebase to learn about the internals of Iceberg vectorized read path. Then, I’ll try to implement the vectorized reader using core components only. I’ll be happy to work with you to contribute it back to the upstream. I’ll get back to

Re: Sorting requirements for partition keys

2021-02-12 Thread kkishore iiith
Apologies, https://iceberg.apache.org/spark-writes/#writing-to-partitioned-tables has answered my question.

Sorting requirements for partition keys

2021-02-12 Thread kkishore iiith
Hello Community, https://developer.ibm.com/technologies/artificial-intelligence/articles/the-why-and-how-of-partitioning-in-apache-iceberg/ talks about sorting partition data. Is sorting a requirement, or is it only needed for performance? Thanks, Kishor.
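The Iceberg Spark docs linked in the reply above indicate that Spark's default writer expects incoming rows to be clustered by partition, so a sort (or at least a within-task sort) by the partition columns is typically needed before writing. A minimal sketch using the Spark Java API; the table names (`source_events`, `db.events`), the partition columns (`level`, `ts`), and the session setup are assumptions for illustration:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PartitionedWrite {
  public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession.builder()
        .appName("iceberg-partitioned-write")
        .getOrCreate();

    // Hypothetical source table; swap in your own DataFrame.
    Dataset<Row> df = spark.table("source_events");

    // Sort each task's rows by the partition columns so the writer sees
    // each partition's rows contiguously and keeps one open file at a time.
    df.sortWithinPartitions("level", "ts")
      .writeTo("db.events")
      .append();
  }
}
```

A global `ORDER BY` also works but is more expensive; `sortWithinPartitions` is usually sufficient because the writer only needs rows grouped per task, not totally ordered.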

Re: Reading data from Iceberg table into Apache Arrow in Java

2021-02-12 Thread Ryan Blue
Hi Mayur, We built the Arrow support with Spark as the first use case, so the best examples of how to use it are in Spark. The generic reader does two things: it plans a scan and sets up an iterator of file readers to produce generic records. What you want to do is the same thing, but set up the
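The generic read path Ryan describes (plan a scan, then iterate per-file readers that yield generic records) can be sketched with the `IcebergGenerics` API from `iceberg-data`. The catalog type, warehouse path, table identifier, and selected columns below are assumptions for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.data.IcebergGenerics;
import org.apache.iceberg.data.Record;
import org.apache.iceberg.hadoop.HadoopCatalog;
import org.apache.iceberg.io.CloseableIterable;

public class GenericScan {
  public static void main(String[] args) throws Exception {
    // Hypothetical Hadoop catalog; any Iceberg catalog would do.
    HadoopCatalog catalog =
        new HadoopCatalog(new Configuration(), "hdfs://warehouse/path");
    Table table = catalog.loadTable(TableIdentifier.of("db", "events"));

    // IcebergGenerics plans the file scan and wires up an iterator of
    // per-file readers internally, producing one generic Record per row.
    try (CloseableIterable<Record> records = IcebergGenerics.read(table)
                                                            .select("id", "data")
                                                            .build()) {
      for (Record record : records) {
        System.out.println(record);
      }
    }
  }
}
```

A vectorized reader would follow the same shape, but replace the per-row `Record` iterator with readers that emit Arrow batches per file, as done in Iceberg's Spark integration.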