Thank you, Ryan.
I'll dig into the file scan plan and the Spark codebase to learn about the
internals of the Iceberg vectorized read path. Then I'll try to implement the
vectorized reader using core components only. I'd be happy to work with you to
contribute it back upstream. I'll get back to
Apologies,
https://iceberg.apache.org/spark-writes/#writing-to-partitioned-tables has
answered my question.
On Fri, Feb 12, 2021 at 2:09 PM kkishore iiith wrote:
Hello Community,
https://developer.ibm.com/technologies/artificial-intelligence/articles/the-why-and-how-of-partitioning-in-apache-iceberg/
talks about sorting partition data. Is that a hard requirement, or is it only
needed for performance?
Thanks,
Kishor.
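For context on the question above: as the Spark-writes documentation linked earlier explains, writes to partitioned tables expect incoming rows to be clustered by the partition expressions, because a write task typically keeps only one data file open at a time. The effect of clustering can be illustrated with a small Python sketch; the toy writer below is illustrative only, not Iceberg code:

```python
from itertools import groupby

def write_partitioned(rows, partition_key):
    """Toy writer: opens a new data file each time the partition
    value of the incoming row stream changes."""
    files_opened = 0
    for _, group in groupby(rows, key=partition_key):
        files_opened += 1   # open a fresh file for this run of rows
        for _row in group:
            pass            # append the row to the open file
    return files_opened

rows = [{"day": d} for d in ["a", "b", "a", "b"]]

# Unsorted input reopens files for partitions it has already seen.
assert write_partitioned(rows, lambda r: r["day"]) == 4

# Sorting by the partition column first yields one file per partition.
clustered = sorted(rows, key=lambda r: r["day"])
assert write_partitioned(clustered, lambda r: r["day"]) == 2
```

So the sort is about how the data is delivered to the writers, not a property the table format imposes on stored data after the fact.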
Hi Mayur,
We built the Arrow support with Spark as the first use case, so the best
examples of how to use it are in Spark.
The generic reader does two things: it plans a scan and sets up an iterator
of file readers to produce generic records. What you want to do is the same
thing, but set up the
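The two-step structure described above — plan a scan, then chain per-file readers into one record iterator — can be sketched generically. The names below are illustrative stand-ins, not Iceberg's actual API:

```python
from itertools import chain

def plan_files(table, row_filter=None):
    """Stand-in for scan planning: one task per matching data file."""
    return [f for f in table["files"] if row_filter is None or row_filter(f)]

def open_reader(task):
    """Stand-in for a per-file reader that yields records."""
    return iter(task["records"])

def read_generics(table, row_filter=None):
    """Plan the scan, then lazily chain the per-file readers."""
    tasks = plan_files(table, row_filter)
    return chain.from_iterable(open_reader(t) for t in tasks)

table = {"files": [{"records": ["r1", "r2"]}, {"records": ["r3"]}]}
assert list(read_generics(table)) == ["r1", "r2", "r3"]
```

A vectorized path would presumably follow the same shape, with readers that produce batches (e.g. Arrow) instead of individual generic records.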