
Re: Bucketed Joins on Iceberg

2021-01-28 Thread Romin Parekh

Thanks Ryan. I was unable to attend the community sync, but a few of my colleagues did. We are discussing next steps internally and are also open to contributing.

Thanks,
Romin

On Thu, Jan 28, 2021 at 2:20 PM Ryan Blue wrote:
> Hi Romin,
>
> Spark has poor support for bucketed joins and we have

Re: Bucketed Joins on Iceberg

2021-01-28 Thread Ryan Blue

Hi Romin,

Spark has poor support for bucketed joins, and we have a design doc to hopefully improve that. We talked about this yesterday at the community sync. One of the parts that we also need to get into Spark

Bucketed Joins on Iceberg

2021-01-26 Thread Romin Parekh

Hi Iceberg Devs,

I am evaluating the performance of bucketed joins across two bucketed datasets to find an optimal bucketing strategy. I was able to ingest data into a bucketed table [1], and using the TableScan API, I can see that only a subset (total files / bucket size) of the files is scanned [2]. I also
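To make the two effects in the question concrete, here is a minimal, hypothetical Python sketch of what bucketing buys: an equality filter on the bucket column only needs the file(s) of one bucket, and two datasets bucketed the same way can be joined bucket-by-bucket without moving rows between buckets. It assumes one file per bucket and the same bucket count on both sides, and it uses a simple modulo stand-in hash rather than Iceberg's Murmur3-based bucket transform. The names `bucket_of`, `write_bucketed`, `plan_files`, and `bucketed_join` are illustrative only, not Iceberg or Spark APIs.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, must match on both join sides


def bucket_of(key: int, num_buckets: int = NUM_BUCKETS) -> int:
    # Stand-in hash for illustration only; Iceberg's bucket transform uses a
    # 32-bit Murmur3 hash, which this does NOT reproduce.
    return (key & 0x7FFFFFFF) % num_buckets


def write_bucketed(rows, num_buckets: int = NUM_BUCKETS):
    """Group (key, value) rows into one 'file' (a list) per bucket."""
    files = defaultdict(list)
    for key, value in rows:
        files[bucket_of(key, num_buckets)].append((key, value))
    return dict(files)


def plan_files(files, key, num_buckets: int = NUM_BUCKETS):
    """Equality filter on the bucket column: only the matching bucket is planned."""
    b = bucket_of(key, num_buckets)
    return [b] if b in files else []


def bucketed_join(left_files, right_files):
    """Inner equi-join that only pairs bucket i with bucket i.

    Equal keys always hash to the same bucket, so no cross-bucket
    comparison (and hence no shuffle) is needed.
    """
    out = []
    for b in sorted(left_files.keys() & right_files.keys()):
        # Build a hash index on the right side of this bucket only.
        index = defaultdict(list)
        for key, rv in right_files[b]:
            index[key].append(rv)
        # Probe with the left side of the same bucket.
        for key, lv in left_files[b]:
            for rv in index.get(key, []):
                out.append((key, lv, rv))
    return out


if __name__ == "__main__":
    left = write_bucketed([(k, f"l{k}") for k in range(10)])
    right = write_bucketed([(k, f"r{k}") for k in range(0, 10, 2)])
    print(plan_files(left, 7))          # only bucket 3 is scanned
    print(bucketed_join(left, right))
```

With 4 buckets and 10 keys, a lookup for key 7 plans a single bucket out of 4, which mirrors the "subset of files" observation above. In a real engine the per-bucket loop is what lets each task process one bucket pair independently; whether Spark actually avoids the shuffle for Iceberg tables is the open question in this thread.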