Re: Spark Merge On Read Support

2021-11-18 Thread Yufei Gu
Hi Puneet, I agree with Ryan: you can use Spark 2.4 to read Iceberg tables with delete files. To add to this, we are currently adding vectorized read support in Spark 3.2, which is 1.6 to 2 times faster than the non-vectorized read (the existing solution in Spark 2.4). 1. Position delete support https://g

Re: Spark Merge On Read Support

2021-11-18 Thread Puneet Zaroo
Thanks Ryan, this is super helpful to know. Yes, the discussion about 'plans' in Spark 3.2 made me think it could be for read support. For the Presto read support, could you (or Jack) please point to the PRs that are work in progress? Thanks, - Puneet On Thu, Nov 18, 2021 at 8:26 AM Ryan Blue wro

Re: Spark Merge On Read Support

2021-11-18 Thread Ryan Blue
Puneet, Good question. Reading v2 tables with delete files has been supported for several versions, since before we adopted the v2 additions to the spec. You should be fine when using Spark, Flink, Hive, etc. with runtime Jars from the Iceberg project. Trino has yet to add support, but Jack has a

Re: Spark Merge On Read Support

2021-11-17 Thread Puneet Zaroo
Perhaps a newbie question, but if the requirement is just to read v2 tables with equality and/or position delete files, does that also require Spark 3.2, or is that supported in Spark 2.4 as well (even if in a sub-optimal way)? Thanks, - Puneet On Wed, Nov 17, 2021 at 10:07 AM Ryan Blue wrote:

Re: Spark Merge On Read Support

2021-11-17 Thread Ryan Blue
The plan is to support it in 3.2. I think that we're very close but Anton is the expert there. On Tue, Nov 16, 2021 at 6:22 AM Sreeram Garlapati wrote: > This makes sense, thanks a lot @Ryan Blue . > > Are all building blocks for MOR support (features like - delta-based > plans) fully available

Re: Spark Merge On Read Support

2021-11-16 Thread Sreeram Garlapati
This makes sense, thanks a lot @Ryan Blue. Are all building blocks for MOR support (features like delta-based plans) fully available in Spark 3.2, or is there any reason we would need Spark 3.3? Or is there more ongoing work needed to fully validate this? I am in need of this specific data poi

Re: Spark Merge On Read Support

2021-11-15 Thread Ryan Blue
Sreeram, The project tracking this is here: https://github.com/apache/iceberg/projects/11 It isn’t easy to get a good picture, since most of the PRs are merged. But Anton is working on the next set of PRs for Spark. Maybe Anton can find some time to add a few notes about what's left to be done.

Spark Merge On Read Support

2021-11-11 Thread Sreeram Garlapati
Hello Iceberg devs! After going through the mail threads (especially "Spark version support strategy") and relevant PRs, it looks like *Merge on Read* support (i.e., Spark writers writing equality deletes) will be available with *Iceberg + Spark 3.2*. Is this understanding correct? Or is this