Hi Puneet,
I agree with Ryan: you can use Spark 2.4 to read Iceberg tables with delete
files. To add to this, we have recently been adding vectorized read support in
Spark 3.2, which is about 1.6 to 2 times faster than the non-vectorized read (the
existing solution in Spark 2.4).
1. Position delete support https://g
Thanks, Ryan,
This is super helpful to know. Yes, the discussion about 'plans' in Spark
3.2 made me think it could be for read support.
For Presto read support, could you (or Jack) please point me to the
work-in-progress PRs?
Thanks,
- Puneet
On Thu, Nov 18, 2021 at 8:26 AM Ryan Blue wrote:
Puneet,
Good question. Reading v2 tables with delete files has been supported for
several versions, since before we adopted the v2 additions to the spec. You
should be fine when using Spark, Flink, Hive, etc. with runtime Jars from
the Iceberg project. Trino has yet to add support, but Jack has a
Perhaps a newbie question, but if the requirement is to just read v2 tables
with equality and/or position delete files, does that also require Spark
3.2 or is that supported in Spark 2.4 as well (even if in a sub-optimal
way).
Thanks,
- Puneet
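The question above is about what a reader has to do with v2 delete files. As a rough illustration only, here is a minimal Python sketch of the merge step under a simplified model. It assumes that a position delete is a (file path, row position) pair, that an equality delete is a set of column values, and that every delete applies to the data file. The Iceberg spec's sequence-number scoping and per-file equality field IDs are deliberately omitted. `DataFile`, `merge_on_read`, and the file paths are hypothetical names, not Iceberg APIs.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    # Hypothetical stand-in for an Iceberg data file: a path plus its rows,
    # where a row's position is its index in the list.
    path: str
    rows: list = field(default_factory=list)


def merge_on_read(data_file, position_deletes, equality_deletes, equality_columns):
    """Return the rows of data_file that survive both kinds of deletes.

    Simplified model (assumption): sequence numbers are ignored, so every
    delete is treated as applying to this data file.
    """
    # Position deletes name a file path and a row position in that file;
    # only entries that point at this file matter.
    deleted_positions = {
        pos for path, pos in position_deletes if path == data_file.path
    }
    # Equality deletes remove any row whose equality-column values match.
    deleted_keys = {
        tuple(d[c] for c in equality_columns) for d in equality_deletes
    }
    live_rows = []
    for pos, row in enumerate(data_file.rows):
        if pos in deleted_positions:
            continue
        if tuple(row[c] for c in equality_columns) in deleted_keys:
            continue
        live_rows.append(row)
    return live_rows


# Example: row 0 is removed by a position delete and id=3 by an equality
# delete; the position delete for the other file is ignored.
f = DataFile("data/a.parquet", [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}])
pos_deletes = [("data/a.parquet", 0), ("data/other.parquet", 1)]
eq_deletes = [{"id": 3}]
print(merge_on_read(f, pos_deletes, eq_deletes, ["id"]))  # [{'id': 2, 'v': 'b'}]
```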
On Wed, Nov 17, 2021 at 10:07 AM Ryan Blue wrote:
The plan is to support it in 3.2. I think that we're very close but Anton
is the expert there.
On Tue, Nov 16, 2021 at 6:22 AM Sreeram Garlapati
wrote:
This makes sense, thanks a lot, @Ryan Blue.
Are all the building blocks for MOR support (features such as delta-based plans)
fully available in Spark 3.2, or is there a reason we would need Spark
3.3? Or is more work still needed to fully validate this? I am in
need of this specific data poi
Sreeram,
The project tracking this is here:
https://github.com/apache/iceberg/projects/11
It isn’t easy to get a good picture, since most of the PRs are merged. But
Anton is working on the next set of PRs for Spark. Maybe Anton can find
some time to add a few notes about what's left to be done.
Hello Iceberg devs!
After going through the mail threads (especially "Spark version support
strategy") and the relevant PRs, it looks like *Merge on Read* support (i.e.,
Spark writers writing equality deletes) will be available with
*Iceberg + Spark 3.2*. Is this understanding correct? Or is this