Hi Dave,

This depends on Spark. Iceberg supports Spark's predicate push-down, where Spark forwards its filters into the Iceberg scan. But which filters get pushed down is up to Spark.
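As a toy sketch of what scan planning does with a pushed-down predicate (this is not Iceberg's real code; `DataFile`, `plan_files`, and the field names are all made up for illustration), a file is kept only if its partition value matches and its column stats could contain matching rows:

```python
# Toy model of scan planning with pushed-down predicates.
# NOT Iceberg's actual planner -- all names here are hypothetical.
from dataclasses import dataclass

@dataclass
class DataFile:
    path: str
    partition_x: int  # identity-partition value for column x
    min_y: int        # per-file column stats for another column y
    max_y: int

def plan_files(files, x_value, y_value):
    """Keep only files that could contain rows with x == x_value and y == y_value."""
    selected = []
    for f in files:
        if f.partition_x != x_value:             # partition pruning
            continue
        if not (f.min_y <= y_value <= f.max_y):  # stats-based file pruning
            continue
        selected.append(f)
    return selected

files = [
    DataFile("f1.parquet", partition_x=1, min_y=0, max_y=10),
    DataFile("f2.parquet", partition_x=1, min_y=50, max_y=90),
    DataFile("f3.parquet", partition_x=2, min_y=0, max_y=100),
]

# With x = 1 and y = 7 pushed down, f3 is pruned by its partition value
# and f2 by its y stats, so only f1 is scanned.
print([f.path for f in plan_files(files, 1, 7)])  # ['f1.parquet']
```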
Spark can make some inferences with join predicates. For example, if you have a filter after a join, it can push the predicate through the join and down to the source. But I don't think it is able to make all of the inferences you or I would. For example, if you have an inner join on t1.id = t2.id and a filter below the join on t1, it won't push that filter across to the other side of the join. This is a limitation of Spark's optimizer, not Iceberg.

One last thing to note is that Iceberg doesn't just evaluate these predicates on partition data. It can also use stats for data files to prune out unnecessary splits.

On Fri, Sep 20, 2019 at 1:24 PM Dave Sugden <[email protected]> wrote:
> Hi,
>
> I'd just like to confirm that, when using the iceberg spark datasource,
> reading an iceberg table that has been identity partitioned on column x,
> then performing some equi-join on said column x, does iceberg's scan
> planning work as if a predicate on that column had been supplied? IE, no
> full table scan?
>
> Can this be observed with dataframe.explain ?
>
> Thanks!

--
Ryan Blue
Software Engineer
Netflix
