Hi Dave,

This depends on Spark. Iceberg supports Spark's predicate push-down, where Spark forwards its filters into the Iceberg scan. But which filters get pushed down is up to Spark.
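As a toy sketch of what scan planning does with a pushed-down predicate (this is not Iceberg's real code; `DataFile`, `plan_files`, and the field names are all made up for illustration), a file is kept only if its partition value matches and its column stats could contain matching rows:

```python
# Toy model of scan planning with pushed-down predicates.
# NOT Iceberg's actual planner -- all names here are hypothetical.
from dataclasses import dataclass

@dataclass
class DataFile:
    path: str
    partition_x: int  # identity-partition value for column x
    min_y: int        # per-file column stats for another column y
    max_y: int

def plan_files(files, x_value, y_value):
    """Keep only files that could contain rows with x == x_value and y == y_value."""
    selected = []
    for f in files:
        if f.partition_x != x_value:             # partition pruning
            continue
        if not (f.min_y <= y_value <= f.max_y):  # stats-based file pruning
            continue
        selected.append(f)
    return selected

files = [
    DataFile("f1.parquet", partition_x=1, min_y=0, max_y=10),
    DataFile("f2.parquet", partition_x=1, min_y=50, max_y=90),
    DataFile("f3.parquet", partition_x=2, min_y=0, max_y=100),
]

# With x = 1 and y = 7 pushed down, f3 is pruned by its partition value
# and f2 by its y stats, so only f1 is scanned.
print([f.path for f in plan_files(files, 1, 7)])  # ['f1.parquet']
```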
Spark can make some inferences with join predicates. For example, if you have a filter after a join, it can push the predicate through the join and down to the source. But I don't think it is able to make all of the inferences you or I would. For example, if you have an inner join on t1.id = t2.id and a filter below the join on t1, it won't push that filter across to the other side of the join. This is a limitation of Spark's optimizer, not Iceberg.

One last thing to note is that Iceberg doesn't just evaluate these predicates on partition data. It can also use stats for data files to prune out unnecessary splits.

On Fri, Sep 20, 2019 at 1:24 PM Dave Sugden <[email protected]> wrote:
> Hi,
>
> I'd just like to confirm that, when using the iceberg spark datasource,
> reading an iceberg table that has been identity partitioned on column x,
> then performing some equi-join on said column x, does iceberg's scan
> planning work as if a predicate on that column had been supplied? IE, no
> full table scan?
>
> Can this be observed with dataframe.explain ?
>
> Thanks!

--
Ryan Blue
Software Engineer
Netflix
