Hi Gautam,

Iceberg does support nested schema pruning, but Spark 2.4 doesn’t request it for DS V2 sources. Internally, we had to modify Spark 2.4 to make this work end-to-end. One option is to extend DataSourceV2Strategy with logic similar to what we have in ParquetSchemaPruning in 2.4.0. I think we can share that part if needed.
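The core step of a ParquetSchemaPruning-style rule is turning the nested field references in a query (e.g. `location.lat`) into a schema that keeps only those sub-fields, so the source receives that schema instead of the whole struct. Below is a minimal, self-contained Java sketch of that step only. It assumes a simplified, hypothetical `Field` type standing in for Spark's `StructType`/`StructField`, with requested paths given as lists of field names, and a hypothetical extra `name` column. It is not the ParquetSchemaPruning code or our internal Spark patch, and it ignores arrays, maps and case sensitivity.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedPruningSketch {

    // Hypothetical, simplified stand-in for Spark's StructField/StructType:
    // a field is either a leaf (with a type name) or a struct with children.
    static final class Field {
        final String name;
        final String leafType;      // null for struct fields
        final List<Field> children; // empty for leaf fields

        Field(String name, String leafType, List<Field> children) {
            this.name = name;
            this.leafType = leafType;
            this.children = children;
        }

        static Field leaf(String name, String type) {
            return new Field(name, type, List.of());
        }

        static Field struct(String name, Field... children) {
            return new Field(name, null, List.of(children));
        }

        boolean isStruct() {
            return leafType == null;
        }
    }

    // Keeps only the fields (and sub-fields) named by the requested paths,
    // preserving the original field order.
    static List<Field> prune(List<Field> fields, List<List<String>> paths) {
        List<Field> result = new ArrayList<>();
        for (Field f : fields) {
            boolean wholeField = false;
            List<List<String>> tails = new ArrayList<>();
            for (List<String> p : paths) {
                if (!p.isEmpty() && p.get(0).equals(f.name)) {
                    if (p.size() == 1) {
                        wholeField = true;              // e.g. "select location"
                    } else {
                        tails.add(p.subList(1, p.size())); // e.g. "location.lat"
                    }
                }
            }
            if (!wholeField && tails.isEmpty()) {
                continue; // field not referenced at all: drop it
            }
            if (wholeField || !f.isStruct()) {
                result.add(f);
            } else {
                // Recurse so that only the referenced sub-fields survive.
                result.add(new Field(f.name, null, prune(f.children, tails)));
            }
        }
        return result;
    }

    // Renders a schema in a Spark simpleString-like form, e.g. struct<a:int>.
    static String render(List<Field> fields) {
        StringBuilder sb = new StringBuilder("struct<");
        for (int i = 0; i < fields.size(); i++) {
            Field f = fields.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append(f.name).append(':');
            sb.append(f.isStruct() ? render(f.children) : f.leafType);
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        List<Field> table = List.of(
            Field.struct("location", Field.leaf("lat", "double"), Field.leaf("lon", "double")),
            Field.leaf("name", "string"));
        // "select location.lat ..." references only the nested path location.lat
        List<List<String>> requested = List.of(List.of("location", "lat"));
        System.out.println(render(table));
        System.out.println(render(prune(table, requested)));
    }
}
```

Running `main` prints the full schema followed by `struct<location:struct<lat:double>>`, which is the shape one would want `pruneColumns` to receive for the query below instead of the whole `location` struct.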
I am planning to check whether Spark master already has this functionality. If it’s not implemented and nobody is working on it yet, I can fix it.

- Anton

> On 30 Aug 2019, at 15:42, Gautam <gautamkows...@gmail.com> wrote:
>
> Hello Devs,
> I was measuring perf on structs between V1 and V2
> datasources. Found that although Iceberg Reader supports
> `SupportsPushDownRequiredColumns` it doesn't seem to prune nested column
> projections. I want to be able to prune on nested fields. How does V2
> datasource have provision to be able to let Iceberg decide this? The
> `SupportsPushDownRequiredColumns` mix-in gives the entire struct field even
> if a sub-field is requested.
>
> Here's an illustration ..
>
> scala> spark.sql("select location.lat from iceberg_people_struct").show()
> +-------+
> |    lat|
> +-------+
> |   null|
> |101.123|
> |175.926|
> +-------+
>
>
> The pruning gets the entire struct instead of just `location.lat` ..
>
> public void pruneColumns(StructType newRequestedSchema)
>
> 19/08/30 16:25:38 WARN Reader: => Prune columns : {
>   "type" : "struct",
>   "fields" : [ {
>     "name" : "location",
>     "type" : {
>       "type" : "struct",
>       "fields" : [ {
>         "name" : "lat",
>         "type" : "double",
>         "nullable" : true,
>         "metadata" : { }
>       }, {
>         "name" : "lon",
>         "type" : "double",
>         "nullable" : true,
>         "metadata" : { }
>       } ]
>     },
>     "nullable" : true,
>     "metadata" : { }
>   } ]
> }
>
> Is there information I can use in the IcebergSource (or add some) that can be
> used to prune the exact sub-field here? What's a good way to approach this?
> For dense/wide struct fields this affects performance significantly.
>
>
> Sample gist: https://gist.github.com/prodeezy/001cf155ff0675be7d307e9f842e1dac
>
>
> thanks and regards,
> -Gautam.