Sounds reasonable to me.

On Wed, Jan 31, 2024 at 7:56 AM <russell.spit...@gmail.com> wrote:
> Sounds like a reasonable thing to add? Maybe we could check cardinality to
> pick out the default order as well?
>
> Sent from my iPhone
>
> On Jan 30, 2024, at 3:50 PM, Jack Ye <yezhao...@gmail.com> wrote:
>
> Hi everyone,
>
> Today, the rewrite manifest procedure always orders the data files based
> on their *data_file.partition* value. Specifically, it sorts data files
> that have the same partition value, and then does a repartition by range
> based on the target number of manifest files (ref
> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteManifestsSparkAction.java#L257-L258>
> ).
>
> I notice that this approach does not always yield the best performance
> for scan planning, because the resulting manifest entry order basically
> follows the default order of the partition columns.
>
> For example, consider a table partitioned by columns a and b. By default
> the rewrite procedure will organize manifest entries based on column a
> and then b. If most of my queries use b as the predicate, rewriting
> manifests by sorting first on column b and then on a will yield a much
> shorter scan planning time, because all manifest entries with similar b
> values are close together, and the manifest list can already be used to
> prune many files without opening the manifest files.
>
> This happens a lot in cases where b is an event-time timestamp column
> that is not the first partition column, but is actually the column most
> frequently used as a predicate in queries.
>
> Translated to code, this means we can benefit from something like:
>
> SparkActions.rewriteManifests(table)
>     .sort("b", "a")
>     .commit()
>
> Any thoughts?
>
> Best,
> Jack Ye
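
For concreteness, here is a minimal end-to-end sketch of how the proposed
call might look. The sort("b", "a") method is the hypothetical API under
discussion in this thread; the surrounding calls (SparkActions.get(),
rewriteManifests(table), execute(), Spark3Util.loadIcebergTable) exist in
Iceberg today, and "db.events" is just a placeholder name for a table
partitioned by a and b:

// Sketch only: sort("b", "a") is the API proposed in this thread and does
// not exist yet; everything else uses the current SparkActions API.
import org.apache.iceberg.Table;
import org.apache.iceberg.actions.RewriteManifests;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.spark.actions.SparkActions;
import org.apache.spark.sql.SparkSession;

public class SortedManifestRewrite {
  public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession.builder()
        .appName("sorted-manifest-rewrite")
        .getOrCreate();

    // "db.events" is a placeholder for a table partitioned by a and b.
    Table table = Spark3Util.loadIcebergTable(spark, "db.events");

    RewriteManifests.Result result =
        SparkActions.get()
            .rewriteManifests(table)
            // Hypothetical method from this proposal: cluster manifest
            // entries by b first, so the manifest-list value ranges on b
            // stay tight and planning can prune without opening manifests.
            .sort("b", "a")
            .execute();

    System.out.println("Rewritten manifests: " + result.rewrittenManifests());
    System.out.println("Added manifests: " + result.addedManifests());
  }
}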