On Fri, Mar 24, 2023 at 1:46 PM John Zhuge <jzh...@apache.org> wrote:
> Have you checked out SparkCatalog
> <https://github.com/apache/iceberg/blob/master/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java>
> in Apache Iceberg project? More docs at
> https://iceberg.apache.org/docs/latest/spark-configuration/#catalogs

No, I hadn't seen that one yet, thanks!

Another question: our partitions have no useful uniqueness criteria other than a storage URL, which should never be exposed to user-space. Our "primary" index is a timestamp, and multiple partitions within a table can have overlapping time ranges. We support an additional shard key, but it's optional.

Is there something like partition discovery in DataSourceV2, where I should list all the (potentially many thousands of) partitions for a table, or can I leave them unpopulated until query planning time, when time range predicates often have extremely high selectivity?

Thanks!
-0xe1a