I checked, and extending DelegatingCatalogExtension will be quite difficult, or at the very least will cause several breaks in the current Iceberg SparkSessionCatalog implementation. Note this has nothing to do with third-party catalogs; it is more directly about how Iceberg works with Spark, regardless of the catalog implementation.
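For context, here is a minimal sketch of how SparkSessionCatalog is wired in today, following the Iceberg session-catalog docs linked in Manu's message below (the config keys are the documented ones; the class and option values are just examples). Everything under the catalog's config prefix is handed to the plugin through CatalogPlugin.initialize(name, options), which is exactly the hook at issue:

    import org.apache.spark.sql.SparkSession;

    public class SessionCatalogWiring {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            // Replace the built-in session catalog with Iceberg's implementation.
            .config("spark.sql.catalog.spark_catalog",
                "org.apache.iceberg.spark.SparkSessionCatalog")
            // Options under the catalog's prefix are delivered via initialize():
            // the underlying Iceberg catalog implementation...
            .config("spark.sql.catalog.spark_catalog.type", "hive")
            // ...and catalog cache settings, among others.
            .config("spark.sql.catalog.spark_catalog.cache-enabled", "false")
            .getOrCreate();
      }
    }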
The main issues on the Iceberg side:

1. initialize() is final and empty in DelegatingCatalogExtension. This
means we have no way of taking custom catalog configuration and applying
it to the Iceberg plugin. Currently this configuration drives a few
things: choosing the underlying Iceberg catalog implementation, catalog
cache settings, and the Iceberg environment context.

2. No access to the delegate catalog object. The delegate is private, so
we are unable to touch it in our extended class; it is currently needed
for Iceberg's "staged create" and "staged replace" functions. We could
work around this by disabling staged create and replace when the delegate
is being used, but that would break existing Iceberg behavior.

Outside of these aspects I was able to get everything else working as
expected, but I think both of these are probably blockers. (The sketch at
the end of this message shows where both surface.)

On Wed, Sep 25, 2024 at 3:51 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

> I think it should be minimally difficult to switch this around on the
> Iceberg side; we only have to move the initialize code out and duplicate
> it. Not a huge cost.
>
> On Sun, Sep 22, 2024 at 11:39 PM Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> It's buggy behavior for a custom v2 catalog (one that does not extend
>> DelegatingCatalogExtension) to expect Spark to still use the v1 DDL
>> commands to operate on the tables inside it. This is also why
>> third-party catalogs (e.g. Unity Catalog and Apache Polaris) cannot be
>> used to override `spark_catalog` if people still want to use the Spark
>> built-in file sources.
>>
>> Technically, I think it's wrong for a third-party catalog to rely on
>> Spark's session catalog without extending `DelegatingCatalogExtension`,
>> as it confuses Spark. If it has its own metastore, then it shouldn't
>> delegate requests to the Spark session catalog or use v1 DDL commands,
>> which only work with the Spark session catalog. Otherwise, it should
>> extend `DelegatingCatalogExtension` to indicate that.
>>
>> On Mon, Sep 23, 2024 at 11:19 AM Manu Zhang <owenzhang1...@gmail.com>
>> wrote:
>>
>>> Hi Iceberg and Spark community,
>>>
>>> I'd like to bring your attention to a recent change[1] in Spark 3.5.3
>>> that effectively breaks Iceberg's SparkSessionCatalog[2] and blocks
>>> Iceberg from upgrading to Spark 3.5.3[3].
>>>
>>> SparkSessionCatalog, as a customized Spark V2 session catalog,
>>> supports creating a V1 table with a V1 command. That's no longer
>>> allowed after the change unless the catalog extends
>>> DelegatingCatalogExtension. This is not minor work, since
>>> SparkSessionCatalog already extends a base class[4].
>>>
>>> To resolve this issue, we have to change public interfaces on either
>>> the Spark side or the Iceberg side. IMHO, it doesn't make sense for a
>>> downstream project to refactor its interfaces when bumping a
>>> maintenance version of Spark. WDYT?
>>>
>>> 1. https://github.com/apache/spark/pull/47724
>>> 2. https://iceberg.apache.org/docs/nightly/spark-configuration/#replacing-the-session-catalog
>>> 3. https://github.com/apache/iceberg/pull/11160
>>> 4. https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSessionCatalog.java
>>>
>>> Thanks,
>>> Manu
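P.S. The sketch referenced above: a hypothetical attempt to port
SparkSessionCatalog onto DelegatingCatalogExtension, showing where both
blockers surface. Only the Spark types are real; the class name, the stub
bodies, and buildIcebergCatalog() are illustrative, not Iceberg's actual
code.

    import java.util.Map;

    import org.apache.spark.sql.connector.catalog.DelegatingCatalogExtension;
    import org.apache.spark.sql.connector.catalog.Identifier;
    import org.apache.spark.sql.connector.catalog.StagedTable;
    import org.apache.spark.sql.connector.catalog.StagingTableCatalog;
    import org.apache.spark.sql.connector.expressions.Transform;
    import org.apache.spark.sql.types.StructType;

    // Hypothetical port; not Iceberg's actual class.
    public class PortedSessionCatalog extends DelegatingCatalogExtension
        implements StagingTableCatalog {

      // Blocker 1: this override will not compile, because initialize() is
      // declared final (and empty) in DelegatingCatalogExtension, so the
      // options that choose the underlying Iceberg catalog implementation,
      // cache settings, and environment context never reach this class.
      //
      // @Override
      // public void initialize(String name, CaseInsensitiveStringMap options) {
      //   this.icebergCatalog = buildIcebergCatalog(name, options);  // hypothetical
      // }

      @Override
      public StagedTable stageCreate(Identifier ident, StructType schema,
          Transform[] partitions, Map<String, String> properties) {
        // Blocker 2: for non-Iceberg tables this needs to fall back to the
        // delegate (the built-in session catalog), but the delegate field in
        // DelegatingCatalogExtension is private with no accessor.
        throw new UnsupportedOperationException(
            "no way to reach the private delegate for a session-catalog fallback");
      }

      @Override
      public StagedTable stageReplace(Identifier ident, StructType schema,
          Transform[] partitions, Map<String, String> properties) {
        throw new UnsupportedOperationException("same problem as stageCreate");
      }

      @Override
      public StagedTable stageCreateOrReplace(Identifier ident, StructType schema,
          Transform[] partitions, Map<String, String> properties) {
        throw new UnsupportedOperationException("same problem as stageCreate");
      }
    }

Making the staged* methods compile by throwing is easy; making them keep
today's fallback behavior to the session catalog is what the private
delegate prevents.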