I checked, and extending DelegatingCatalogExtension would be quite difficult,
or at least would break current Iceberg SparkSessionCatalog implementations
in several ways. Note this has nothing to do with third-party catalogs; it
is directly about how Iceberg works with Spark regardless of the catalog
implementation.

Main issues on the Iceberg side:

1. initialize() is final and empty in DelegatingCatalogExtension. This means
we have no way of taking custom catalog configuration and applying it to the
Iceberg plugin. Currently this configuration is used for a few things:
choosing the underlying Iceberg catalog implementation, catalog cache
settings, and the Iceberg environment context. (See the first sketch after
this list.)

2. No access to the delegate catalog object. The delegate is private, so we
cannot touch it in our extended class, and it is currently used for
Iceberg's "staged create" and "staged replace" functions. We could work
around this by disabling staged create and replace when the delegate is in
use, but that would break existing Iceberg behavior. (See the second sketch
after this list.)
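
To make issue 1 concrete: Spark declares initialize on
DelegatingCatalogExtension as

    public final void initialize(String name, CaseInsensitiveStringMap options) {}

so a subclass can never receive its options. A minimal sketch of the dead
end (the class name is illustrative, not Iceberg code):

    import org.apache.spark.sql.connector.catalog.DelegatingCatalogExtension;

    abstract class SessionCatalogSketch extends DelegatingCatalogExtension {
      // Attempting to override initialize(String, CaseInsensitiveStringMap)
      // here fails to compile because the method is final in the superclass.
      // As a result, the catalog options (Iceberg catalog implementation,
      // cache settings, environment context) never reach this class.
    }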
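
For issue 2, this is the kind of fallback that staged create/replace needs;
a hedged sketch where the field names are hypothetical and Iceberg's real
code differs:

    import java.util.Map;
    import org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException;
    import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException;
    import org.apache.spark.sql.connector.catalog.Identifier;
    import org.apache.spark.sql.connector.catalog.StagedTable;
    import org.apache.spark.sql.connector.catalog.StagingTableCatalog;
    import org.apache.spark.sql.connector.catalog.TableCatalog;
    import org.apache.spark.sql.connector.expressions.Transform;
    import org.apache.spark.sql.types.StructType;

    abstract class StagedFallbackSketch implements StagingTableCatalog {
      private StagingTableCatalog icebergCatalog; // hypothetical Iceberg-backed catalog
      private TableCatalog sessionCatalog;        // the delegate we would need a handle on

      @Override
      public StagedTable stageCreate(
          Identifier ident, StructType schema, Transform[] partitions,
          Map<String, String> properties)
          throws TableAlreadyExistsException, NoSuchNamespaceException {
        if ("iceberg".equalsIgnoreCase(properties.get(TableCatalog.PROP_PROVIDER))) {
          return icebergCatalog.stageCreate(ident, schema, partitions, properties);
        }
        // Non-Iceberg tables fall through to the session catalog. Under
        // DelegatingCatalogExtension the delegate is a private field with no
        // getter, so a subclass cannot write this fallback.
        return ((StagingTableCatalog) sessionCatalog)
            .stageCreate(ident, schema, partitions, properties);
      }
      // stageReplace / stageCreateOrReplace would follow the same pattern.
    }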

Outside of these two issues I was able to get everything else working as
expected, but I think both of them are probably blockers.


On Wed, Sep 25, 2024 at 3:51 PM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> I think it should be minimally difficult to switch this around on the
> Iceberg side; we only have to move the initialize code out and duplicate
> it. Not a huge cost.
>
> On Sun, Sep 22, 2024 at 11:39 PM Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> It's buggy behavior for a custom v2 catalog that does not extend
>> DelegatingCatalogExtension to expect Spark to still use the v1 DDL
>> commands to operate on the tables inside it. This is also why third-party
>> catalogs (e.g. Unity Catalog and Apache Polaris) cannot be used to
>> overwrite `spark_catalog` if people still want to use the Spark built-in
>> file sources.
>>
>> Technically, I think it's wrong for a third-party catalog to rely on
>> Spark's session catalog without extending `DelegatingCatalogExtension`, as
>> that confuses Spark. If it has its own metastore, then it shouldn't
>> delegate requests to the Spark session catalog or use v1 DDL commands,
>> which only work with the Spark session catalog. Otherwise, it should
>> extend `DelegatingCatalogExtension` to indicate the delegation.
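>>
>> (For reference, "overwriting" `spark_catalog` means configuration like
>> the following minimal sketch, where `com.example.MyCatalog` is a
>> placeholder for a custom v2 catalog class, not a real one:)
>>
>>     import org.apache.spark.sql.SparkSession;
>>
>>     public class ReplaceSessionCatalog {
>>       public static void main(String[] args) {
>>         SparkSession spark = SparkSession.builder()
>>             .master("local[*]")
>>             // com.example.MyCatalog is a placeholder, not a real class
>>             .config("spark.sql.catalog.spark_catalog", "com.example.MyCatalog")
>>             .getOrCreate();
>>         // v1 DDL for built-in file sources now routes through that catalog
>>         spark.sql("CREATE TABLE t (id INT) USING parquet");
>>       }
>>     }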
>>
>> On Mon, Sep 23, 2024 at 11:19 AM Manu Zhang <owenzhang1...@gmail.com>
>> wrote:
>>
>>> Hi Iceberg and Spark community,
>>>
>>> I'd like to bring your attention to a recent change[1] in Spark 3.5.3
>>> that effectively breaks Iceberg's SparkSessionCatalog[2] and blocks
>>> Iceberg from upgrading to Spark 3.5.3[3].
>>>
>>> SparkSessionCatalog, as a customized Spark V2 session catalog, supports
>>> creating a V1 table with a V1 command. That's no longer allowed after the
>>> change unless the catalog extends DelegatingCatalogExtension. This is not
>>> minor work, since SparkSessionCatalog already extends a base class[4], as
>>> sketched below.
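>>>
>>> (Why it's not minor: Java allows a single superclass, as in this
>>> simplified, illustrative sketch -- not Iceberg's actual code:)
>>>
>>>     class BaseCatalog { /* shared Iceberg catalog plumbing, per link [4] */ }
>>>
>>>     class SparkSessionCatalog extends BaseCatalog {
>>>       // Cannot also declare "extends DelegatingCatalogExtension": the
>>>       // class already spends its one superclass slot on BaseCatalog, so
>>>       // BaseCatalog's functionality would have to be refactored first.
>>>     }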
>>>
>>> To resolve this issue, we have to change public interfaces on either the
>>> Spark or the Iceberg side. IMHO, it doesn't make sense for a downstream
>>> project to refactor its public interfaces when bumping a maintenance
>>> version of Spark. WDYT?
>>>
>>>
>>> 1. https://github.com/apache/spark/pull/47724
>>> 2.
>>> https://iceberg.apache.org/docs/nightly/spark-configuration/#replacing-the-session-catalog
>>> 3. https://github.com/apache/iceberg/pull/11160
>>> 4.
>>> https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSessionCatalog.java
>>>
>>> Thanks,
>>> Manu
>>>
>>>
