I think the best way to handle this use case is to have people implement
the Iceberg `ProcedureCatalog` API. That's what we want to get upstream
into Spark, and it's a really reasonable (and small) addition to Spark.

The problem with adding pluggable procedures to Iceberg is that it would
really be working around the fact that Spark doesn't support plugging in
procedures yet. That workaround would be specific to Spark, and we would
have to keep it alive well past the point when `ProcedureCatalog` is
upstream. It doesn't seem worth the additional complexity in Iceberg when
you can plug in through `ProcedureCatalog`, the API that is intended to
become Spark's own plugin API.
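
For anyone who wants to try that route, here is a minimal sketch of a
standalone catalog plugin that serves one custom procedure through
`ProcedureCatalog`. The class and procedure names are made up, and the
method signatures follow the current Iceberg Spark module, so check them
against the version you build against:

    import org.apache.spark.sql.catalyst.InternalRow;
    import org.apache.spark.sql.catalyst.analysis.NoSuchProcedureException;
    import org.apache.spark.sql.connector.catalog.CatalogPlugin;
    import org.apache.spark.sql.connector.catalog.Identifier;
    import org.apache.spark.sql.connector.iceberg.catalog.Procedure;
    import org.apache.spark.sql.connector.iceberg.catalog.ProcedureCatalog;
    import org.apache.spark.sql.connector.iceberg.catalog.ProcedureParameter;
    import org.apache.spark.sql.types.StructType;
    import org.apache.spark.sql.util.CaseInsensitiveStringMap;

    // Hypothetical plugin: a catalog whose only job is to expose procedures.
    public class MyProcedureCatalog implements ProcedureCatalog, CatalogPlugin {
      private String name = null;

      @Override
      public void initialize(String name, CaseInsensitiveStringMap options) {
        this.name = name;
      }

      @Override
      public String name() {
        return name;
      }

      @Override
      public Procedure loadProcedure(Identifier ident)
          throws NoSuchProcedureException {
        if ("my_business_action".equals(ident.name())) {
          return new MyBusinessActionProcedure();
        }
        throw new NoSuchProcedureException(ident);
      }

      // A made-up no-op procedure; a real one declares parameters, an
      // output type, and runs its logic in call().
      private static class MyBusinessActionProcedure implements Procedure {
        @Override
        public ProcedureParameter[] parameters() {
          return new ProcedureParameter[0];
        }

        @Override
        public StructType outputType() {
          return new StructType();
        }

        @Override
        public InternalRow[] call(InternalRow args) {
          // custom logic around Iceberg tables goes here
          return new InternalRow[0];
        }

        @Override
        public String description() {
          return "example custom procedure";
        }
      }
    }

With the Iceberg SQL extensions enabled, that should be callable after
registering the catalog, e.g. spark.sql.catalog.my_catalog =
com.example.MyProcedureCatalog, then CALL my_catalog.my_business_action().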

Ryan

On Wed, Nov 10, 2021 at 6:54 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> Hi Community!
>
> If Iceberg provides a capability to plug in procedures, it will be really
> helpful for users to plug in their own Spark actions to handle their
> business logic around Iceberg tables.
> So, can we have a mechanism that allows plugging in additional
> implementations of *org.apache.spark.sql.connector.iceberg.catalog.Procedure*
> for all users of SparkCatalog and SparkSessionCatalog by just dropping in
> an additional jar?
>
> Without this feature, users can still add their custom procedures by
> extending *SparkCatalog* and/or *SparkSessionCatalog* and overriding
> *loadProcedure*, which requires them to configure the subclasses of
> Spark[Session]Catalog in their Spark configuration (sketched below). That
> is a lot of work and not a clean way to handle this.
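>
> The override looks something like this (a rough sketch; the class and
> procedure names are illustrative and the signatures should be checked
> against the Iceberg version in use):
>
>     import org.apache.iceberg.spark.SparkCatalog;
>     import org.apache.spark.sql.catalyst.InternalRow;
>     import org.apache.spark.sql.catalyst.analysis.NoSuchProcedureException;
>     import org.apache.spark.sql.connector.catalog.Identifier;
>     import org.apache.spark.sql.connector.iceberg.catalog.Procedure;
>     import org.apache.spark.sql.connector.iceberg.catalog.ProcedureParameter;
>     import org.apache.spark.sql.types.StructType;
>
>     // Hypothetical subclass that serves one extra procedure and falls
>     // back to the built-in Iceberg procedures for everything else.
>     public class CustomSparkCatalog extends SparkCatalog {
>       @Override
>       public Procedure loadProcedure(Identifier ident)
>           throws NoSuchProcedureException {
>         if ("my_custom_action".equals(ident.name())) {
>           return new Procedure() {
>             @Override
>             public ProcedureParameter[] parameters() {
>               return new ProcedureParameter[0];
>             }
>
>             @Override
>             public StructType outputType() {
>               return new StructType();
>             }
>
>             @Override
>             public InternalRow[] call(InternalRow args) {
>               // custom Spark action around Iceberg tables goes here
>               return new InternalRow[0];
>             }
>           };
>         }
>         return super.loadProcedure(ident);
>       }
>     }
>
> and then the subclass has to be configured in place of SparkCatalog, e.g.
> spark.sql.catalog.my_catalog = com.example.CustomSparkCatalog.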
>
> Another option is to add these custom procedures as UDFs, but UDFs are
> meant to be column-related. It doesn't make sense to use a UDF for Spark
> actions.
>
>
> *So, I want to know what most of you think about having pluggable
> procedures in Iceberg. Does this feature solve your problems too?*
>
> Thanks,
> Ajantha
>


-- 
Ryan Blue
Tabular
