I think the best way to handle this use case is to have people implement the Iceberg `ProcedureCatalog` API. That's what we want to get upstream into Spark, and it's a really reasonable (and small) addition.
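For concreteness, a standalone plugin in that direction might look roughly like the sketch below. This is an illustration only, assuming the `Procedure`, `ProcedureParameter`, and `ProcedureCatalog` interfaces that currently ship with Iceberg's Spark runtime; `MyProcedureCatalog`, `MyCleanupProcedure`, and `my_cleanup` are hypothetical names.

```java
import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow;
import org.apache.spark.sql.connector.catalog.Identifier;
import org.apache.spark.sql.connector.iceberg.catalog.Procedure;
import org.apache.spark.sql.connector.iceberg.catalog.ProcedureCatalog;
import org.apache.spark.sql.connector.iceberg.catalog.ProcedureParameter;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.util.CaseInsensitiveStringMap;

// Hypothetical catalog plugin that only serves procedures. ProcedureCatalog
// extends Spark's CatalogPlugin, so initialize() and name() are required.
public class MyProcedureCatalog implements ProcedureCatalog {
  private String catalogName;

  @Override
  public void initialize(String name, CaseInsensitiveStringMap options) {
    this.catalogName = name;
  }

  @Override
  public String name() {
    return catalogName;
  }

  @Override
  public Procedure loadProcedure(Identifier ident) {
    if ("my_cleanup".equalsIgnoreCase(ident.name())) {
      return new MyCleanupProcedure();
    }
    throw new IllegalArgumentException("Unknown procedure: " + ident);
  }

  // Hypothetical procedure: takes a table name, runs some custom action,
  // and returns a single boolean "success" row.
  static class MyCleanupProcedure implements Procedure {
    @Override
    public ProcedureParameter[] parameters() {
      return new ProcedureParameter[] {
          ProcedureParameter.required("table", DataTypes.StringType)
      };
    }

    @Override
    public StructType outputType() {
      return new StructType(new StructField[] {
          new StructField("success", DataTypes.BooleanType, false, Metadata.empty())
      });
    }

    @Override
    public InternalRow[] call(InternalRow args) {
      String table = args.getString(0);
      // ... custom business logic around the Iceberg table would go here ...
      return new InternalRow[] { new GenericInternalRow(new Object[] { true }) };
    }

    @Override
    public String description() {
      return "Example custom cleanup procedure";
    }
  }
}
```

With that jar on the classpath, the class would be registered like any other catalog plugin (e.g. `spark.sql.catalog.custom=com.example.MyProcedureCatalog`) and, assuming Iceberg's SQL extensions are enabled so `CALL` parses, invoked with something like `CALL custom.my_cleanup('db.tbl')`.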
The problem with adding pluggable procedures to Iceberg is that it would really be working around the fact that Spark doesn't support plugging in procedures yet. This is specific to Spark, and we would have to keep it alive well past the point when `ProcedureCatalog` lands upstream. It doesn't seem worth the additional complexity in Iceberg when you can plug in through the API that is intended to become Spark's own plugin API, if that makes sense.

Ryan

On Wed, Nov 10, 2021 at 6:54 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> Hi Community!
>
> If Iceberg provides a capability to plug in procedures, it will be really
> helpful for users to plug in their own Spark actions to handle their
> business logic around Iceberg tables.
> So, can we have a mechanism that allows plugging in additional
> implementations of *org.apache.spark.sql.connector.iceberg.catalog.Procedure*
> for all users of SparkCatalog and SparkSessionCatalog by just dropping in an
> additional jar?
>
> Without this feature, users can still add custom procedures by extending
> *SparkCatalog* and/or *SparkSessionCatalog* and overriding *loadProcedure*
> (sketched after this message), which requires configuring the subclasses of
> Spark[Session]Catalog in the Spark configuration. That is a lot of work and
> not a clean way to handle this.
>
> Another option is to add these custom procedures as UDFs, but UDFs are
> meant to operate on columns. It doesn't make sense to implement Spark
> actions as UDFs.
>
> *So, I want to know what most of you think about having pluggable
> procedures in Iceberg. Does this feature solve your problems too?*
>
> Thanks,
> Ajantha

--
Ryan Blue
Tabular
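For reference, the subclassing workaround described in Ajantha's message might look roughly like this. Again a sketch only, assuming Iceberg's `SparkCatalog`: `CustomSparkCatalog` is a made-up name, and `MyCleanupProcedure` is the hypothetical procedure from the earlier sketch, assumed to be in the same package.

```java
import org.apache.iceberg.spark.SparkCatalog;
import org.apache.spark.sql.catalyst.analysis.NoSuchProcedureException;
import org.apache.spark.sql.connector.catalog.Identifier;
import org.apache.spark.sql.connector.iceberg.catalog.Procedure;

// Hypothetical subclass that serves one extra procedure on top of the
// built-in Iceberg procedures.
public class CustomSparkCatalog extends SparkCatalog {
  @Override
  public Procedure loadProcedure(Identifier ident) throws NoSuchProcedureException {
    if ("my_cleanup".equalsIgnoreCase(ident.name())) {
      // hypothetical custom Procedure from the earlier sketch
      return new MyProcedureCatalog.MyCleanupProcedure();
    }
    // fall back to the built-in Iceberg procedures
    return super.loadProcedure(ident);
  }
}
```

Wiring it up then means pointing the catalog config at the subclass, e.g. `spark.sql.catalog.my_catalog=com.example.CustomSparkCatalog` plus the usual Iceberg catalog properties, which is exactly the extra configuration burden the message calls out.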