Hi All,

I'm bumping this thread because PR [3820] was mentioned here before.

This discussion is interesting and useful. Still, on the practical side,
how do you feel about merging [3820] now and working on Paimon-related
improvements in follow-up PRs? Any objections?

[3820] https://github.com/apache/polaris/pull/3820

Thanks,
Dmitri.

On Sat, Mar 14, 2026 at 3:34 AM 李宜頲 <[email protected]> wrote:

> Hi all,
>
> We are adding support for Paimon inside Polaris's SparkCatalog. Before we
> add more formats, we would like to get community input on the intended
> architecture.
>
> This discussion originated from a code review conversation in PR #3820
> <https://github.com/apache/polaris/pull/3820#discussion_r2865885791>
>
>
>
> *Current design*
>
> When SparkCatalog.loadTable is called, the routing works in three phases:
>
>
> 1. Try the Iceberg catalog (icebergSparkCatalog.loadTable). If it succeeds,
> return immediately.
>
> 2. Call getTableFormat(ident), which makes a single HTTP GET to the Polaris
> server to read the provider property stored in the generic table metadata,
> without triggering any Spark DataSource resolution.
>
> 3. Route based on the provider string:
>
>     - "paimon"  : delegate to Paimon's SparkCatalog
>
>     - unknown/other : fall back to polarisSparkCatalog.loadTable, which
> performs full DataSource resolution
>
>
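
For readers following along, the three phases described above can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the PR: the `SubCatalog` interface, the `Optional`-based "not found" signalling, and the way `getTableFormat` is passed in as a function are all hypothetical stand-ins for the real Spark and Polaris types.

```java
import java.util.Optional;
import java.util.function.Function;

public class ThreePhaseRouting {

    /** Hypothetical stand-in for a sub-catalog; not a Spark or Polaris type. */
    interface SubCatalog {
        /** Returns a table description, or empty if this catalog has no such table. */
        Optional<String> loadTable(String ident);
    }

    static String loadTable(
            String ident,
            SubCatalog iceberg,
            Function<String, String> getTableFormat,
            SubCatalog paimon,
            SubCatalog polarisFallback) {
        // Phase 1: try Iceberg first and return immediately on success.
        Optional<String> icebergTable = iceberg.loadTable(ident);
        if (icebergTable.isPresent()) {
            return icebergTable.get();
        }
        // Phase 2: one metadata lookup for the provider, no DataSource resolution.
        String provider = getTableFormat.apply(ident);
        // Phase 3: route on the provider string; anything else goes to the fallback.
        SubCatalog target = "paimon".equals(provider) ? paimon : polarisFallback;
        return target.loadTable(ident)
                .orElseThrow(() -> new IllegalStateException("Table not found: " + ident));
    }

    public static void main(String[] args) {
        // Toy catalogs that tag each result with the catalog that served it.
        SubCatalog iceberg =
                id -> id.equals("db.ice") ? Optional.of("iceberg:" + id) : Optional.empty();
        SubCatalog paimon = id -> Optional.of("paimon:" + id);
        SubCatalog fallback = id -> Optional.of("generic:" + id);
        Function<String, String> format = id -> id.equals("db.pai") ? "paimon" : "delta";

        System.out.println(loadTable("db.ice", iceberg, format, paimon, fallback));
        System.out.println(loadTable("db.pai", iceberg, format, paimon, fallback));
        System.out.println(loadTable("db.other", iceberg, format, paimon, fallback));
    }
}
```

In this toy setup the three calls are served by the Iceberg, Paimon, and fallback catalogs respectively. Repeating this method body in alterTable and dropTable is what leads to the duplication discussed next.
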
> The same three-phase pattern is repeated independently in loadTable,
> alterTable, and dropTable *(createTable does not follow this pattern)*.
> One concern is that this makes the routing logic intrusive: every new
> format requires parallel changes across all three methods, and there is
> no single place that describes the full routing policy.
>
>
> *Questions for discussion*
>
>
> 1. Should Polaris determine the provider first (via metadata) and delegate
> to a single matching catalog, or should it attempt multiple sub-catalogs in
> a defined order?
>
> 2. If multiple sub-catalogs are supported, should there be a documented,
> deterministic resolution order (Iceberg -> Paimon -> Delta -> Hudi ->
> Polaris fallback)? Who owns that order, and should it be configurable by
> operators?
>
> 3. Should the per-format routing logic be centralised behind an abstraction
> (e.g. a SubCatalogRouter interface or a provider registry), so that adding
> a new format is a single registration rather than edits across loadTable,
> alterTable, and dropTable?
>
> 4. Consistency: Should all table operations (loadTable, createTable,
> alterTable, dropTable, renameTable) follow the same routing strategy, or
> are per-operation differences acceptable? Currently createTable has a
> different branching structure from loadTable.
>
> 5. Is it in scope for Polaris to act as a routing layer for multiple table
> providers, or should users who need both Polaris and Paimon configure them
> as separate catalogs in their Spark session and route at the session level
> themselves?
>
>
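
To make question 3 more concrete, a provider registry might look something like the sketch below. Everything here is hypothetical: `SubCatalogRouter` is only the name floated in the question, the two-method `SubCatalog` interface is a simplification of the real Spark catalog API, and case-insensitive matching of provider names is an assumption rather than existing behaviour.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ProviderRegistrySketch {

    /** Hypothetical per-format handler; the real Spark catalog API is richer. */
    interface SubCatalog {
        String loadTable(String ident);
        boolean dropTable(String ident);
    }

    /** Hypothetical router: the one place that states the routing policy. */
    static final class SubCatalogRouter {
        private final Map<String, SubCatalog> byProvider = new HashMap<>();
        private final SubCatalog fallback;

        SubCatalogRouter(SubCatalog fallback) {
            this.fallback = fallback;
        }

        /** Adding a format is a single registration. */
        SubCatalogRouter register(String provider, SubCatalog catalog) {
            byProvider.put(normalize(provider), catalog);
            return this;
        }

        /** Unknown or missing providers go to the fallback catalog. */
        SubCatalog resolve(String provider) {
            if (provider == null) {
                return fallback;
            }
            return byProvider.getOrDefault(normalize(provider), fallback);
        }

        // Assumption: provider names are matched case-insensitively.
        private static String normalize(String provider) {
            return provider.toLowerCase(Locale.ROOT);
        }
    }

    /** Toy catalog that tags results with its own name. */
    static SubCatalog named(String name) {
        return new SubCatalog() {
            @Override public String loadTable(String ident) { return name + ":" + ident; }
            @Override public boolean dropTable(String ident) { return true; }
        };
    }

    public static void main(String[] args) {
        SubCatalogRouter router = new SubCatalogRouter(named("generic"))
                .register("paimon", named("paimon"));

        // loadTable and dropTable share the same resolution step.
        System.out.println(router.resolve("paimon").loadTable("db.t1"));
        System.out.println(router.resolve("delta").loadTable("db.t2"));
        System.out.println(router.resolve("paimon").dropTable("db.t1"));
    }
}
```

With this shape, loadTable, alterTable, and dropTable would each call `resolve` once and delegate, so supporting another format would mean one extra `register` call. Whether Iceberg should also sit behind the registry, or stay as a first attempt ahead of it, is the open point raised in questions 1 and 2.
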
> We have a working Paimon implementation today and would like to avoid
> locking in a pattern that becomes hard to extend. Any input on the design
> direction, or pointers to prior discussion on this topic, would be much
> appreciated.
>
>
> Best regards,
>
> I-Ting
>
