Hi All,

I'm bumping this thread because PR [3820] was mentioned here before.
This discussion is interesting and useful. Still, on the practical side, how do you feel about merging [3820] now and working on Paimon-related improvements in follow-up PRs? Any objections?

[3820] https://github.com/apache/polaris/pull/3820

Thanks,
Dmitri.

On Sat, Mar 14, 2026 at 3:34 AM 李宜頲 <[email protected]> wrote:

> Hi all,
>
> We are adding support for Paimon inside Polaris's SparkCatalog. Before we
> add more formats, we would like to get community input on the intended
> architecture.
>
> This discussion originated from a code review conversation in PR #3820
> <https://github.com/apache/polaris/pull/3820#discussion_r2865885791>
>
> *Current design*
>
> When SparkCatalog.loadTable is called, the routing works in three phases:
>
> 1. Try the Iceberg catalog (icebergSparkCatalog.loadTable). If it succeeds,
> return immediately.
>
> 2. Call getTableFormat(ident), which makes a single HTTP GET to the Polaris
> server to read the provider property stored in the generic table metadata,
> without triggering any Spark DataSource resolution.
>
> 3. Route based on the provider string:
>
> - "paimon" : delegate to Paimon's SparkCatalog
>
> - unknown/other : fall back to polarisSparkCatalog.loadTable, which
> performs full DataSource resolution
>
> The same three-phase pattern is repeated independently in loadTable,
> alterTable, and dropTable *(But createTable is not following this pattern)*.
> It might raise the concern that this makes the routing logic intrusive:
> every new format requires parallel changes across all three methods, and
> there is no single place that describes the full routing policy.
>
> *Questions for discussion*
>
> 1. Should Polaris determine the provider first (via metadata) and delegate
> to a single matching catalog, or should it attempt multiple sub-catalogs in
> a defined order?
>
> 2. If multiple sub-catalogs are supported, should there be a documented,
> deterministic resolution order (Iceberg -> Paimon -> Delta -> Hudi ->
> Polaris fallback)? Who owns that order, should it be configurable by
> operators?
>
> 3. Should the per-format routing logic be centralised behind an abstraction
> (e.g. a SubCatalogRouter interface or a provider registry), so that adding
> a new format is a single registration rather than edits across loadTable,
> alterTable, and dropTable?
>
> 4. Consistency: Should all table operations (loadTable, createTable,
> alterTable, dropTable, renameTable) follow the same routing strategy, or
> are per-operation differences acceptable? Currently createTable has a
> different branching structure from loadTable.
>
> 5. Is it in scope for Polaris to act as a routing layer for multiple table
> providers, or should users who need both Polaris and Paimon configure them
> as separate catalogs in their Spark session and route at the session level
> themselves?
>
> We have a working Paimon implementation today and would like to avoid
> locking in a pattern that becomes hard to extend. Any input on the design
> direction, or pointers to prior discussion on this topic, would be much
> appreciated.
>
> Best regards,
>
> I-Ting
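
To make question 3 a bit more concrete, here is a minimal sketch of what a provider registry could look like. It is not the code in PR 3820 and it does not use the real Spark or Polaris types: SubCatalog, NamedCatalog, the register/resolve methods and the providerLookup function are all hypothetical stand-ins, with providerLookup playing the role of getTableFormat(ident). It also assumes the "determine the provider first" option from question 1, including for Iceberg, which differs from the current Iceberg-first flow.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Small demo; placed first so the file can also be run directly as a single source file.
class RoutingSketch {
    public static void main(String[] args) {
        // Hypothetical provider metadata, standing in for the HTTP GET to the Polaris server.
        Map<String, String> providers =
            Map.of("db.events", "paimon", "db.orders", "iceberg", "db.logs", "csv");

        SubCatalogRouter router = new SubCatalogRouter(
                ident -> Optional.ofNullable(providers.get(ident)),
                new NamedCatalog("polaris-fallback"))
            .register("iceberg", new NamedCatalog("iceberg"))
            .register("paimon", new NamedCatalog("paimon"));

        System.out.println(router.loadTable("db.events")); // paimon:db.events
        System.out.println(router.loadTable("db.logs"));   // polaris-fallback:db.logs
    }
}

// Hypothetical stand-in for a format-specific catalog (Iceberg, Paimon, ...).
interface SubCatalog {
    String loadTable(String ident);
    boolean dropTable(String ident);
}

// Toy implementation that only reports which catalog handled the call.
final class NamedCatalog implements SubCatalog {
    private final String name;

    NamedCatalog(String name) { this.name = name; }

    @Override public String loadTable(String ident) { return name + ":" + ident; }
    @Override public boolean dropTable(String ident) { return true; }
}

// Hypothetical registry: the single place that describes the routing policy.
final class SubCatalogRouter {
    private final Map<String, SubCatalog> byProvider = new LinkedHashMap<>();
    private final Function<String, Optional<String>> providerLookup;
    private final SubCatalog fallback;

    SubCatalogRouter(Function<String, Optional<String>> providerLookup, SubCatalog fallback) {
        this.providerLookup = providerLookup;
        this.fallback = fallback;
    }

    // Adding a new format is one registration, not edits in every table operation.
    SubCatalogRouter register(String provider, SubCatalog catalog) {
        byProvider.put(provider.toLowerCase(Locale.ROOT), catalog);
        return this;
    }

    // Unknown or missing providers resolve to the fallback catalog.
    SubCatalog resolve(String ident) {
        return providerLookup.apply(ident)
            .map(p -> byProvider.get(p.toLowerCase(Locale.ROOT)))
            .orElse(fallback);
    }

    // Every operation goes through the same resolve() call.
    String loadTable(String ident) { return resolve(ident).loadTable(ident); }
    boolean dropTable(String ident) { return resolve(ident).dropTable(ident); }
}
```

With something along these lines, loadTable, alterTable and dropTable would share one resolution step, so question 4 (consistency across operations) mostly reduces to whether createTable can go through the same resolve call.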
