Hi I-Ting,

Thanks for the clarification - LGTM.

Cheers,
Dmitri.

On Tue, May 26, 2026 at 12:10 PM ITing Lee <[email protected]> wrote:

> Hi Dmitri,
>
> > What do you mean by "table already exists in Paimon"?
>
> I mean the physical table in the object store.
>
> I double-checked the end-to-end flow between Spark and Polaris and found
> that we do not need to handle the idempotent creation flow I mentioned
> above.
>
> Delta and Hudi may also encounter an interruption between physical table
> creation in the object store and the callback to the Polaris REST API for
> generic table registration, so this behavior is not unique to Paimon.
>
> I'm happy to follow up if any improvements are needed to support Paimon in
> Polaris.
> Thanks.
>
> Best regards,
> I-Ting
>
> Dmitri Bourlatchkov <[email protected]> wrote on Tue, May 26, 2026 at 6:52 AM:
>
> > Hi I-Ting,
> >
> > What do you mean by "table already exists in Paimon"?
> >
> > Do you mean a Generic Table in Polaris terminology?
> >
> > Thanks,
> > Dmitri.
> >
> > On Sat, May 23, 2026 at 12:15 PM ITing Lee <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > After self-reviewing the PR again, I think we can make the Paimon and
> > > Polaris integration idempotent as a further improvement.
> > >
> > > The proposed flow is:
> > >
> > > 1. Check the Polaris metadata record first as an early return path.
> > >    * If the table already exists in Polaris, return/load the table.
> > >
> > > 2. Check Paimon.
> > >    * If the table already exists in Paimon, pass.
> > >    * If the table does not exist in Paimon, create the namespace in
> > >      Paimon if needed, then create the table in Paimon.
> > >
> > > 3. Register the table in Polaris.
> > >
> > > With this approach, even if step 2 succeeds but step 3 fails, we can
> > > return a detailed exception to the client and allow the client to
> > > retry. This should make table creation across both systems idempotent.
> > >
> > > If this makes sense, I can make this improvement in a follow-up PR.
> > > Thanks.
> > >
> > > Best regards,
> > > I-Ting
> > >
> > > Dmitri Bourlatchkov <[email protected]> wrote on Fri, May 22, 2026 at 3:18 AM:
> > >
> > > > Hi All,
> > > >
> > > > I'm bumping this thread because PR [3820] was mentioned here before.
> > > >
> > > > This discussion is interesting and useful. Still, on the practical
> > > > side, how do you feel about merging [3820] now and working on
> > > > Paimon-related improvements in follow-up PRs? Any objections?
> > > >
> > > > [3820] https://github.com/apache/polaris/pull/3820
> > > >
> > > > Thanks,
> > > > Dmitri.
> > > >
> > > > On Sat, Mar 14, 2026 at 3:34 AM 李宜頲 <[email protected]> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > We are adding support for Paimon inside Polaris's SparkCatalog.
> > > > > Before we add more formats, we would like to get community input
> > > > > on the intended architecture.
> > > > >
> > > > > This discussion originated from a code review conversation in PR #3820
> > > > > <https://github.com/apache/polaris/pull/3820#discussion_r2865885791>
> > > > >
> > > > > *Current design*
> > > > >
> > > > > When SparkCatalog.loadTable is called, the routing works in three
> > > > > phases:
> > > > >
> > > > > 1. Try the Iceberg catalog (icebergSparkCatalog.loadTable). If it
> > > > >    succeeds, return immediately.
> > > > >
> > > > > 2. Call getTableFormat(ident), which makes a single HTTP GET to
> > > > >    the Polaris server to read the provider property stored in the
> > > > >    generic table metadata, without triggering any Spark DataSource
> > > > >    resolution.
> > > > >
> > > > > 3. Route based on the provider string:
> > > > >
> > > > >    - "paimon": delegate to Paimon's SparkCatalog
> > > > >
> > > > >    - unknown/other: fall back to polarisSparkCatalog.loadTable,
> > > > >      which performs full DataSource resolution
> > > > >
> > > > > The same three-phase pattern is repeated independently in
> > > > > loadTable, alterTable, and dropTable *(createTable does not follow
> > > > > this pattern)*. This raises the concern that the routing logic is
> > > > > intrusive: every new format requires parallel changes across all
> > > > > three methods, and there is no single place that describes the
> > > > > full routing policy.
> > > > >
> > > > > *Questions for discussion*
> > > > >
> > > > > 1. Should Polaris determine the provider first (via metadata) and
> > > > >    delegate to a single matching catalog, or should it attempt
> > > > >    multiple sub-catalogs in a defined order?
> > > > >
> > > > > 2. If multiple sub-catalogs are supported, should there be a
> > > > >    documented, deterministic resolution order (Iceberg -> Paimon ->
> > > > >    Delta -> Hudi -> Polaris fallback)? Who owns that order, and
> > > > >    should it be configurable by operators?
> > > > >
> > > > > 3. Should the per-format routing logic be centralised behind an
> > > > >    abstraction (e.g. a SubCatalogRouter interface or a provider
> > > > >    registry), so that adding a new format is a single registration
> > > > >    rather than edits across loadTable, alterTable, and dropTable?
> > > > >
> > > > > 4. Consistency: should all table operations (loadTable,
> > > > >    createTable, alterTable, dropTable, renameTable) follow the same
> > > > >    routing strategy, or are per-operation differences acceptable?
> > > > >    Currently createTable has a different branching structure from
> > > > >    loadTable.
> > > > >
> > > > > 5. Is it in scope for Polaris to act as a routing layer for
> > > > >    multiple table providers, or should users who need both Polaris
> > > > >    and Paimon configure them as separate catalogs in their Spark
> > > > >    session and route at the session level themselves?
> > > > >
> > > > > We have a working Paimon implementation today and would like to
> > > > > avoid locking in a pattern that becomes hard to extend. Any input
> > > > > on the design direction, or pointers to prior discussion on this
> > > > > topic, would be much appreciated.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > I-Ting
> >
> > --
> > Dmitri Bourlatchkov
> > Senior Staff Software Engineer, Dremio
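
For reference, the provider-first routing raised in questions 1 and 3 of the original message could look roughly like the sketch below. It is only an illustration under stated assumptions, not code from PR #3820 or from Polaris: SubCatalog, SubCatalogRouter, NamedCatalog, RoutingSketch, and the in-memory metadata map are hypothetical stand-ins, the provider lookup function plays the role of getTableFormat(ident), and the initial "try Iceberg first" phase of the current design is left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical stand-in for a per-format catalog (Paimon, Delta, ...).
interface SubCatalog {
  String loadTable(String ident);
  boolean dropTable(String ident);
}

// Hypothetical registry: one place that maps a provider string to a sub-catalog.
final class SubCatalogRouter {
  private final Map<String, SubCatalog> byProvider = new LinkedHashMap<>();
  private final Function<String, Optional<String>> providerLookup;
  private final SubCatalog fallback;

  // providerLookup is assumed to read the provider property from table metadata.
  SubCatalogRouter(Function<String, Optional<String>> providerLookup, SubCatalog fallback) {
    this.providerLookup = providerLookup;
    this.fallback = fallback;
  }

  // Adding a new format is a single registration.
  SubCatalogRouter register(String provider, SubCatalog catalog) {
    byProvider.put(provider.toLowerCase(Locale.ROOT), catalog);
    return this;
  }

  // Single routing policy: unknown or missing providers go to the fallback.
  SubCatalog route(String ident) {
    return providerLookup.apply(ident)
        .map(p -> byProvider.get(p.toLowerCase(Locale.ROOT)))
        .orElse(fallback);
  }

  // Every table operation reuses the same route() call.
  String loadTable(String ident) { return route(ident).loadTable(ident); }
  boolean dropTable(String ident) { return route(ident).dropTable(ident); }
}

// Toy sub-catalog that tags results with its own name.
final class NamedCatalog implements SubCatalog {
  private final String name;
  NamedCatalog(String name) { this.name = name; }
  @Override public String loadTable(String ident) { return name + ":" + ident; }
  @Override public boolean dropTable(String ident) { return true; }
}

final class RoutingSketch {
  static SubCatalogRouter demoRouter() {
    // In-memory replacement for the metadata lookup (assumption for the demo).
    Map<String, String> metadata = Map.of("db.events", "paimon", "db.legacy", "csv");
    return new SubCatalogRouter(
            ident -> Optional.ofNullable(metadata.get(ident)),
            new NamedCatalog("polaris-fallback"))
        .register("paimon", new NamedCatalog("paimon"));
  }

  public static void main(String[] args) {
    SubCatalogRouter router = demoRouter();
    System.out.println(router.loadTable("db.events")); // paimon:db.events
    System.out.println(router.loadTable("db.legacy")); // polaris-fallback:db.legacy
  }
}
```

With a shape like this, loadTable, alterTable, and dropTable would all share one routing decision, and the resolution order question becomes a matter of how the registry and fallback are configured. Whether createTable can follow the same path is a separate question, since no metadata record exists yet at creation time.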
