Hi Dmitri,

> What do you mean by "table already exists in Paimon"?

I mean the physical table in the object store.

I double-checked the end-to-end flow between Spark and Polaris and found
that we do not need to handle the idempotent creation flow I mentioned
above.

Delta and Hudi may also encounter an interruption between physical table
creation in the object store and the callback to the Polaris REST API for
generic table registration, so this behavior is not unique to Paimon.

I’m happy to follow up if any improvements are needed to support Paimon in
Polaris.
Thanks.

Best regards,
I-Ting


On Tue, May 26, 2026 at 6:52 AM Dmitri Bourlatchkov <[email protected]> wrote:

> Hi I-Ting,
>
> What do you mean by "table already exists in Paimon"?
>
> Do you mean a Generic Table in Polaris terminology?
>
> Thanks,
> Dmitri.
>
> On Sat, May 23, 2026 at 12:15 PM ITing Lee <[email protected]> wrote:
>
> > Hi all,
> >
> > After reviewing the PR again, I think we can make the Paimon and
> > Polaris integration idempotent in a further improvement.
> >
> > The proposed flow is:
> >
> > 1. Check the Polaris metadata record first as an early return path.
> >    * If the table already exists in Polaris, return/load the table.
> >
> > 2. Check Paimon.
> >    * If the table already exists in Paimon, pass.
> >    * If the table does not exist in Paimon, create the namespace in
> >      Paimon if needed, then create the table in Paimon.
> >
> > 3. Register the table in Polaris.
> >
> > With this approach, even if step 2 succeeds but step 3 fails, we can
> > return a detailed exception to the client and allow the client to retry.
> > This should make table creation across both systems idempotent.
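The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `TableStore`, `failNextCreate`, and `createTable` are hypothetical in-memory stand-ins, not Polaris or Paimon APIs, and namespace creation is left out.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch of the proposed create flow; every name here is hypothetical. */
final class IdempotentCreateSketch {

    /** In-memory stand-in for a system that tracks tables by name. */
    static final class TableStore {
        private final Set<String> tables = new HashSet<>();
        boolean failNextCreate = false; // hook to simulate one transient failure

        boolean exists(String name) {
            return tables.contains(name);
        }

        void create(String name) {
            if (failNextCreate) {
                failNextCreate = false;
                throw new IllegalStateException("create failed for " + name);
            }
            tables.add(name);
        }
    }

    /** Runs steps 1-3; calling it again after a partial failure is safe. */
    static String createTable(TableStore polaris, TableStore paimon, String name) {
        // Step 1: early return if Polaris already holds the metadata record.
        if (polaris.exists(name)) {
            return "loaded";
        }
        // Step 2: create the physical table only if it is missing.
        if (!paimon.exists(name)) {
            paimon.create(name);
        }
        // Step 3: register in Polaris. If this throws, step 2 stays done and
        // the next attempt skips it.
        polaris.create(name);
        return "created";
    }

    public static void main(String[] args) {
        TableStore polaris = new TableStore();
        TableStore paimon = new TableStore();
        polaris.failNextCreate = true; // first registration attempt fails
        try {
            createTable(polaris, paimon, "db.t");
        } catch (IllegalStateException e) {
            System.out.println("first attempt failed: " + e.getMessage());
        }
        System.out.println("retry: " + createTable(polaris, paimon, "db.t"));
    }
}
```

In this sketch the retry succeeds because step 2 is guarded by an existence check, so a table left behind by an interrupted attempt is reused rather than treated as a conflict.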
> >
> > If this makes sense, I can make this improvement in a follow-up PR.
> > Thanks.
> >
> > Best regards,
> > I-Ting
> >
> > On Fri, May 22, 2026 at 3:18 AM Dmitri Bourlatchkov <[email protected]> wrote:
> >
> > > Hi All,
> > >
> > > I'm bumping this thread because PR [3820] was mentioned here before.
> > >
> > > This discussion is interesting and useful. Still, on the practical side,
> > > how do you feel about merging [3820] now and working on Paimon-related
> > > improvements in follow-up PRs? Any objections?
> > >
> > > [3820] https://github.com/apache/polaris/pull/3820
> > >
> > > Thanks,
> > > Dmitri.
> > >
> > > On Sat, Mar 14, 2026 at 3:34 AM 李宜頲 <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > We are adding support for Paimon inside Polaris's SparkCatalog. Before
> > > > we add more formats, we would like to get community input on the
> > > > intended architecture.
> > > >
> > > > This discussion originated from a code review conversation in PR #3820
> > > > <https://github.com/apache/polaris/pull/3820#discussion_r2865885791>
> > > >
> > > >
> > > >
> > > > *Current design*
> > > >
> > > > When SparkCatalog.loadTable is called, the routing works in three
> > > > phases:
> > > >
> > > >
> > > > 1. Try the Iceberg catalog (icebergSparkCatalog.loadTable). If it
> > > > succeeds, return immediately.
> > > >
> > > > 2. Call getTableFormat(ident), which makes a single HTTP GET to the
> > > > Polaris server to read the provider property stored in the generic
> > > > table metadata, without triggering any Spark DataSource resolution.
> > > >
> > > > 3. Route based on the provider string:
> > > >
> > > >     - "paimon"  : delegate to Paimon's SparkCatalog
> > > >
> > > >     - unknown/other : fall back to polarisSparkCatalog.loadTable,
> > > >       which performs full DataSource resolution
> > > >
> > > >
> > > > The same three-phase pattern is repeated independently in loadTable,
> > > > alterTable, and dropTable *(createTable does not follow this pattern)*.
> > > > This raises the concern that the routing logic is intrusive: every new
> > > > format requires parallel changes across all three methods, and there is
> > > > no single place that describes the full routing policy.
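For readers who want the three phases in one place, here is a simplified sketch of the loadTable routing described above. It is not the code from the PR: `SubCatalog`, `knows`, and the `providerByIdent` map (which stands in for the HTTP GET made by getTableFormat) are hypothetical, and a missing table is modelled with `Optional` rather than Spark's exception.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Simplified model of the three-phase routing; all names are hypothetical. */
final class LoadTableRoutingSketch {

    /** Stand-in for a sub-catalog; returns empty when the table is unknown. */
    interface SubCatalog {
        Optional<String> loadTable(String ident);
    }

    /** Helper building a sub-catalog that knows a fixed set of identifiers. */
    static SubCatalog knows(String label, String... idents) {
        Set<String> known = new HashSet<>(Arrays.asList(idents));
        return ident -> known.contains(ident)
                ? Optional.of(label + ":" + ident)
                : Optional.empty();
    }

    private final SubCatalog iceberg;
    private final SubCatalog paimon;
    private final SubCatalog polarisFallback;
    private final Map<String, String> providerByIdent; // stands in for the metadata lookup

    LoadTableRoutingSketch(SubCatalog iceberg, SubCatalog paimon,
                           SubCatalog polarisFallback,
                           Map<String, String> providerByIdent) {
        this.iceberg = iceberg;
        this.paimon = paimon;
        this.polarisFallback = polarisFallback;
        this.providerByIdent = providerByIdent;
    }

    String loadTable(String ident) {
        // Phase 1: try Iceberg first and return immediately on success.
        Optional<String> fromIceberg = iceberg.loadTable(ident);
        if (fromIceberg.isPresent()) {
            return fromIceberg.get();
        }
        // Phase 2: read the provider from the generic table metadata.
        String provider = providerByIdent.get(ident); // null when unknown
        // Phase 3: route on the provider string, falling back to Polaris.
        SubCatalog target = "paimon".equals(provider) ? paimon : polarisFallback;
        return target.loadTable(ident).orElseThrow(
                () -> new IllegalArgumentException("Table not found: " + ident));
    }
}
```

Repeating this body in alterTable and dropTable is what makes each new format a three-place change.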
> > > >
> > > >
> > > > *Questions for discussion*
> > > >
> > > >
> > > > 1. Should Polaris determine the provider first (via metadata) and
> > > > delegate to a single matching catalog, or should it attempt multiple
> > > > sub-catalogs in a defined order?
> > > >
> > > > 2. If multiple sub-catalogs are supported, should there be a
> > > > documented, deterministic resolution order (Iceberg -> Paimon -> Delta
> > > > -> Hudi -> Polaris fallback)? Who owns that order, and should it be
> > > > configurable by operators?
> > > >
> > > > 3. Should the per-format routing logic be centralised behind an
> > > > abstraction (e.g. a SubCatalogRouter interface or a provider registry),
> > > > so that adding a new format is a single registration rather than edits
> > > > across loadTable, alterTable, and dropTable?
> > > >
> > > > 4. Consistency: Should all table operations (loadTable, createTable,
> > > > alterTable, dropTable, renameTable) follow the same routing strategy,
> > > > or are per-operation differences acceptable? Currently createTable has
> > > > a different branching structure from loadTable.
> > > >
> > > > 5. Is it in scope for Polaris to act as a routing layer for multiple
> > > > table providers, or should users who need both Polaris and Paimon
> > > > configure them as separate catalogs in their Spark session and route at
> > > > the session level themselves?
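To make question 3 more concrete, one possible shape for a provider registry is sketched below. It is an assumption-laden illustration, not a proposal for the actual interface: `FormatHandler`, `register`, `resolve`, and `named` are hypothetical names, and treating provider strings as case-insensitive is a choice made only for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical registry mapping a provider string to one per-format handler. */
final class ProviderRegistrySketch {

    /** Hypothetical handler grouping the operations that need routing. */
    interface FormatHandler {
        String loadTable(String ident);

        boolean dropTable(String ident);
    }

    /** Helper building a trivial handler that labels what it loads. */
    static FormatHandler named(String label) {
        return new FormatHandler() {
            @Override
            public String loadTable(String ident) {
                return label + ":" + ident;
            }

            @Override
            public boolean dropTable(String ident) {
                return true;
            }
        };
    }

    private final Map<String, FormatHandler> handlers = new LinkedHashMap<>();
    private final FormatHandler fallback;

    ProviderRegistrySketch(FormatHandler fallback) {
        this.fallback = fallback;
    }

    /** Adding a format is one call here instead of edits in every operation. */
    void register(String provider, FormatHandler handler) {
        handlers.put(provider.toLowerCase(Locale.ROOT), handler);
    }

    /** Unknown or missing providers resolve to the fallback handler. */
    FormatHandler resolve(String provider) {
        if (provider == null) {
            return fallback;
        }
        return handlers.getOrDefault(provider.toLowerCase(Locale.ROOT), fallback);
    }
}
```

With something like this, loadTable, alterTable, and dropTable would each call `resolve(provider)` once and delegate, so the routing policy would live in a single place.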
> > > >
> > > >
> > > > We have a working Paimon implementation today and would like to avoid
> > > > locking in a pattern that becomes hard to extend. Any input on the
> > > > design direction, or pointers to prior discussion on this topic, would
> > > > be much appreciated.
> > > >
> > > >
> > > > Best regards,
> > > >
> > > > I-Ting
> > > >
> > >
> >
>
>
> --
> Dmitri Bourlatchkov
> Senior Staff Software Engineer, Dremio
>
