Hi I-Ting,

Thanks for the clarification - LGTM.

Cheers,
Dmitri.

On Tue, May 26, 2026 at 12:10 PM ITing Lee <[email protected]> wrote:

> Hi Dmitri,
>
> > What do you mean by "table already exists in Paimon"?
>
> I mean the physical table on object store.
>
> I double-checked the end-to-end flow between Spark and Polaris and found
> that we do not need to handle the idempotent creation flow I mentioned
> above.
>
> Delta and Hudi may also encounter an interruption between physical table
> creation in the object store and the callback to the Polaris REST API for
> generic table registration, so this behavior is not unique to Paimon.
>
> I’m happy to follow up if any improvements are needed to support Paimon in
> Polaris.
> Thanks.
>
> Best regards,
> I-Ting
>
>
> On Tue, May 26, 2026 at 6:52 AM Dmitri Bourlatchkov <[email protected]> wrote:
>
> > Hi I-Ting,
> >
> > What do you mean by "table already exists in Paimon"?
> >
> > Do you mean a Generic Table in Polaris terminology?
> >
> > Thanks,
> > Dmitri.
> >
> > On Sat, May 23, 2026 at 12:15 PM ITing Lee <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > After reviewing the PR again, I think we can make the Paimon and
> > > Polaris integration idempotent as a further improvement.
> > >
> > > The proposed flow is:
> > >
> > > 1. Check the Polaris metadata record first as an early return path.
> > >    * If the table already exists in Polaris, return/load the table.
> > >
> > > 2. Check Paimon.
> > >    * If the table already exists in Paimon, skip creation.
> > >    * If the table does not exist in Paimon, create the namespace in
> > >      Paimon if needed, then create the table in Paimon.
> > >
> > > 3. Register the table in Polaris.
> > >
> > > With this approach, even if step 2 succeeds but step 3 fails, we can
> > > return a detailed exception to the client and allow the client to retry.
> > > This should make table creation across both systems idempotent.
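
The three-step flow above could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not code from the PR: `create_table`, `RegistrationError`, the `polaris` and `paimon` dictionaries, and the `register` callable are all hypothetical stand-ins for the Polaris metadata store, the physical tables on the object store, and the Polaris registration call.

```python
class RegistrationError(Exception):
    """Physical table exists, but registering it in the catalog failed."""


def create_table(ident, polaris, paimon, register):
    """Create a table so that calling it again after a failure is safe.

    ident:    (namespace, name) tuple
    polaris:  dict standing in for Polaris generic-table metadata
    paimon:   dict with "namespaces" and "tables" sets, standing in for
              the physical tables on the object store
    register: callable(polaris, ident) that records the table; may raise
    """
    namespace, _name = ident

    # Step 1: early return if Polaris already knows the table.
    if ident in polaris:
        return polaris[ident]

    # Step 2: create the physical table only if it is missing.
    if ident not in paimon["tables"]:
        paimon["namespaces"].add(namespace)  # no-op if already present
        paimon["tables"].add(ident)

    # Step 3: register in Polaris; on failure, report a retryable error.
    try:
        register(polaris, ident)
    except Exception as exc:
        raise RegistrationError(
            f"table {ident} exists in Paimon but is not registered in "
            "Polaris; retrying create_table is safe"
        ) from exc
    return polaris[ident]
```

In this sketch, a retry after a failed step 3 skips physical creation in step 2 and only repeats the registration, which is the property the proposal is after.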
> > >
> > > If this makes sense, I can make this improvement in a follow-up PR.
> > > Thanks.
> > >
> > > Best regards,
> > > I-Ting
> > >
> > > On Fri, May 22, 2026 at 3:18 AM Dmitri Bourlatchkov <[email protected]> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I'm bumping this thread because PR [3820] was mentioned here before.
> > > >
> > > > This discussion is interesting and useful. Still, on the practical side,
> > > > how do you feel about merging [3820] now and working on Paimon-related
> > > > improvements in follow-up PRs? Any objections?
> > > >
> > > > [3820] https://github.com/apache/polaris/pull/3820
> > > >
> > > > Thanks,
> > > > Dmitri.
> > > >
> > > > On Sat, Mar 14, 2026 at 3:34 AM 李宜頲 <[email protected]> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > We are adding support for Paimon inside Polaris's SparkCatalog. Before
> > > > > we add more formats, we would like to get community input on the
> > > > > intended architecture.
> > > > >
> > > > > This discussion originated from a code review conversation in PR #3820
> > > > > <https://github.com/apache/polaris/pull/3820#discussion_r2865885791>
> > > > >
> > > > >
> > > > >
> > > > > *Current design*
> > > > >
> > > > > When SparkCatalog.loadTable is called, the routing works in three phases:
> > > > >
> > > > >
> > > > > 1. Try the Iceberg catalog (icebergSparkCatalog.loadTable). If it
> > > > >    succeeds, return immediately.
> > > > >
> > > > > 2. Call getTableFormat(ident), which makes a single HTTP GET to the
> > > > >    Polaris server to read the provider property stored in the generic
> > > > >    table metadata, without triggering any Spark DataSource resolution.
> > > > >
> > > > > 3. Route based on the provider string:
> > > > >
> > > > >     - "paimon": delegate to Paimon's SparkCatalog
> > > > >
> > > > >     - unknown/other: fall back to polarisSparkCatalog.loadTable, which
> > > > >       performs full DataSource resolution
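
The three phases above can be sketched as follows. This is a minimal Python illustration, not the Java code in PR #3820: `load_table`, `NoSuchTable`, and the callables passed in are hypothetical stand-ins for icebergSparkCatalog.loadTable, getTableFormat, Paimon's SparkCatalog, and polarisSparkCatalog.loadTable. It also assumes that a failed Iceberg lookup surfaces as a not-found exception.

```python
class NoSuchTable(Exception):
    """Assumed signal that a catalog does not contain the table."""


def load_table(ident, iceberg_load, get_table_format, sub_catalogs,
               fallback_load):
    """Route a load request through the three phases.

    iceberg_load:     callable(ident), raises NoSuchTable when not found
    get_table_format: callable(ident) returning the provider string
    sub_catalogs:     dict mapping provider string -> loader callable
    fallback_load:    callable(ident) used for unknown providers
    """
    # Phase 1: try Iceberg first and return immediately on success.
    try:
        return iceberg_load(ident)
    except NoSuchTable:
        pass

    # Phase 2: a single metadata lookup for the provider string.
    provider = get_table_format(ident)

    # Phase 3: delegate to a matching sub-catalog, otherwise fall back.
    loader = sub_catalogs.get(provider)
    if loader is not None:
        return loader(ident)
    return fallback_load(ident)
```

With `sub_catalogs = {"paimon": paimon_load}`, a table whose provider is "paimon" goes to the Paimon loader, and any other provider reaches the fallback.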
> > > > >
> > > > >
> > > > > The same three-phase pattern is repeated independently in loadTable,
> > > > > alterTable, and dropTable *(createTable does not follow this pattern)*.
> > > > > This may raise the concern that the routing logic is intrusive: every
> > > > > new format requires parallel changes across all three methods, and
> > > > > there is no single place that describes the full routing policy.
> > > > >
> > > > >
> > > > > *Questions for discussion*
> > > > >
> > > > >
> > > > > 1. Should Polaris determine the provider first (via metadata) and
> > > > > delegate to a single matching catalog, or should it attempt multiple
> > > > > sub-catalogs in a defined order?
> > > > >
> > > > > 2. If multiple sub-catalogs are supported, should there be a
> > > > > documented, deterministic resolution order (Iceberg -> Paimon -> Delta
> > > > > -> Hudi -> Polaris fallback)? Who owns that order, and should it be
> > > > > configurable by operators?
> > > > >
> > > > > 3. Should the per-format routing logic be centralised behind an
> > > > > abstraction (e.g. a SubCatalogRouter interface or a provider registry),
> > > > > so that adding a new format is a single registration rather than edits
> > > > > across loadTable, alterTable, and dropTable?
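
One possible shape for such a registry is sketched below, in Python for brevity. Only the name SubCatalogRouter comes from the question above; `RoutingCatalog`, the method names, and the catalog objects are hypothetical, and nothing here reflects an existing Polaris API.

```python
class SubCatalogRouter:
    """Maps a provider string to the catalog that handles it."""

    def __init__(self, fallback):
        self._fallback = fallback
        self._catalogs = {}

    def register(self, provider, catalog):
        # Adding a new format is one call here, with no per-method edits.
        self._catalogs[provider] = catalog

    def resolve(self, provider):
        # Unknown or missing providers go to the fallback catalog.
        return self._catalogs.get(provider, self._fallback)


class RoutingCatalog:
    """Applies the same routing policy to every table operation."""

    def __init__(self, router, get_table_format):
        self._router = router
        self._get_table_format = get_table_format

    def _target(self, ident):
        # Single place that decides which catalog owns the table.
        return self._router.resolve(self._get_table_format(ident))

    def load_table(self, ident):
        return self._target(ident).load_table(ident)

    def alter_table(self, ident, changes):
        return self._target(ident).alter_table(ident, changes)

    def drop_table(self, ident):
        return self._target(ident).drop_table(ident)
```

The design choice in this sketch is that the routing policy lives only in `_target`, so loadTable, alterTable, and dropTable cannot drift apart.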
> > > > >
> > > > > 4. Consistency: Should all table operations (loadTable, createTable,
> > > > > alterTable, dropTable, renameTable) follow the same routing strategy,
> > > > > or are per-operation differences acceptable? Currently createTable has
> > > > > a different branching structure from loadTable.
> > > > >
> > > > > 5. Is it in scope for Polaris to act as a routing layer for multiple
> > > > > table providers, or should users who need both Polaris and Paimon
> > > > > configure them as separate catalogs in their Spark session and route at
> > > > > the session level themselves?
> > > > >
> > > > >
> > > > > We have a working Paimon implementation today and would like to avoid
> > > > > locking in a pattern that becomes hard to extend. Any input on the
> > > > > design direction, or pointers to prior discussion on this topic, would
> > > > > be much appreciated.
> > > > >
> > > > >
> > > > > Best regards,
> > > > >
> > > > > I-Ting
> > > > >
> > > >
> > >
> >
> >
> > --
> > Dmitri Bourlatchkov
> > Senior Staff Software Engineer, Dremio
> >
>
