[Discuss] Follow up on the multi table SI discussion

Xiening Dai Tue, 30 Jun 2026 14:02:07 -0700

Hi all,

Thanks for letting me bring up multi-table snapshot isolation (SI) at the
community sync.

I did some research and found the batch load API proposal from Steven Wu:
https://lists.apache.org/thread/wbtnjsm59ocdgtfdn0rrpfg8gj7d7qg9
The proposal doesn't address the transactional aspect of the API. I think one
of the key questions raised during the community sync is whether we should make
this API transactional and support snapshot isolation.

I think there are a few good reasons to make it transactional, or at least to
offer transactional semantics as an option:

1. Without a transactional batch get, we currently have no way to achieve SI
for multi-table statements. Calling our current `loadTable` API sequentially
effectively gives us the Read Committed isolation level: each call sees
whatever was committed when it runs, so a commit that lands between two calls
can leave a statement with a mix of old and new table states. This violates the
spec definition for table properties, which only allows `Snapshot` or
`Serializable`. In our data system (AWS Redshift), multi-table statements
represent a large percentage of the total queries we see in the fleet. With the
current implementation, all of these queries are potentially running at a much
weaker Read Committed level than they were designed for.

2. We already have the multi-table commit API, /v1/{prefix}/transactions/commit,
which requires the commit to be applied within an atomic transaction. So the
catalog store already has to support transactions; this is not a new
requirement. We should leverage the same property to provide SI for batch load.
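To make point 1 concrete, here is a toy sketch in Python. It is not the Iceberg REST API: `ToyCatalog`, `load_tables`, and the file names are hypothetical, and each commit simply appends a new catalog version. It shows how a multi-table commit landing between two sequential `loadTable`-style calls produces a mixed view, while a batch load that reads every table from one catalog version does not.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    """Hypothetical in-memory catalog store, NOT the Iceberg REST API.

    Each committed transaction appends a new catalog version, and each
    version maps table name -> metadata file location.
    """

    versions: list = field(default_factory=lambda: [{}])

    def commit_transaction(self, updates: dict) -> None:
        # Atomic multi-table commit: all updates become visible together,
        # in the spirit of /v1/{prefix}/transactions/commit.
        new_state = dict(self.versions[-1])
        new_state.update(updates)
        self.versions.append(new_state)

    def load_table(self, name: str) -> str:
        # Like a single loadTable call: sees the latest committed version.
        return self.versions[-1][name]

    def load_tables(self, names: list) -> dict:
        # Hypothetical transactional batch load: every table is read from
        # the same catalog version, i.e. one snapshot.
        state = self.versions[-1]
        return {name: state[name] for name in names}


catalog = ToyCatalog()
catalog.commit_transaction({"orders": "orders-v1.json", "payments": "payments-v1.json"})

# Sequential loads with a multi-table commit landing in between.
orders = catalog.load_table("orders")
catalog.commit_transaction({"orders": "orders-v2.json", "payments": "payments-v2.json"})
payments = catalog.load_table("payments")
print(orders, payments)  # orders-v1.json payments-v2.json (mixed view)

# Batch load: both tables come from the same committed version.
batch = catalog.load_tables(["orders", "payments"])
print(batch)  # {'orders': 'orders-v2.json', 'payments': 'payments-v2.json'}
```

In the toy the batch load is trivially consistent because it runs in one thread; in a real catalog store that guarantee would have to come from a read transaction, the same kind of transactional support the multi-table commit endpoint already depends on.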

Regarding the CSN (Catalog Sequence Number) alternative, I also replied to
Maninder’s comments in the proposal doc:
https://docs.google.com/document/d/1u11b4pzeFUKD0XX--nHPj-DoYcNeCgOe94WKCaX2XMI/edit?tab=t.0

My high-level takeaway is that implementing CSN requires the metadata JSON file
to be generated, or rewritten, by the catalog service at commit time. That would
almost certainly require all commits to go through the IRC (Iceberg REST
Catalog), which doesn’t seem likely to happen soon. Even if we plan for a
long-term CSN solution, transactional read/write support in the catalog store is
still required; for example, we would need a single SI transaction to update
the CSN for an atomic commit. From that perspective, I don’t think the two
approaches conflict: the batch load API can return a snapshot view of objects
as of the current state (`as-of-current`), and in the future, if the object
state contains a list of CSNs, clients could also load a historical snapshot by
aligning the CSNs across multiple objects.
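The CSN alignment idea above can be sketched roughly as follows. This is a toy model, not a proposed API: `CsnCatalog`, `commit`, and `load_as_of` are hypothetical, the CSN is assumed to be a single monotonically increasing counter shared by all tables, and each table is assumed to keep a sorted list of `(csn, metadata_location)` entries. `commit` stands in for the single SI transaction that stamps an atomic commit with one CSN, and `load_as_of` aligns several tables on one CSN by picking each table's latest commit at or before it.

```python
from bisect import bisect_right


class CsnCatalog:
    """Hypothetical catalog store that stamps every commit with a CSN.

    Assumes one monotonically increasing CSN shared by all tables; each
    table keeps a sorted list of (csn, metadata_location) entries.
    """

    def __init__(self) -> None:
        self.csn = 0
        self.history: dict[str, list[tuple[int, str]]] = {}

    def commit(self, updates: dict[str, str]) -> int:
        # Stands in for one SI transaction in the catalog store: bump the
        # CSN once and record it on every table touched by this commit.
        self.csn += 1
        for name, location in updates.items():
            self.history.setdefault(name, []).append((self.csn, location))
        return self.csn

    def load_as_of(self, names: list[str], csn: int) -> dict[str, str]:
        # Align tables on one CSN: each table resolves to its latest
        # commit with CSN <= csn.
        result = {}
        for name in names:
            entries = self.history[name]
            idx = bisect_right([c for c, _ in entries], csn)
            if idx == 0:
                raise LookupError(f"{name} has no commit at or before CSN {csn}")
            result[name] = entries[idx - 1][1]
        return result


catalog = CsnCatalog()
catalog.commit({"orders": "orders-v1.json"})                                  # CSN 1
catalog.commit({"payments": "payments-v1.json"})                              # CSN 2
catalog.commit({"orders": "orders-v2.json", "payments": "payments-v2.json"})  # CSN 3
catalog.commit({"orders": "orders-v3.json"})                                  # CSN 4

print(catalog.load_as_of(["orders", "payments"], 2))
# {'orders': 'orders-v1.json', 'payments': 'payments-v1.json'}
print(catalog.load_as_of(["orders", "payments"], 3))
# {'orders': 'orders-v2.json', 'payments': 'payments-v2.json'}
```

Reading at CSN 3 sees both halves of the multi-table commit, and no CSN exposes only one of them. That only holds if the CSN bump and the per-table history updates are applied in one transaction, which matches the point above that the catalog store needs transactional support either way.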

Let’s continue the discussion here. If people agree on providing a
transactional batch load API, I can work with Steven on the details of the API
proposal.
