Hi all,

Thanks for letting me bring up multi-table snapshot isolation at the community sync.
I did some research and found the batch load API proposal from Steven Wu: https://lists.apache.org/thread/wbtnjsm59ocdgtfdn0rrpfg8gj7d7qg9

The proposal doesn't address the transactional side of the API, and I think one of the key questions at the community sync was whether we should make this API transactional and support snapshot isolation (SI). I see a few good reasons to make it transactional, or at least to offer transactional behavior as an option:

1. Without a transactional batch get, we currently have no way to achieve SI for multi-table statements. Calling our current `loadTable` API sequentially effectively gives us the Read Committed isolation level, because each call can observe commits that landed after the previous call returned. This violates the spec definition for table properties, which only allows `Snapshot` or `Serializable`. In our data system (AWS Redshift), multi-table statements make up a large percentage of the queries we see across the fleet. With the current implementation, all of these queries are potentially running at a much weaker Read Committed level than they were designed for.

2. We already have the multi-table commit API, /v1/{prefix}/transactions/commit, which requires commits to be done within an atomic transaction. So the transactional requirement on the catalog store already exists; it's not new. We should leverage the same property to give us SI for batch load.

Regarding the CSN (Catalog Sequence Number) alternative, I also replied to Maninder's comments in the proposal doc: https://docs.google.com/document/d/1u11b4pzeFUKD0XX--nHPj-DoYcNeCgOe94WKCaX2XMI/edit?tab=t.0

My high-level takeaway is that to implement CSN, the metadata JSON file needs to be generated, or rewritten, by the catalog service at commit time. This would almost certainly require all commits to go through IRC, which doesn't seem likely to happen soon.
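To make point 1 concrete, here is a minimal toy sketch in Python. It is not Iceberg or REST catalog code: `ToyCatalog`, `load_table`, `load_tables`, the integer `as_of` catalog version, and the table names are all hypothetical. It shows that when a multi-table commit lands between two sequential loads, the reader sees a pair of snapshots that no single commit ever produced, while a batch load that pins one catalog version for every table does not.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyCatalog:
    """Hypothetical in-memory catalog; every commit bumps one global version."""
    version: int = 0
    # table name -> list of (catalog_version, snapshot_id), in commit order
    history: dict = field(default_factory=dict)

    def commit(self, updates: dict) -> None:
        """Apply {table: snapshot_id} as one atomic multi-table commit."""
        self.version += 1
        for table, snapshot_id in updates.items():
            self.history.setdefault(table, []).append((self.version, snapshot_id))

    def load_table(self, table: str, as_of: Optional[int] = None) -> Optional[str]:
        """Latest snapshot of `table` committed at or before `as_of` (default: now)."""
        as_of = self.version if as_of is None else as_of
        result = None
        for version, snapshot_id in self.history.get(table, []):
            if version <= as_of:
                result = snapshot_id
        return result

    def load_tables(self, tables: list) -> dict:
        """Hypothetical batch load: pin one catalog version, read every table at it."""
        as_of = self.version
        return {table: self.load_table(table, as_of) for table in tables}


def sequential_read_with_concurrent_commit(catalog: ToyCatalog) -> tuple:
    """Two loadTable-style calls with a writer's commit landing in between."""
    orders = catalog.load_table("orders")
    catalog.commit({"orders": "o2", "payments": "p2"})  # concurrent writer
    payments = catalog.load_table("payments")
    return orders, payments


catalog = ToyCatalog()
catalog.commit({"orders": "o1", "payments": "p1"})
# Sequential loads mix two commits: Read Committed behavior.
print(sequential_read_with_concurrent_commit(catalog))  # ('o1', 'p2')
# A batch load reads both tables at the same catalog version.
print(catalog.load_tables(["orders", "payments"]))  # {'orders': 'o2', 'payments': 'p2'}
```

In a real catalog, pinning a single version for the whole batch means the catalog store has to serve all of those reads from one transactional snapshot, which is the same capability the multi-table commit already relies on. The `as_of` parameter also loosely mirrors the CSN idea of reading every table at one sequence number.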
Even if we plan for a long-term CSN solution, transactional read/write support in the catalog store is still required; for example, we would need a single SI transaction to update the CSN for an atomic commit. So from that perspective, I don't think these two approaches conflict: the batch load API can return a snapshot view of objects in their `as-of-current` state, and in the future, if the object state contains a list of CSNs, clients can also choose to load a historical snapshot by aligning the CSNs across multiple objects.

Let's continue this discussion. If people are aligned on providing a transactional batch load API, I can work with Steven on the details of the API proposal.
