Hi Robert,

Thanks for the update and the draft PR!

I would like to use this thread to thank Dennis. Big kudos to Dennis for the changes he made: without these changes, it would have been impossible to add new backends like MongoDB.

I propose we review and comment on Robert's PR.

I would also like to propose a community meeting to discuss the Persistence Improvement and drive consensus. What about Tuesday, March 25th at 9:30am PST?

Thanks all!

Regards
JB

On Mon, Mar 17, 2025 at 2:43 PM Robert Stupp <sn...@snazy.de> wrote:
>
> Hi,
>
> I’ve made quite some progress on building the integration for NoSQL
> databases. The initial code supports MongoDB [A], but is not limited to
> that database. A working implementation has been pushed as a draft-PR
> [1] for illustration purposes how it can look like when it is fully
> integrated. A couple of smaller PRs will follow.
>
> Background: The only common denominator for “synchronization purposes”
> that all NoSQL databases support is a single-row compare-and-swap (CAS)
> operation - think of this as (pseudo-SQL) “UPDATE table SET x =
> :new_value WHERE primary_key = :primary_key AND x = :expected_old_value”.
>
> The most important objective for the implementation is correctness,
> especially in scenarios with high concurrent load. Explicit tests to
> verify the correctness are included, for the CI “use case” and for
> manual/special runs against a clustered database setup (which are just
> “too much” for the Github hosted runners).
>
> The current integration point is
> ‘MetaStoreManagerFactory’/‘PolarisMetaStoreManager’ implemented in the
> “bridge” Gradle project.
>
> The ‘components/persistence/README.md’ in the draft-PR contains more
> technical information.
>
> A benchmarking tool to measure performance and correctness of Polaris
> will be proposed soon as a separate/independent effort. We have used
> this benchmarking tool to measure performance and implicitly the
> correctness of the implementation.
>
> Implementations for particular (No)SQL databases are isolated in one
> (Gradle) project per database. This is effectively/conceptually the same
> approach that already works for Nessie, which supports quite some
> databases [2].
>
> Robert
>
> [1] https://github.com/apache/polaris/pull/1189
> [2] https://projectnessie.org/nessie-latest/configuration/#support-for-the-database-specific-implementations
> [A] Technically there is also an “in memory” implementation for testing
> purposes (not intended to replace the existing one).
>
>
> --
> Robert Stupp
> @snazy
>
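
For anyone on the thread less familiar with the pattern, the single-row compare-and-swap from Robert's pseudo-SQL can be sketched roughly as below. This is only an illustration using Python's built-in sqlite3 module, not the code in the draft PR. The table name `kv`, the helpers `compare_and_swap` and `update_with_retry`, and the sample row are all made up for this example.

```python
import sqlite3


def compare_and_swap(conn, primary_key, expected_old_value, new_value):
    """Update the row only if it still holds the expected value.

    Returns True when exactly one row was changed, False when the
    expected value no longer matched (someone else updated it first).
    """
    cur = conn.execute(
        "UPDATE kv SET x = :new_value "
        "WHERE primary_key = :primary_key AND x = :expected_old_value",
        {
            "new_value": new_value,
            "primary_key": primary_key,
            "expected_old_value": expected_old_value,
        },
    )
    conn.commit()
    return cur.rowcount == 1


def update_with_retry(conn, primary_key, transform, max_attempts=10):
    """Typical caller loop: read, compute, CAS, retry on conflict."""
    for _ in range(max_attempts):
        row = conn.execute(
            "SELECT x FROM kv WHERE primary_key = ?", (primary_key,)
        ).fetchone()
        if row is None:
            raise KeyError(primary_key)
        current = row[0]
        new_value = transform(current)
        if compare_and_swap(conn, primary_key, current, new_value):
            return new_value
    raise RuntimeError("gave up after too many conflicting updates")


# Hypothetical table and row, only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (primary_key TEXT PRIMARY KEY, x TEXT NOT NULL)")
conn.execute("INSERT INTO kv VALUES ('catalog', 'v1')")
conn.commit()
```

A writer that loses the race sees `False` from `compare_and_swap` and has to re-read and retry, which is what `update_with_retry` does. How the draft PR structures its retries and what it stores per row may well differ.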