Re: NoSQL database agnostic persistence

Russell Spitzer Thu, 20 Mar 2025 07:06:18 -0700

We could just keep the exact same total number of entities, so 65 catalogs,
10 namespaces each with 100 tables and views. Or you can scale it up if you
want. The insights I'm particularly interested in are the dynamics of
listing and accessing tables in a namespace with multiple entries. The
areas I want to test for generic implementations are things like:


- What does performance look like when listing entries within a namespace?
- Does "create" performance change if there are other tables in the same
  namespace?
- Is concurrency within a namespace the same as concurrency between
  namespaces?

I'm just a little worried about us optimizing in the future for a synthetic
scenario. I think we all agree that a real Polaris use case is unlikely to
have a 1:1 namespace-to-table ratio, so I just want to make sure we are
benchmarking against something closer to what a real layout would look
like.
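
To make the last question concrete: Robert's message further down describes the only synchronization primitive the NoSQL layer relies on, a single-row compare-and-swap (CAS). Below is a minimal Python sketch of that primitive and of the optimistic retry loop built on top of it. `CasStore`, `increment` and `run` are hypothetical names, and an in-process lock stands in for the database's atomicity; this is not the Polaris or MongoDB API. Writers targeting the same row must retry whenever they lose a race, while writers on different rows never conflict.

```python
import threading


class CasStore:
    """Hypothetical in-memory store whose only atomic primitive is a
    single-row compare-and-swap. Not a Polaris or MongoDB API."""

    def __init__(self):
        self.rows = {}
        # A single lock makes each get/CAS atomic. It is coarser than the
        # per-row atomicity a database provides, which is fine for a sketch.
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self.rows.get(key)

    def compare_and_swap(self, key, expected, new):
        # Pseudo-SQL equivalent:
        #   UPDATE t SET x = :new WHERE pk = :key AND x = :expected
        # expected=None means "the row must not exist yet" (an insert).
        with self._lock:
            if self.rows.get(key) != expected:
                return False
            self.rows[key] = new
            return True


def increment(store, key, retries):
    """Optimistic read-modify-write: re-read and retry until the CAS wins."""
    while True:
        current = store.get(key)
        if store.compare_and_swap(key, current, (current or 0) + 1):
            return
        retries.append(key)  # another writer changed the row in between


def run(keys, ops=1000):
    """Start one writer thread per entry in `keys`; each does `ops` increments."""
    store, retries = CasStore(), []

    def worker(key):
        for _ in range(ops):
            increment(store, key, retries)

    threads = [threading.Thread(target=worker, args=(k,)) for k in keys]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store, retries


# Four writers on one row: races may cause retries, but no update is lost.
same_store, same_retries = run(["ns1"] * 4)
print(same_store.rows["ns1"])  # 4000

# Four writers on four different rows: no CAS ever fails.
diff_store, diff_retries = run(["ns1", "ns2", "ns3", "ns4"])
print(len(diff_retries))  # 0
```

Whether operations on entities in the same namespace actually contend on shared rows in the NoSQL implementation is not something this sketch can answer; that is what a many-tables-per-namespace benchmark would measure.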

On Thu, Mar 20, 2025 at 3:26 AM Pierre Laporte <pie...@pingtimeout.fr>
wrote:

> I think what you are referring to is the "number of tables per namespace"
> property.  See the binary tree example in the docs
> <https://github.com/pingtimeout/polaris/tree/persistence-benchmarks/benchmarks#binary-tree-example>,
> where, after a binary tree of namespaces is created, 5 tables are created
> in each namespace.  So yes, a scenario with 100 tables per namespace is
> definitely possible.
>
> Now, as for whether we could also run those as part of this effort, I would
> say that it depends.  What specific insights are we trying to get out of
> that new scenario?
>
> I believe the benchmark would need a reasonable number of namespaces as
> well.  So an additional question is: what would the namespace tree look
> like in terms of width and height?  That could easily multiply to a high
> number of entities, which in turn would mean that *only* the new
> persistence implementation with MongoDB can be used.  I am fine with that,
> as it is clear to me that the new persistence layer is the way to go.  Does
> everybody agree with that statement?
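>
> To put rough numbers on this (a worked example under assumed parameters,
> not the benchmark's actual configuration): for a uniform tree where every
> namespace has w child namespaces, the tree is h levels deep, and t tables
> are created in every namespace as in the binary tree example, a single
> tree contains
>
> $$N_{\mathrm{ns}} = \sum_{i=1}^{h} w^{i} = \frac{w\,(w^{h}-1)}{w-1} \quad (w > 1)$$
> $$N_{\mathrm{tables}} = t \cdot N_{\mathrm{ns}}$$
>
> For example, w = 10, h = 3 and t = 100 gives 10 + 100 + 1000 = 1110
> namespaces and 111,000 tables, while a binary tree (w = 2) with h = 10 and
> t = 5 already yields 2046 namespaces and 10,230 tables.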
>
> --
>
> Pierre
>
> On Wed, Mar 19, 2025 at 9:33 PM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
> > I think I saw in the other document that you had some benchmarks with
> > something other than a 1N to 1T ratio? Could we run some of those as well?
> > It would be great to have something closer to a 1 namespace to 100 tables
> > sort of layout.
> >
> > On Wed, Mar 19, 2025 at 3:06 PM Pierre Laporte <pie...@pingtimeout.fr>
> > wrote:
> >
> > > Just a heads up, I updated the report with the latest results from the
> > > persistence work, as well as the tarball with raw results.
> > >
> > > --
> > >
> > > Pierre Laporte
> > > @pingtimeout <https://twitter.com/pingtimeout>
> > > pie...@pingtimeout.fr
> > >
> > >
> > > On Wed, Mar 19, 2025 at 3:20 PM Pierre Laporte <pie...@pingtimeout.fr>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I have been working on a set of benchmarks for Polaris [1].  I have run
> > > > them against the current main branch implementation (EclipseLink +
> > > > PostgreSQL) as well as the NoSQL persistence layer implementation [2].
> > > > The complete report for these performance tests is available at:
> > > > https://docs.google.com/document/d/1RLYaAtNUkgNW3Ef7-BWfF_8RkSK7B7oR/edit
> > > > Feel free to review it at your convenience.
> > > >
> > > > The benchmarks demonstrate that the new persistence implementation
> > > > offers:
> > > >
> > > >    - Comparable or better performance for sequential operations
> > > >    - Significantly better reliability under concurrent load
> > > >    - Consistent read performance even in high-concurrency scenarios
> > > >    - Some challenges with write operations under highly concurrent
> > > >      write conditions (under investigation)
> > > >
> > > > These results suggest that the NoSQL persistence layer implementation
> > > > provides a robust foundation for scaling Polaris, particularly for
> > > > workloads dominated by high concurrency.
> > > >
> > > > I will soon open a separate PR to contribute these benchmarks to the
> > > > main codebase.
> > > >
> > > > Let me know if you have any questions.
> > > >
> > > > Pierre
> > > >
> > > > [1] https://github.com/pingtimeout/polaris/tree/persistence-benchmarks/benchmarks
> > > > [2] https://github.com/apache/polaris/pull/1189
> > > >
> > > > --
> > > >
> > > > Pierre Laporte
> > > > @pingtimeout <https://twitter.com/pingtimeout>
> > > > pie...@pingtimeout.fr
> > > > http://www.pingtimeout.fr/
> > > >
> > > >
> > > > On Mon, Mar 17, 2025 at 3:46 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> > > > wrote:
> > > >
> > > >> Hi Robert,
> > > >>
> > > >> Thanks for the update and the draft PR!
> > > >>
> > > >> I would like to use this thread to thank Dennis. Big kudos to Dennis
> > > >> for the changes he made: without these changes, it would have been
> > > >> impossible to add new backends like MongoDB.
> > > >>
> > > >> I propose we review and comment on Robert's PR.
> > > >>
> > > >> I would also like to propose a community meeting to discuss the
> > > >> Persistence Improvement and drive consensus.
> > > >> What about Tuesday, March 25th at 9:30am PST?
> > > >>
> > > >> Thanks all!
> > > >>
> > > >> Regards
> > > >> JB
> > > >>
> > > >> On Mon, Mar 17, 2025 at 2:43 PM Robert Stupp <sn...@snazy.de>
> > > >> wrote:
> > > >> >
> > > >> > Hi,
> > > >> >
> > > >> > I’ve made considerable progress on building the integration for NoSQL
> > > >> > databases. The initial code supports MongoDB [A], but is not limited to
> > > >> > that database. A working implementation has been pushed as a draft PR
> > > >> > [1] to illustrate what it can look like when it is fully integrated. A
> > > >> > couple of smaller PRs will follow.
> > > >> >
> > > >> > Background: The only common denominator for “synchronization purposes”
> > > >> > that all NoSQL databases support is a single-row compare-and-swap (CAS)
> > > >> > operation; think of it as (pseudo-SQL) “UPDATE table SET x =
> > > >> > :new_value WHERE primary_key = :primary_key AND x = :expected_old_value”.
> > > >> >
> > > >> > The most important objective for the implementation is correctness,
> > > >> > especially under high concurrent load. Explicit tests to verify
> > > >> > correctness are included, both for the CI use case and for
> > > >> > manual/special runs against a clustered database setup (which are just
> > > >> > “too much” for the GitHub-hosted runners).
> > > >> >
> > > >> > The current integration points are ‘MetaStoreManagerFactory’ and
> > > >> > ‘PolarisMetaStoreManager’, implemented in the “bridge” Gradle project.
> > > >> >
> > > >> > The ‘components/persistence/README.md’ in the draft PR contains more
> > > >> > technical information.
> > > >> >
> > > >> > A benchmarking tool to measure the performance and correctness of
> > > >> > Polaris will be proposed soon as a separate, independent effort. We have
> > > >> > used this benchmarking tool to measure performance and, implicitly, the
> > > >> > correctness of the implementation.
> > > >> >
> > > >> > Implementations for particular (No)SQL databases are isolated in one
> > > >> > (Gradle) project per database. This is effectively the same approach
> > > >> > that already works for Nessie, which supports quite a few databases [2].
> > > >> >
> > > >> > Robert
> > > >> >
> > > >> > [1] https://github.com/apache/polaris/pull/1189
> > > >> > [2] https://projectnessie.org/nessie-latest/configuration/#support-for-the-database-specific-implementations
> > > >> > [A] Technically there is also an “in memory” implementation for testing
> > > >> > purposes (not intended to replace the existing one).
> > > >> >
> > > >> >
> > > >> > --
> > > >> > Robert Stupp
> > > >> > @snazy
> > > >> >
> > > >>
> > > >
> > >
> >
>
