Re: NoSQL database agnostic persistence

Robert Stupp Thu, 20 Mar 2025 07:54:52 -0700

IMO we should test the s..t out of the implementation to ensure that it's stable and reliable and correct.

Once that's done, we can certainly have some "realistic reference benchmark" with defined, different and concurrent user stories. That's definitely possible with Gatling.


On 20.03.25 14:40, Russell Spitzer wrote:

We could just keep the exact same total number of entities, so 65 catalogs,
10 namespaces each with 100 Tables and Views. Or you can scale it up if you
want. The insights I'm particularly interested in are the dynamics of
listing and accessing tables in a namespace with multiple entries. The
areas I want to test for generic implementations are things like:

- What does performance look like when listing entries within a namespace?
- Does "create" performance change if there are other tables in the same
  namespace?
- Is concurrency within a namespace the same as concurrency between
  namespaces?

I'm just a little worried about us optimizing in the future for a synthetic
scenario. I think we all agree that a real Polaris use case is unlikely to
have a 1-to-1 namespace-to-table ratio, correct? So I would just want to
make sure we are benchmarking against something that looks a little closer
to what a real layout would look like.

On Thu, Mar 20, 2025 at 3:26 AM Pierre Laporte <pie...@pingtimeout.fr>
wrote:

I think what you are referring to is the "number of tables per namespace"
property.  See the binary tree example
<https://github.com/pingtimeout/polaris/tree/persistence-benchmarks/benchmarks#binary-tree-example>
in the docs where, after a binary tree of namespaces, 5 tables are created
in each namespace.  So yes, a scenario with 100 tables per namespace is
definitely possible.

As for whether we could run those as part of this effort, I would say that
it depends.  What are the specific insights we are trying to get out of
that new scenario?

I believe the benchmark would need a reasonable number of namespaces as
well.  So an additional question is: what would the namespaces tree look
like in terms of width and height?  That could easily multiply to a high
number of entities, which in turn would mean that *only* the new
persistence implementation + MongoDB can be used.  I am fine with that, as
it is clear to me that the new persistence layer is the way to go.  Does
everybody agree with that statement?
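To give a rough feel for how quickly width and height multiply, here is a minimal sketch of the arithmetic. It is not taken from the benchmark code: the function names are hypothetical, and it assumes a complete tree (every namespace has exactly `width` children), that `height` counts levels including the root level, and that every namespace holds the same number of tables.

```python
def namespace_count(width: int, height: int) -> int:
    """Namespaces in a complete tree with `width` children per node.

    Assumption: `height` is the number of levels, and the top level
    contains a single namespace.
    """
    return sum(width ** level for level in range(height))


def entity_count(width: int, height: int, tables_per_namespace: int) -> int:
    """Namespaces plus tables, assuming every namespace holds tables."""
    namespaces = namespace_count(width, height)
    return namespaces + namespaces * tables_per_namespace


if __name__ == "__main__":
    for height in (5, 10, 15):
        total = entity_count(width=2, height=height, tables_per_namespace=100)
        print(f"binary tree, {height} levels, 100 tables each: {total} entities")
```

Under these assumptions, a binary tree with 10 levels has 1,023 namespaces, and 100 tables per namespace brings the total to 103,323 entities.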

--

Pierre

On Wed, Mar 19, 2025 at 9:33 PM Russell Spitzer <russell.spit...@gmail.com>
wrote:

I think I saw in the other document you had some benchmarks with a less
than 1N to 1T ratio? Could we run some of those as well? It would be great
to have something closer to a 1 Namespace to 100 Tables sort of layout.

On Wed, Mar 19, 2025 at 3:06 PM Pierre Laporte <pie...@pingtimeout.fr>
wrote:

Just a heads up, I updated the report with the latest results from the
persistence work, as well as the tarball with raw results.

--

Pierre Laporte
@pingtimeout <https://twitter.com/pingtimeout>
pie...@pingtimeout.fr


On Wed, Mar 19, 2025 at 3:20 PM Pierre Laporte <pie...@pingtimeout.fr>
wrote:

Hi,

I have been working on a set of benchmarks for Polaris [1].  I have run
them against the current main branch (EclipseLink + PostgreSQL)
implementation as well as the NoSQL persistence layer implementation [2].

The complete report for these performance tests is available at this
address:

https://docs.google.com/document/d/1RLYaAtNUkgNW3Ef7-BWfF_8RkSK7B7oR/edit.

Feel free to review it at your convenience.

The benchmarks demonstrate that the new Persistence implementation offers:

    - Comparable or better performance for sequential operations
    - Significantly better reliability under concurrent load
    - Consistent read performance even under high-concurrency scenarios
    - Some challenges with write operations under highly concurrent write
      conditions (under investigation)

These results suggest that the NoSQL persistence layer implementation
provides a robust foundation for scaling Polaris, particularly for
workloads dominated by high concurrency.

I will soon open a separate PR to contribute these benchmarks to the main
codebase.

Let me know if you have any questions.

Pierre

[1] https://github.com/pingtimeout/polaris/tree/persistence-benchmarks/benchmarks

[2] https://github.com/apache/polaris/pull/1189

--

Pierre Laporte
@pingtimeout <https://twitter.com/pingtimeout>
pie...@pingtimeout.fr
http://www.pingtimeout.fr/


On Mon, Mar 17, 2025 at 3:46 PM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

Hi Robert,

Thanks for the update and the draft PR !

I would like to use this thread to thank Dennis. Big kudos to Dennis
for the changes he made: without these changes, it would have been
impossible to add new backends like MongoDB.

I propose we review and comment on Robert's PR.

I would also like to propose a community meeting to discuss the
Persistence Improvement and drive consensus.
What about Tuesday, March 25th at 9:30am PST ?

Thanks all !

Regards
JB

On Mon, Mar 17, 2025 at 2:43 PM Robert Stupp <sn...@snazy.de> wrote:

Hi,

I’ve made quite some progress on building the integration for NoSQL
databases. The initial code supports MongoDB [A], but is not limited to
that database. A working implementation has been pushed as a draft PR [1]
to illustrate what it can look like when it is fully integrated. A couple
of smaller PRs will follow.

Background: The only common denominator for “synchronization purposes”
that all NoSQL databases support is a single-row compare-and-swap (CAS)
operation - think of this as (pseudo-SQL) “UPDATE table SET x =
:new_value WHERE primary_key = :primary_key AND x = :expected_old_value”.
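To illustrate the CAS idea in the pseudo-SQL above, here is a minimal sketch of an optimistic read-modify-write loop built on a single-row compare-and-swap. It is only an illustration, not the code in the draft PR: `InMemoryRowStore`, `increment_with_retry` and `run_demo` are hypothetical names, and a lock stands in for the atomicity a real database provides for a single-row conditional update.

```python
import threading


class InMemoryRowStore:
    """Hypothetical stand-in for a NoSQL table (not a Polaris or MongoDB API)."""

    def __init__(self):
        self._rows = {}
        # The lock models the atomicity a database gives a single-row update.
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._rows.get(key)

    def compare_and_swap(self, key, expected, new_value):
        """Set the row to new_value only if it currently equals expected."""
        with self._lock:
            if self._rows.get(key) != expected:
                return False  # another writer changed the row first
            self._rows[key] = new_value
            return True


def increment_with_retry(store, key):
    """Optimistic update: read, compute, CAS, and retry on conflict."""
    while True:
        current = store.get(key)
        if store.compare_and_swap(key, current, current + 1):
            return current + 1


def run_demo(threads=4, increments=200):
    """Run concurrent increments and return the final counter value."""
    store = InMemoryRowStore()
    store.compare_and_swap("counter", None, 0)  # a missing row reads as None

    def worker():
        for _ in range(increments):
            increment_with_retry(store, "counter")

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return store.get("counter")


if __name__ == "__main__":
    print(run_demo())
```

Because a failed CAS is retried against a fresh read, no increment is lost even when several writers race on the same row. That is the property the correctness tests need to verify under concurrent load.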

The most important objective for the implementation is correctness,
especially in scenarios with high concurrent load. Explicit tests to
verify the correctness are included, for the CI “use case” and for
manual/special runs against a clustered database setup (which are just
“too much” for the GitHub-hosted runners).

The current integration point is
‘MetaStoreManagerFactory’/‘PolarisMetaStoreManager’, implemented in the
“bridge” Gradle project.

The ‘components/persistence/README.md’ in the draft PR contains more
technical information.

A benchmarking tool to measure performance and correctness of Polaris
will be proposed soon as a separate/independent effort. We have used
this benchmarking tool to measure performance and implicitly the
correctness of the implementation.

Implementations for particular (No)SQL databases are isolated in one
(Gradle) project per database. This is conceptually the same approach
that already works for Nessie, which supports quite a few databases [2].

Robert

[1] https://github.com/apache/polaris/pull/1189
[2] https://projectnessie.org/nessie-latest/configuration/#support-for-the-database-specific-implementations

[A] Technically there is also an “in memory” implementation for testing
purposes (not intended to replace the existing one).


--
Robert Stupp
@snazy

--
Robert Stupp
@snazy
