IMO we should test the s..t out of the implementation to ensure that
it's stable and reliable and correct.
Once that's done, we can certainly have some "realistic reference
benchmark" with defined, different and concurrent user stories. That's
definitely possible with Gatling.
On 20.03.25 14:40, Russell Spitzer wrote:
We could just keep the exact same total number of entities, so 65 catalogs,
10 namespaces each with 100 Tables and Views. Or you can scale it up if you
want. The insights I'm particularly interested in are the dynamics of
listing and accessing tables in a namespace with multiple entries. The
areas I want to test for generic implementations are things like,
- What does performance look like when listing entries within a namespace?
- Does "create" performance change if there are other tables in the same
namespace?
- Is concurrency within a namespace the same as concurrency between
namespaces?
I'm just a little worried about us optimizing in the future for a synthetic
scenario. I think we are all in agreement that it's unlikely for a Polaris
use case to actually have a 1 to 1 ratio, correct? So I would just want to
make sure we are benchmarking against something that looks a little closer
to what a real layout would look like.
On Thu, Mar 20, 2025 at 3:26 AM Pierre Laporte <pie...@pingtimeout.fr>
wrote:
I think what you are referring to is the "number of tables per namespace"
property. See the binary tree example
<https://github.com/pingtimeout/polaris/tree/persistence-benchmarks/benchmarks#binary-tree-example>
in the docs where, after a binary tree of namespaces, 5 tables are created
in each namespace. So yes, a scenario with 100 tables per namespace is
definitely possible.
Now, could we run those as well as part of this effort? I would say that it
depends. What are the specific insights we are trying to get out of that
new scenario?
I believe the benchmark would need a reasonable number of namespaces as
well. So an additional question is: what would the namespace tree look
like in terms of width and height? That could easily multiply to a high
number of entities, which in turn would mean that *only* the new
persistence implementation + MongoDB can be used. I am fine with that, as
it is clear to me that the new persistence layer is the way to go. Does
everybody agree with that statement?
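
To make the "width and height multiply quickly" point concrete, here is a rough sketch of the arithmetic. It assumes a complete namespace tree in which every namespace has `width` children and `height` counts the levels including the root level; the function names are hypothetical and are not part of the benchmark code.

```python
def namespace_count(width: int, height: int) -> int:
    """Namespaces in a complete tree with `width` children per namespace
    and `height` levels (assumption: the top level counts as level 1)."""
    return sum(width ** level for level in range(height))


def entity_count(width: int, height: int,
                 tables_per_ns: int, views_per_ns: int = 0) -> int:
    """Total entities: each namespace plus the tables and views it holds."""
    namespaces = namespace_count(width, height)
    return namespaces * (1 + tables_per_ns + views_per_ns)


if __name__ == "__main__":
    # Binary tree, 3 levels, 5 tables per namespace (height is an assumed value).
    print(namespace_count(2, 3), entity_count(2, 3, 5))
    # Binary tree, 10 levels, 100 tables and 100 views per namespace.
    print(namespace_count(2, 10), entity_count(2, 10, 100, 100))
```

Under these assumptions, a binary tree only 10 levels deep with 100 tables and 100 views per namespace already exceeds 200,000 entities per catalog.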
--
Pierre
On Wed, Mar 19, 2025 at 9:33 PM Russell Spitzer <russell.spit...@gmail.com>
wrote:
I think I saw in the other document you had some benchmarks with less of a
1N to 1T ratio? Could we run some of those as well? It would be great to
have something closer to a 1 namespace to 100 tables sort of layout.
On Wed, Mar 19, 2025 at 3:06 PM Pierre Laporte <pie...@pingtimeout.fr>
wrote:
Just a heads up, I updated the report with the latest results from the
persistence work, as well as the tarball with raw results.
--
Pierre Laporte
@pingtimeout <https://twitter.com/pingtimeout>
pie...@pingtimeout.fr
On Wed, Mar 19, 2025 at 3:20 PM Pierre Laporte <pie...@pingtimeout.fr>
wrote:
Hi,
I have been working on a set of benchmarks for Polaris [1]. I have run
them against the current main branch (EclipseLink + PostgreSQL)
implementation as well as the NoSQL persistence layer implementation [2].
The complete report for these performance tests is available at this
address:
https://docs.google.com/document/d/1RLYaAtNUkgNW3Ef7-BWfF_8RkSK7B7oR/edit.
Feel free to review it at your convenience.
The benchmarks demonstrate that the new Persistence implementation
offers:
- Comparable or better performance for sequential operations
- Significantly better reliability under concurrent load
- Consistent read performance even under high-concurrency scenarios
- Some challenges with write operations under highly concurrent write
conditions (under investigation)
These results suggest that the NoSQL persistence layer implementation
provides a robust foundation for scaling Polaris, particularly for
workloads dominated by high concurrency.
I will soon open a separate PR to contribute these benchmarks to the main
codebase.
Let me know if you have any questions.
Pierre
[1]
https://github.com/pingtimeout/polaris/tree/persistence-benchmarks/benchmarks
[2] https://github.com/apache/polaris/pull/1189
--
Pierre Laporte
@pingtimeout <https://twitter.com/pingtimeout>
pie...@pingtimeout.fr
http://www.pingtimeout.fr/
On Mon, Mar 17, 2025 at 3:46 PM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:
Hi Robert,
Thanks for the update and the draft PR !
I would like to use this thread to thank Dennis. Big kudos to Dennis
for the changes he made: without these changes, it would have been
impossible to add new backends like MongoDB.
I propose we review and comment on Robert's PR.
I would also like to propose a community meeting to discuss the
Persistence Improvement and drive consensus.
What about Tuesday, March 25th at 9:30am PST ?
Thanks all !
Regards
JB
On Mon, Mar 17, 2025 at 2:43 PM Robert Stupp <sn...@snazy.de>
wrote:
Hi,
I've made quite a bit of progress on building the integration for NoSQL
databases. The initial code supports MongoDB [A], but is not limited to
that database. A working implementation has been pushed as a draft PR
[1] to illustrate what it can look like once it is fully integrated. A
couple of smaller PRs will follow.
Background: The only common denominator for "synchronization purposes"
that all NoSQL databases support is a single-row compare-and-swap (CAS)
operation - think of this as (pseudo-SQL) "UPDATE table SET x =
:new_value WHERE primary_key = :primary_key AND x = :expected_old_value".
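
As an illustration of the single-row CAS primitive described above, here is a minimal in-memory sketch with an optimistic retry loop. It is not the code from the draft PR: the `InMemoryTable` class, the `increment` helper and the lock that stands in for the database's per-row atomicity are all assumptions made for this example.

```python
import threading


class InMemoryTable:
    """Toy stand-in for a database that only offers single-row CAS."""

    def __init__(self):
        self._rows = {}
        # The lock models the atomicity the database guarantees per row.
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._rows.get(key)

    def compare_and_swap(self, key, expected, new):
        """Set the row to `new` only if it still holds `expected`."""
        with self._lock:
            if self._rows.get(key) != expected:
                return False  # another writer got there first
            self._rows[key] = new
            return True


def increment(table, key):
    """Optimistic update: re-read and retry until the CAS succeeds."""
    while True:
        current = table.get(key)
        new = (current or 0) + 1
        if table.compare_and_swap(key, current, new):
            return new


def run_demo(writers=8, updates_per_writer=1000):
    """Run concurrent writers against one row and return its final value."""
    table = InMemoryTable()

    def work():
        for _ in range(updates_per_writer):
            increment(table, "x")

    threads = [threading.Thread(target=work) for _ in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table.get("x")


if __name__ == "__main__":
    print(run_demo())
```

No update is lost because a writer whose expected value is stale simply fails the CAS and retries with a fresh read. This is the kind of property a correctness test under concurrent load would check.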
The most important objective for the implementation is correctness,
especially in scenarios with high concurrent load. Explicit tests to
verify the correctness are included, for the CI "use case" and for
manual/special runs against a clustered database setup (which are just
"too much" for the GitHub-hosted runners).
The current integration point is
'MetaStoreManagerFactory'/'PolarisMetaStoreManager', implemented in the
"bridge" Gradle project.
The 'components/persistence/README.md' in the draft PR contains more
technical information.
A benchmarking tool to measure performance and correctness of Polaris
will be proposed soon as a separate/independent effort. We have used
this benchmarking tool to measure performance and implicitly the
correctness of the implementation.
Implementations for particular (No)SQL databases are isolated in one
(Gradle) project per database. This is effectively/conceptually the same
approach that already works for Nessie, which supports quite a few
databases [2].
Robert
[1] https://github.com/apache/polaris/pull/1189
[2]
https://projectnessie.org/nessie-latest/configuration/#support-for-the-database-specific-implementations
[A] Technically there is also an "in memory" implementation for testing
purposes (not intended to replace the existing one).
--
Robert Stupp
@snazy