Thanks Pierre! It's great to have a benchmark tool to measure performance. It'd be awesome to make decisions based on numbers instead of theories.
Yufei

On Wed, Mar 19, 2025 at 8:53 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Hi Pierre
>
> Thanks !
>
> I have a general comment: do we want the benchmark tool as part of
> the Polaris "core" repo or on polaris-tools ?
> As we can consider this as a benchmark "tool", maybe it makes sense to
> host it in https://github.com/apache/polaris-tools.
>
> Thoughts ?
>
> Regards
> JB
>
> On Wed, Mar 19, 2025 at 4:06 PM Pierre Laporte <pie...@pingtimeout.fr> wrote:
> >
> > Hi
> >
> > I have been working on a set of benchmarks for Polaris [1] and would like
> > to contribute them to the project. I have opened a PR with the code, in
> > case anybody is interested.
> >
> > The benchmarks are written using Gatling. The core design decision
> > consists in building a procedural dataset, loading it into Polaris, and
> > then reusing it for all subsequent benchmarks. The procedural aspect makes
> > it possible to deterministically regenerate the same dataset at runtime,
> > over and over, without having to store the actual data.
> >
> > With this, it is trivial to generate large numbers of Polaris entities.
> > Typically, I used this to benchmark the NoSQL persistence implementation
> > with 65k namespaces, 65k tables and 65k views. Increasing that to millions
> > would only require a one-parameter change. Additionally, the dataset
> > currently includes property updates for namespaces, tables and views,
> > which can quickly create hundreds of manifests. This can be useful for
> > table maintenance testing.
> >
> > Three benchmarks have been created so far:
> >
> > - A benchmark that populates an empty Polaris server with a dataset that
> >   has predefined attributes
> > - A benchmark that issues only read queries over that dataset
> > - A benchmark that issues read and write queries (entity updates) over
> >   that dataset, with a configurable read/write ratio
> >
> > The benchmarks/README.md contains instructions to build and run the
> > benchmarks, as well as a description of the kind of dataset that should
> > be generated.
> >
> > As with every Gatling benchmark, an HTML report is generated with
> > interactive charts showing query performance over time, response time
> > percentiles, etc.
> >
> > I would love to hear your feedback on it.
> >
> > Pierre
> >
> > [1] https://github.com/apache/polaris/pull/1208
> > --
> >
> > Pierre Laporte
> > @pingtimeout <https://twitter.com/pingtimeout>
> > pie...@pingtimeout.fr
> > http://www.pingtimeout.fr/
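
For anyone trying to picture the design described above, here is a minimal,
hypothetical Gatling sketch in Scala of the mixed read/write scenario. It is
not the code from PR #1208: the seed, entity counts, endpoints, credentials
and property names are illustrative assumptions only.

    import io.gatling.core.Predef._
    import io.gatling.http.Predef._
    import scala.concurrent.duration._

    class ReadWriteMixSketch extends Simulation {

      // Procedural dataset: a fixed seed regenerates the exact same namespace
      // names on every run, so the dataset never needs to be stored.
      val seed = 42L                 // assumption: any fixed seed
      val namespaceCount = 1000      // assumption: scaled down from the 65k in the thread
      val readRatio = 80.0           // assumption: 80% reads / 20% writes

      val rng = new scala.util.Random(seed)
      val namespaces = Vector.fill(namespaceCount)(f"ns_${rng.nextInt(1000000)}%06d")

      // Feeder picks a namespace for each request from the regenerated dataset.
      val feeder = Iterator.continually(
        Map("namespace" -> namespaces(rng.nextInt(namespaces.size)))
      )

      // Assumption: local Polaris quickstart, Iceberg REST API, placeholder token.
      val httpProtocol = http
        .baseUrl("http://localhost:8181/api/catalog/v1/polaris_catalog")
        .authorizationHeader("Bearer <token>")

      val scn = scenario("read-write-mix")
        .feed(feeder)
        .randomSwitch(
          // Read path: load a namespace.
          readRatio -> exec(http("load-namespace").get("/namespaces/#{namespace}")),
          // Write path: update namespace properties.
          (100.0 - readRatio) -> exec(
            http("update-namespace-properties")
              .post("/namespaces/#{namespace}/properties")
              .body(StringBody("""{"removals": [], "updates": {"touched-by": "benchmark"}}"""))
              .asJson
          )
        )

      setUp(scn.inject(constantUsersPerSec(10).during(60.seconds)))
        .protocols(httpProtocol)
    }

The configurable read/write ratio maps naturally onto Gatling's randomSwitch,
and the seeded generation is what makes the dataset reproducible without
storing it anywhere.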