Thanks Pierre! It's great to have a benchmark tool to measure performance. It'd be awesome to make decisions based on numbers instead of theories.
Yufei

On Wed, Mar 19, 2025 at 8:53 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Hi Pierre
>
> Thanks !
>
> I have a general comment: do we want the benchmark tool as part of
> the Polaris "core" repo or on polaris-tools ?
> As we can consider this as a benchmark "tool", maybe it makes sense to
> host it in https://github.com/apache/polaris-tools.
>
> Thoughts ?
>
> Regards
> JB
>
> On Wed, Mar 19, 2025 at 4:06 PM Pierre Laporte <pie...@pingtimeout.fr> wrote:
> >
> > Hi
> >
> > I have been working on a set of benchmarks for Polaris [1] and would like
> > to contribute them to the project. I have opened a PR with the code, in
> > case anybody is interested.
> >
> > The benchmarks are written using Gatling. The core design decision
> > consists in building a procedural dataset, loading it into Polaris, and
> > then reusing it for all subsequent benchmarks. The procedural aspect makes
> > it possible to deterministically regenerate the same dataset at runtime,
> > over and over, without having to store the actual data.
> >
> > With this, it is trivial to generate large numbers of Polaris entities.
> > Typically, I used this to benchmark the NoSQL persistence implementation
> > with 65k namespaces, 65k tables and 65k views. Increasing that to millions
> > would only require a one-parameter change. Additionally, the dataset
> > currently includes property updates for namespaces, tables and views,
> > which can quickly create hundreds of manifests. This can be useful for
> > table maintenance testing.
> >
> > Three benchmarks have been created so far:
> >
> > - A benchmark that populates an empty Polaris server with a dataset that
> >   has predefined attributes
> > - A benchmark that issues only read queries over that dataset
> > - A benchmark that issues read and write queries (entity updates) over
> >   that dataset, with a configurable read/write ratio
> >
> > The benchmarks/README.md contains instructions to build and run the
> > benchmarks, as well as a description of the kind of dataset that should
> > be generated.
> >
> > As with every Gatling benchmark, an HTML report is generated with
> > interactive charts showing query performance over time, response time
> > percentiles, etc.
> >
> > I would love to hear your feedback on it.
> >
> > Pierre
> >
> > [1] https://github.com/apache/polaris/pull/1208
> > --
> >
> > Pierre Laporte
> > @pingtimeout <https://twitter.com/pingtimeout>
> > pie...@pingtimeout.fr
> > http://www.pingtimeout.fr/
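
For anyone trying to picture the design described above, here is a minimal,
hypothetical Gatling sketch in Scala of the mixed read/write scenario. It is
not the code from PR #1208: the seed, entity counts, endpoints, credentials
and property names are illustrative assumptions only.

    import io.gatling.core.Predef._
    import io.gatling.http.Predef._
    import scala.concurrent.duration._

    class ReadWriteMixSketch extends Simulation {

      // Procedural dataset: a fixed seed regenerates the exact same namespace
      // names on every run, so the dataset never needs to be stored.
      val seed = 42L                 // assumption: any fixed seed
      val namespaceCount = 1000      // assumption: scaled down from the 65k in the thread
      val readRatio = 80.0           // assumption: 80% reads / 20% writes

      val rng = new scala.util.Random(seed)
      val namespaces = Vector.fill(namespaceCount)(f"ns_${rng.nextInt(1000000)}%06d")

      // Feeder picks a namespace for each request from the regenerated dataset.
      val feeder = Iterator.continually(
        Map("namespace" -> namespaces(rng.nextInt(namespaces.size)))
      )

      // Assumption: local Polaris quickstart, Iceberg REST API, placeholder token.
      val httpProtocol = http
        .baseUrl("http://localhost:8181/api/catalog/v1/polaris_catalog")
        .authorizationHeader("Bearer <token>")

      val scn = scenario("read-write-mix")
        .feed(feeder)
        .randomSwitch(
          // Read path: load a namespace.
          readRatio -> exec(http("load-namespace").get("/namespaces/#{namespace}")),
          // Write path: update namespace properties.
          (100.0 - readRatio) -> exec(
            http("update-namespace-properties")
              .post("/namespaces/#{namespace}/properties")
              .body(StringBody("""{"removals": [], "updates": {"touched-by": "benchmark"}}"""))
              .asJson
          )
        )

      setUp(scn.inject(constantUsersPerSec(10).during(60.seconds)))
        .protocols(httpProtocol)
    }

The configurable read/write ratio maps naturally onto Gatling's randomSwitch,
and the seeded generation is what makes the dataset reproducible without
storing it anywhere.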