Hey,

Yes, we have precedent for sponsored "machines/executors".
For instance, at Apache Beam, we had (and still have) sponsored
Jenkins executors (there are some requirements from ASF Infra, but it
is possible).

Regards
JB

On Wed, Mar 19, 2025 at 5:23 PM Robert Stupp <sn...@snazy.de> wrote:
>
> Benchmark results against individual commits would be a great thing to
> have.
>
> The small GitHub-hosted runners, however, are not suitable for
> deterministic/comparable results.
>
> It would be possible, though, if the hardware (or bare-metal compute
> instances in the cloud) were available to the project. I suspect there's
> nothing that would block the project from getting this sponsored. Am I
> right, JB?
>
>
> On 19.03.25 17:09, Prashant Singh wrote:
> > Thank you so much for the benchmarks!
> > +1, having benchmark results committed will help catch any degradation
> > or correctness issue that can creep in, equivalent to the golden files
> > of TPC-DS / TPC-H in the Spark repo.
> >
> > Best,
> > Prashant Singh
> >
> > On Wed, Mar 19, 2025 at 8:53 AM Russell Spitzer <russell.spit...@gmail.com>
> > wrote:
> >
> >> I think having a tool like this is a great idea. Would we be able to host
> >> the results over time as well? Like an official build run that triggers on
> >> a daily basis?
> >>
> >> On Wed, Mar 19, 2025 at 10:07 AM Pierre Laporte <pie...@pingtimeout.fr>
> >> wrote:
> >>
> >>> Hi
> >>>
> >>> I have been working on a set of benchmarks for Polaris [1] and would like
> >>> to contribute them to the project.  I have opened a PR with the code, in
> >>> case anybody is interested.
> >>>
> >>> The benchmarks are written using Gatling.  The core design decision is
> >>> to build a procedural dataset, load it into Polaris, and then reuse it
> >>> for all subsequent benchmarks.  The procedural aspect makes it possible
> >>> to deterministically regenerate the same dataset at runtime, over and
> >>> over, without having to store the actual data.
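> >>>
> >>> To make the idea concrete, here is a minimal sketch of how such
> >>> deterministic generation can work (this is not the actual PR code, and
> >>> the names are illustrative only): every entity name is derived from a
> >>> fixed seed and its index, so the exact same dataset can be regenerated
> >>> at any time without storing anything.
> >>>
> >>>     import scala.util.Random
> >>>
> >>>     object ProceduralDataset {
> >>>       // Fixed seed: the whole dataset is a pure function of (seed, index).
> >>>       val Seed = 42L
> >>>
> >>>       // Name of the i-th namespace; the same (seed, i) always yields the
> >>>       // same name, so nothing has to be persisted to rebuild the dataset.
> >>>       def namespaceName(i: Int): String = {
> >>>         val rng = new Random(Seed + i)
> >>>         s"ns_${i}_${rng.nextLong().toHexString}"
> >>>       }
> >>>
> >>>       def main(args: Array[String]): Unit =
> >>>         // Running this twice prints the exact same five names.
> >>>         (0 until 5).foreach(i => println(namespaceName(i)))
> >>>     }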
> >>>
> >>> With this, it is trivial to generate a large number of Polaris entities.
> >>> Typically, I used this to benchmark the NoSQL persistence implementation
> >>> with 65k namespaces, 65k tables and 65k views.  Increasing that to
> >>> millions would only require a one-parameter change.  Additionally, the
> >>> dataset currently includes property updates for namespaces, tables and
> >>> views, which can quickly create hundreds of manifests.  This can be
> >>> useful for table maintenance testing.
> >>>
> >>> Three benchmarks have been created so far:
> >>>
> >>>     - A benchmark that populates an empty Polaris server with a dataset
> >>>     that has predefined attributes
> >>>     - A benchmark that issues only read queries over that dataset
> >>>     - A benchmark that issues read and write queries (entity updates)
> >>>     over that dataset, with a configurable read/write ratio (see the
> >>>     sketch below)
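> >>>
> >>> As an illustration of the read/write mix, below is a minimal sketch of
> >>> what such a scenario can look like with Gatling's Scala DSL.  It is a
> >>> simplified sketch, not the actual code from the PR: the endpoint paths,
> >>> port, catalog/namespace names and the readRatio system property are
> >>> assumptions, and authentication is omitted.
> >>>
> >>>     import io.gatling.core.Predef._
> >>>     import io.gatling.http.Predef._
> >>>     import scala.concurrent.duration._
> >>>
> >>>     class ReadWriteSimulation extends Simulation {
> >>>
> >>>       // Configurable ratio, e.g. -DreadRatio=80 for 80% reads / 20% writes.
> >>>       private val readRatio = System.getProperty("readRatio", "80").toDouble
> >>>
> >>>       private val httpProtocol = http
> >>>         .baseUrl("http://localhost:8181")   // assumed local Polaris endpoint
> >>>         .acceptHeader("application/json")
> >>>
> >>>       private val scn = scenario("read-write-mix")
> >>>         .randomSwitch(
> >>>           // Read: list namespaces (Iceberg REST path, assumed).
> >>>           readRatio -> exec(http("list namespaces")
> >>>             .get("/api/catalog/v1/catalog_0/namespaces")),
> >>>           // Write: update the properties of one namespace.
> >>>           (100.0 - readRatio) -> exec(http("update namespace properties")
> >>>             .post("/api/catalog/v1/catalog_0/namespaces/ns_0/properties")
> >>>             .body(StringBody("""{"updates":{"k":"v"},"removals":[]}"""))
> >>>             .asJson)
> >>>         )
> >>>
> >>>       setUp(scn.inject(constantUsersPerSec(10).during(1.minute)))
> >>>         .protocols(httpProtocol)
> >>>     }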
> >>>
> >>> The benchmarks/README.md contains instructions to build and run the
> >>> benchmarks, as well as a description of the kind of dataset that should
> >>> be generated.
> >>>
> >>> As with every Gatling benchmark, an HTML page is generated with
> >>> interactive charts showing query performance over time, response time
> >>> percentiles, etc.
> >>>
> >>> I would love to hear your feedback on it.
> >>>
> >>> Pierre
> >>>
> >>> [1] https://github.com/apache/polaris/pull/1208
> >>> --
> >>>
> >>> Pierre Laporte
> >>> @pingtimeout <https://twitter.com/pingtimeout>
> >>> pie...@pingtimeout.fr
> >>> http://www.pingtimeout.fr/
> >>>
> --
> Robert Stupp
> @snazy
>
