Re: Polaris benchmarks proposal

2025-04-04 Thread Robert Stupp
Having benchmark results against individual commits is a great thing to have. The small GH hosted runners however are not suitable for deterministic/comparable results. It would be possible though, if the hardware (or bare-metal compute instances in the cloud) is available to the project. I

Re: Polaris benchmarks proposal

2025-04-01 Thread Jean-Baptiste Onofré
Hi Pierre, Thanks ! I will take a look at the new PR :) Regards JB On Tue, Apr 1, 2025 at 5:38 PM Pierre Laporte wrote: > > Ok so it seems there is a consensus. The benchmarks can be written in > Scala as long as they are contributed to the tools repository. I just > closed the initial PR th

Re: Polaris benchmarks proposal

2025-04-01 Thread Pierre Laporte
Ok so it seems there is a consensus. The benchmarks can be written in Scala as long as they are contributed to the tools repository. I just closed the initial PR that was against the `apache/polaris` repository and opened a new one against the `apache/polaris-tools` repository ( https://github.co

Re: Polaris benchmarks proposal

2025-04-01 Thread Russell Spitzer
Sounds good! On Tue, Apr 1, 2025 at 10:38 AM Pierre Laporte wrote: > Ok so it seems there is a consensus. The benchmarks can be written in > Scala as long as they are contributed to the tools repository. I just > closed the initial PR that was against the `apache/polaris` repository and > open

Re: Polaris benchmarks proposal

2025-03-26 Thread Russell Spitzer
I think having a tool like this is a great idea. Would we be able to host the results over time as well? Like an official build run that triggers on a daily basis? On Wed, Mar 19, 2025 at 10:07 AM Pierre Laporte wrote: > Hi > > I have been working on a set of benchmarks for Polaris [1] and would

Re: Polaris benchmarks proposal

2025-03-25 Thread Jean-Baptiste Onofré
Hi Eric That's a good point. I think that it's something we can manage with each tool in a separate folder/module. And, I'm sure we will find a solution if/when the problem occurs :) Regards JB On Mon, Mar 24, 2025 at 5:51 PM Eric Maynard wrote: > > +1 to what JB said. > > My concern with S

Re: Polaris benchmarks proposal

2025-03-24 Thread Eric Maynard
+1 to what JB said. My concern with Scala has mostly been that it can alienate new contributors and add ambiguity about when we should use Scala vs. Java. If we’re putting this in polaris-tools for now and the philosophy for polaris-tools is to more or less use whatever language you prefer, there

Re: Polaris benchmarks proposal

2025-03-24 Thread Jean-Baptiste Onofré
Hi, Personally, I'm more in favor of hosting the benchmark tool in polaris-tools (it looks logical :)). Now, about Scala, and generally speaking about "maintenance questions", I think we should not consider what we (individuals) can or want to maintain, but more, what the community (including all

Re: Polaris benchmarks proposal

2025-03-23 Thread Michael Collado
Personally, I don’t mind if I have to maintain a bit of Scala code - I like Scala, though every time the question of using it comes up, I see the same concerns that Russell brought up. I will say that if the alternative is to introduce JMeter into the repo, I’m a hard -1. I’ll write Scala all day long

Re: Polaris benchmarks proposal

2025-03-22 Thread Yufei Gu
I’m leaning toward placing it in a separate repository rather than in https://github.com/apache/polaris. The benchmark tool is largely self-contained and doesn’t have a strong dependency on the main codebase. IIUC, the only requirement is a running Polaris instance, which the tool can connect to u

Re: Polaris benchmarks proposal

2025-03-22 Thread Russell Spitzer
I think we should start a new thread just to gauge consensus on whether Scala will be allowed in the tools repository or not. To go through my quick thoughts here. I like Scala but I have to be realistic in saying that it is a rather esoteric language choice and limits the number of community memb

Re: Polaris benchmarks proposal

2025-03-22 Thread Pierre Laporte
I don't mind contributing the benchmarks to `polaris-tools`. It seems that the consensus is clearly in that direction. I want to address some comments that were made in the PR but that are not really related to code review per se. > You can write gatling benchmarks in a language other than Scala

Re: Polaris benchmarks proposal

2025-03-22 Thread Russell Spitzer
I think it makes sense for us to also build some capabilities into the tools repo to build Polaris at a specific commit for testing purposes. If the Spark Catalog and Benchmarking code goes there they could both share this code for testing, ditto for the migration code. On Fri, Mar 21, 2025 at 4:5

Re: Polaris benchmarks proposal

2025-03-20 Thread Jean-Baptiste Onofré
Hi Ajantha, That's a good request. Imho, right now, before distributing any artifact (either on nightly build space https://nightlies.apache.org/), I prefer to have it "good enough" from a "legal" standpoint (e.g. LICENSE/NOTICE). I'm almost done about that for all artifacts (jar and distributio

Re: Polaris benchmarks proposal

2025-03-20 Thread Ajantha Bhat
> I cannot think of any issue with storing that code in the polaris-tools repository. While contributing the `catalog migrator tool` to `polaris-tools`, I encountered a challenge because this external repository needs to depend on Apache Polaris jars, which haven't been published yet by Apache Pol

Re: Polaris benchmarks proposal

2025-03-20 Thread Pierre Laporte
On Wed, Mar 19, 2025 at 4:53 PM Jean-Baptiste Onofré wrote: > Hi Pierre > > Thanks ! > > I have a general comment: do we want the benchmark tool as part of > Polaris "core" repo or on polaris-tools ? > As we can consider this as a benchmark "tool", maybe it makes sense to > host it in https://git

Re: Polaris benchmarks proposal

2025-03-19 Thread Jean-Baptiste Onofré
Hey, Yes, we have precedent about sponsored "machines/executors". For instance, at Apache Beam, we had (and still have) sponsored Jenkins executors (there are some requirements from the ASF Infra, but possible). Regards JB On Wed, Mar 19, 2025 at 5:23 PM Robert Stupp wrote: > > Having benchmark

Re: Polaris benchmarks proposal

2025-03-19 Thread Yufei Gu
Thanks Pierre! It's great to have a benchmark tool to measure performance. It'd be awesome to make decisions based on numbers instead of theories. Yufei On Wed, Mar 19, 2025 at 8:53 AM Jean-Baptiste Onofré wrote: > Hi Pierre > > Thanks ! > > I have a general comment: do we want the benchmark

Re: Polaris benchmarks proposal

2025-03-19 Thread Prashant Singh
Thank you so much for the benchmarks! +1, having benchmark results committed will help catch any degradation or correctness issue that can creep in, equivalent to the golden files of TPC-DS / TPC-H in the Spark repo. Best, Prashant Singh On Wed, Mar 19, 2025 at 8:53 AM Russell Spitzer wrote: > I t

Re: Polaris benchmarks proposal

2025-03-19 Thread Jean-Baptiste Onofré
Hi Pierre Thanks ! I have a general comment: do we want the benchmark tool as part of Polaris "core" repo or on polaris-tools ? As we can consider this as a benchmark "tool", maybe it makes sense to host it in https://github.com/apache/polaris-tools. Thoughts ? Regards JB On Wed, Mar 19, 2025