Hi Pierre,

Thanks!
I will take a look at the new PR :)

Regards
JB

On Tue, Apr 1, 2025 at 5:38 PM Pierre Laporte <pie...@pingtimeout.fr> wrote:
>
> Ok so it seems there is a consensus. The benchmarks can be written in Scala as long as they are contributed to the tools repository. I just closed the initial PR that was against the `apache/polaris` repository and opened a new one against the `apache/polaris-tools` repository (https://github.com/apache/polaris-tools/pull/2).
>
> Thanks for your feedback
>
> --
>
> Pierre
>
>
> On Mon, Mar 24, 2025 at 7:05 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
> > Hi Eric
> >
> > That's a good point. I think that it's something we can manage with each tool in a separate folder/module. And, I'm sure we will find a solution if/when the problem occurs :)
> >
> > Regards
> > JB
> >
> > On Mon, Mar 24, 2025 at 5:51 PM Eric Maynard <eric.w.mayn...@gmail.com> wrote:
> > >
> > > +1 to what JB said.
> > >
> > > My concern with Scala has mostly been that it can alienate new contributors and add ambiguity about when we should use Scala vs. Java. If we’re putting this in polaris-tools for now and the philosophy for polaris-tools is to more or less use whatever language you prefer, there should be no issues.
> > >
> > > It does make me think that we should more or less isolate each “tool” though. What if contributor A wants a different version of a language or dependency compared to contributor B? But that’s something we can figure out as we go.
> > >
> > > On Mon, Mar 24, 2025 at 1:46 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > >
> > > > Hi,
> > > >
> > > > Personally, I'm more in favor of hosting the benchmark tool in polaris-tools (it looks logical :)).
> > > >
> > > > Now, about Scala, and generally speaking about "maintenance questions", I think we should not consider what we (individuals) can or want to maintain, but rather what the community (including all contributors) can/would like to maintain.
> > > > If we take an analogy with Apache Iceberg, Apache Arrow or Apache Beam, we can see Python, Rust, Go, maintained by the community, whereas it was probably not the main "skill" of the first committers.
> > > >
> > > > So, I don't consider Scala as a question. I also am more in favor of moving forward, adding Scala support on polaris-tools repo. In the lifetime of a project, things can change and refactoring happens, so we will always be able to replace Scala or find an alternative (to the benchmark tool) if there's an ask from the community.
> > > >
> > > > My $0.10 :)
> > > >
> > > > Regards
> > > > JB
> > > >
> > > > On Sun, Mar 23, 2025 at 4:42 PM Michael Collado <collado.m...@gmail.com> wrote:
> > > >
> > > > > Personally, I don’t mind if I have to maintain a bit of Scala code - I like Scala, though every time the question of using it comes up, I see the same concerns that Russell brought up.
> > > > >
> > > > > I will say that if the alternative is to introduce JMeter into the repo, I’m a hard -1. I’ll write Scala all day long to avoid that.
> > > > >
> > > > > Mike
> > > > >
> > > > > On Sat, Mar 22, 2025 at 1:13 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
> > > > >
> > > > > > I think we should start a new thread just to gauge consensus on whether Scala will be allowed in the tools repository or not. To go through my quick thoughts here.
> > > > > >
> > > > > > I like Scala but I have to be realistic in saying that it is a rather esoteric language choice and limits the number of community members that can contribute. So it would be a hard -1 for it being included in the main repository.
> > > > > >
> > > > > > Now for the tools repository I would also be a -1 for brand new proposals without code. Scala raises the bar for contributing so it still wouldn't be a great thing to add when other language bindings exist that are much more popular (even if we didn't choose Java).
> > > > > >
> > > > > > The current situation is a little different as we already have code written and I am usually focused on immediate practical benefits over hypothetical problems. So in the current situation I'm more of a -.1. The reason I am still negative is that inclusion of the benchmarks into the project isn't just about utility to the project, but about whether the community should take up responsibility for maintaining the code. What is important here is not whether the code can be used by the project and contributors but whether we have enough contributors who are familiar with Scala that the benchmarks can be maintained. We don't want to be in a situation where you win the lottery and we are left high and dry :)
> > > > > >
> > > > > > The value of the code is clearly high, but whether or not it is reasonable for the community to take on responsibility for Scala code (and build) needs to be polled. As long as a significant fraction of contributors don't have a problem working on Scala code I'm a +1.
> > > > > >
> > > > > > If this contribution was in Java or Python I would be +1 without reservation.
> > > > > >
> > > > > >
> > > > > > On Sat, Mar 22, 2025 at 12:06 PM Pierre Laporte <pie...@pingtimeout.fr> wrote:
> > > > > >
> > > > > > > I don't mind contributing the benchmarks to `polaris-tools`. It seems that the consensus is clearly in that direction.
> > > > > > >
> > > > > > > I want to address some comments that were made in the PR but that are not really related to code review per se.
> > > > > > >
> > > > > > > > You can write Gatling benchmarks in a language other than Scala.
> > > > > > > >
> > > > > > > > There are also frameworks other than Gatling.
> > > > > > >
> > > > > > > To me, the big question is: Assuming the code goes to `polaris-tools`, _will this contribution be rejected if it uses Scala?_
> > > > > > >
> > > > > > > I understand that this is a controversial topic, and that the expected maintenance cost is a key factor here. I made sure that the code is documented and that a comprehensive readme file describes how datasets work. That way, nobody needs to be a Scala developer to leverage or understand the tool.
> > > > > > >
> > > > > > > Those benchmarks have already been used to detect, reproduce and fix multiple issues in the codebase. Issues that had not been caught before [1] [2] [3]. This shows that the benchmarks already bring value to the community in their current state.
> > > > > > >
> > > > > > > Now, I want to avoid any misunderstanding. My current focus is on evolving the benchmarks and covering new cases. Not on completely rewriting the code in Java/another framework.
> > > > > > > Essentially: focus on the area that brings the most value to Polaris users.
> > > > > > >
> > > > > > > Hence my asking on dev@. If anything, there will be more Scala code pushed to the benchmarks branch in the upcoming weeks. Not less. I would completely understand if the Gatling/Scala design choice is a reason for rejection. The discussion simply needs to happen.
> > > > > > >
> > > > > > > [1] https://github.com/apache/polaris/issues/1044
> > > > > > > [2] https://github.com/apache/polaris/issues/1076
> > > > > > > [3] https://github.com/apache/polaris/issues/1123
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Pierre
> > > > > > >
> > > > > > >
> > > > > > > On Sat, Mar 22, 2025 at 3:47 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
> > > > > > >
> > > > > > > > I think it makes sense for us to also build some capabilities into the tools repo to build Polaris at a specific commit for testing purposes. If the Spark Catalog and Benchmarking code goes there they could both share this code for testing, ditto for the migration code.
> > > > > > > >
> > > > > > > > On Fri, Mar 21, 2025 at 4:59 PM Yufei Gu <flyrain...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > I’m leaning toward placing it in a separate repository rather than in https://github.com/apache/polaris. The benchmark tool is largely self-contained and doesn’t have a strong dependency on the main codebase.
> > > > > > > > >
> > > > > > > > > IIUC, the only requirement is a running Polaris instance, which the tool can connect to using the following configuration:
> > > > > > > > > export CLIENT_ID=your_client_id
> > > > > > > > > export CLIENT_SECRET=your_client_secret
> > > > > > > > > export BASE_URL=http://your-polaris-instance:8181
> > > > > > > > >
> > > > > > > > > Yufei
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Mar 20, 2025 at 6:05 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Ajantha,
> > > > > > > > > >
> > > > > > > > > > That's a good request.
> > > > > > > > > >
> > > > > > > > > > Imho, right now, before distributing any artifact (either on nightly build space https://nightlies.apache.org/), I prefer to have it "good enough" from a "legal" standpoint (e.g. LICENSE/NOTICE).
> > > > > > > > > >
> > > > > > > > > > I'm almost done with that for all artifacts (jar and distributions).
> > > > > > > > > > I will open a PR soon.
> > > > > > > > > > Once this PR is done, I will submit a way to provide nightly builds.
> > > > > > > > > >
> > > > > > > > > > Regards
> > > > > > > > > > JB
> > > > > > > > > >
> > > > > > > > > > On Thu, Mar 20, 2025 at 10:27 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > > I cannot think of any issue with storing that code in the polaris-tools repository.
> > > > > > > > > > >
> > > > > > > > > > > While contributing the `catalog migrator tool` to `polaris-tools`, I encountered a challenge because this external repository needs to depend on Apache Polaris jars, which haven't been published yet by Apache Polaris. If we keep the tool in polaris-tools, we may need to wait for the nightly build or official jar publication.
> > > > > > > > > > >
> > > > > > > > > > > - Ajantha
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Mar 20, 2025 at 2:46 PM Pierre Laporte <pie...@pingtimeout.fr> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Wed, Mar 19, 2025 at 4:53 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Pierre
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks!
> > > > > > > > > > > > >
> > > > > > > > > > > > > I have a general comment: do we want the benchmark tool as part of Polaris "core" repo or on polaris-tools?
> > > > > > > > > > > > > As we can consider this as a benchmark "tool", maybe it makes sense to host it in https://github.com/apache/polaris-tools.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > At this point, apart from the Gradle build files, the benchmark code is completely contained under the benchmarks/ directory.
> > > > > > > > > > > > And given it relies on the REST API, there is no real dependency on any specific Polaris version.
> > > > > > > > > > > >
> > > > > > > > > > > > I cannot think of any issue with storing that code in the polaris-tools repository.
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > >
> > > > > > > > > > > > Pierre