I think we should start a new thread just to gauge consensus on whether
Scala will be allowed in the tools repository or not. Here are my quick
thoughts.

I like Scala, but realistically it is a rather esoteric language choice
that limits the number of community members who can contribute. So I would
be a hard -1 on including it in the main repository.

Now, for the tools repository, I would also be a -1 on brand-new proposals
without code. Scala raises the bar for contributing, so it still wouldn't be
a great choice when other, much more popular language bindings exist (even
if we didn't choose Java).

The current situation is a little different, as we already have code
written, and I am usually focused on immediate practical benefits over
hypothetical problems. So in the current situation I'm more of a -0.1. The
reason I am still negative is that including the benchmarks in the project
isn't just about their utility to the project, but about whether the
community should take on responsibility for maintaining the code. What
matters here is not whether the project and contributors can use the code,
but whether we have enough contributors familiar with Scala to maintain the
benchmarks. We don't want to be in a situation where you win the lottery
and we are left high and dry :)

The value of the code is clearly high, but whether it is reasonable for
the community to take on responsibility for Scala code (and its build) is
something we need to poll. As long as a significant fraction of
contributors have no problem working on Scala code, I'm a +1.

If this contribution were in Java or Python, I would be +1 without
reservation.


On Sat, Mar 22, 2025 at 12:06 PM Pierre Laporte <pie...@pingtimeout.fr>
wrote:

> I don't mind contributing the benchmarks to `polaris-tools`.  It seems that
> the consensus is clearly in that direction.
>
> I want to address some comments that were made in the PR but that are not
> really related to code review per se.
>
> > You can write gatling benchmarks in a language other than Scala.
> >
> > There are also frameworks other than gatling.
>
> To me, the big question is: assuming the code goes to `polaris-tools`,
> _will this contribution be rejected if it uses Scala?_
>
> I understand that this is a controversial topic, and that the expected
> maintenance cost is a key factor here.  I made sure that the code is
> documented and that a comprehensive readme file describes how datasets
> work.  That way, nobody needs to be a Scala developer to leverage or
> understand the tool.
>
> Those benchmarks have already been used to detect, reproduce and fix
> multiple issues in the codebase that had not been caught before [1] [2]
> [3].  This shows that the benchmarks already bring value to the community
> in their current state.
>
> Now, I want to avoid any misunderstanding.  My current focus is on evolving
> the benchmarks and covering new cases, not on completely rewriting the
> code in Java or another framework.  Essentially, I want to focus on the
> area that brings the most value to Polaris users.
>
> Hence my question on dev@.  If anything, there will be more Scala code
> pushed to the benchmarks branch in the upcoming weeks, not less.  I would
> completely understand if the Gatling/Scala design choice is a reason for
> rejection.  The discussion simply needs to happen.
>
> [1] https://github.com/apache/polaris/issues/1044
> [2] https://github.com/apache/polaris/issues/1076
> [3] https://github.com/apache/polaris/issues/1123
>
>
> --
>
> Pierre
>
>
> On Sat, Mar 22, 2025 at 3:47 PM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
> > I think it makes sense for us to also build some capabilities into the
> > tools repo to build Polaris at a specific commit for testing purposes. If
> > the Spark Catalog and benchmarking code go there, they could both share
> > this code for testing; ditto for the migration code.
> >
> > On Fri, Mar 21, 2025 at 4:59 PM Yufei Gu <flyrain...@gmail.com> wrote:
> >
> > > I’m leaning toward placing it in a separate repository rather than in
> > > https://github.com/apache/polaris. The benchmark tool is largely
> > > self-contained and doesn’t have a strong dependency on the main
> > > codebase.
> > >
> > > IIUC, the only requirement is a running Polaris instance, which the
> > > tool can connect to using the following configuration:
> > > export CLIENT_ID=your_client_id
> > > export CLIENT_SECRET=your_client_secret
> > > export BASE_URL=http://your-polaris-instance:8181
> > >
> > > Yufei
> > >
> > >
> > > On Thu, Mar 20, 2025 at 6:05 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> > > wrote:
> > >
> > > > Hi Ajantha,
> > > >
> > > > That's a good request.
> > > >
> > > > IMHO, right now, before distributing any artifact (for instance, on
> > > > the nightly build space https://nightlies.apache.org/), I prefer to
> > > > have it "good enough" from a "legal" standpoint (e.g. LICENSE/NOTICE).
> > > >
> > > > I'm almost done with that for all artifacts (jars and distributions).
> > > > I will open a PR soon.
> > > > Once this PR is done, I will propose a way to provide nightly builds.
> > > >
> > > > Regards
> > > > JB
> > > >
> > > > On Thu, Mar 20, 2025 at 10:27 AM Ajantha Bhat <ajanthab...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > I cannot think of any issue with storing that code in the
> > > > > > polaris-tools repository.
> > > > >
> > > > > While contributing the `catalog migrator tool` to `polaris-tools`, I
> > > > > encountered a challenge: this external repository needs to depend on
> > > > > Apache Polaris jars, which Apache Polaris hasn't published yet. If we
> > > > > keep the tool in polaris-tools, we may need to wait for the nightly
> > > > > build or an official jar publication.
> > > > >
> > > > > - Ajantha
> > > > >
> > > > > On Thu, Mar 20, 2025 at 2:46 PM Pierre Laporte <pie...@pingtimeout.fr>
> > > > > wrote:
> > > > >
> > > > > > On Wed, Mar 19, 2025 at 4:53 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Pierre
> > > > > > >
> > > > > > > Thanks !
> > > > > > >
> > > > > > > I have a general comment: do we want the benchmark tool in the
> > > > > > > Polaris "core" repo or in polaris-tools?
> > > > > > > Since we can consider this a benchmark "tool", maybe it makes
> > > > > > > sense to host it in https://github.com/apache/polaris-tools.
> > > > > > >
> > > > > > >
> > > > > > At this point, apart from the Gradle build files, the benchmark
> > > > > > code is completely contained under the benchmarks/ directory.  And
> > > > > > since it relies on the REST API, it has no real dependency on any
> > > > > > specific Polaris version.
> > > > > >
> > > > > > I cannot think of any issue with storing that code in the
> > > > > > polaris-tools repository.
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Pierre
> > > > > >
> > > >
> > >
> >
>
