Perhaps we should reconsider our reliance on and use of Ammonite? One week after the releases of Scala 2.12.18 and 2.13.11, there is still no new Ammonite version available. The question about a release raised in the Ammonite community has also received no response, which I find surprising. Of course, we can also wait a while before making a decision.
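For context on why the Scala patch bump is coupled to Ammonite's release cycle: Ammonite is published per full Scala version (e.g. `ammonite_2.13.10`) rather than per binary version (`_2.13`), so dependency resolution cannot succeed until a matching Ammonite artifact exists. A rough sbt-style sketch of the constraint (version numbers are illustrative, not Spark's actual build settings, which are Maven-based):

```scala
// build.sbt sketch: why bumping the Scala patch version is blocked by Ammonite.
// Version numbers below are illustrative only.
scalaVersion := "2.13.10"  // cannot move to 2.13.11 until an ammonite_2.13.11 artifact is published

libraryDependencies ++= Seq(
  // Ordinary libraries are cross-published per binary version (_2.13) and are unaffected by a patch bump.
  "org.scala-lang.modules" %% "scala-xml" % "2.1.0",
  // Ammonite is cross-published per *full* Scala version (ammonite_2.13.10, ammonite_2.13.11, ...),
  // hence CrossVersion.full; resolution fails if no artifact exists for the new patch release.
  ("com.lihaoyi" % "ammonite" % "2.5.8").cross(CrossVersion.full)
)
```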
```
The Scala version upgrade is currently blocked by the Ammonite library's dev cycle. Although we discussed it here with good intentions, the current master branch cannot use the latest Scala.

- https://lists.apache.org/thread/4nk5ddtmlobdt8g3z8xbqjclzkhlsdfk
  "Ammonite as REPL for Spark Connect"
- SPARK-42884 Add Ammonite REPL integration

Specifically, the following are blocked and I'm monitoring the Ammonite repository.

- SPARK-40497 Upgrade Scala to 2.13.11
- SPARK-43832 Upgrade Scala to 2.12.18
- According to https://github.com/com-lihaoyi/Ammonite/issues, Scala 3.3.0 LTS support also looks infeasible.

Although we may be able to wait for a while, there are two fundamental solutions to unblock this situation from a long-term maintenance perspective.

- Replace it with a scala-shell based implementation
- Move `connector/connect/client/jvm/pom.xml` out of the Spark repo. Maybe we can put it into a new repo, like the Rust and Go clients.
```

From: Grisha Weintraub <grisha.weintr...@gmail.com>
Date: Thursday, June 8, 2023, 04:05
To: Dongjoon Hyun <dongjoon.h...@gmail.com>
Cc: Nan Zhu <zhunanmcg...@gmail.com>, Sean Owen <sro...@gmail.com>, "dev@spark.apache.org" <dev@spark.apache.org>
Subject: Re: ASF policy violation and Scala version issues

Dongjoon,

I followed the conversation, and in my opinion, your concern is totally legit. It just feels that the discussion is focused solely on Databricks, and as I said above, the same issue occurs with other vendors as well.

On Wed, Jun 7, 2023 at 10:28 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

To Grisha, we are talking about what the right way is and how to comply with the ASF legal advice, which I shared in this thread from the "legal-discuss@" mailing list.

https://lists.apache.org/thread/mzhggd0rpz8t4d7vdsbhkp38mvd3lty4 (legal-discuss@)
https://www.apache.org/foundation/marks/downstream.html#source (ASF website)

Dongjoon

On Wed, Jun 7, 2023 at 12:16 PM Grisha Weintraub <grisha.weintr...@gmail.com> wrote:

Yes, in the Spark UI you have it as "3.1.2-amazon", but when you create a cluster it's just Spark 3.1.2.

On Wed, Jun 7, 2023 at 10:05 PM Nan Zhu <zhunanmcg...@gmail.com> wrote:

For EMR, I think they show 3.1.2-amazon in the Spark UI, no?

On Wed, Jun 7, 2023 at 11:30 Grisha Weintraub <grisha.weintr...@gmail.com> wrote:

Hi,

I am not taking sides here, but just for fairness, I think it should be noted that AWS EMR does exactly the same thing. We choose the EMR version (e.g., 6.4.0) and it has an associated Spark version (e.g., 3.1.2). The Spark version here is not the original Apache version but the AWS Spark distribution.

On Wed, Jun 7, 2023 at 8:24 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

I disagree with you in several ways.

The following is not a *minor* change like the given examples (alterations to the start-up and shutdown scripts, configuration files, file layout etc.).
> The change you cite meets the 4th point, minor change, made for integration
> reasons.

The following is also wrong. There was no such state of Apache Spark 3.4.0 after the 3.4.0 tag was created. The Apache Spark community did not allow the Scala-reverting patches in either the `master` branch or `branch-3.4`.

> There is no known technical objection; this was after all at one point the
> state of Apache Spark.

Is the following your main point? So, you are selling a box "including Harry Potter by J. K. Rowling, whose main character is Barry instead of Harry", but it's okay because you didn't sell the book itself? And, as a cloud vendor, you lend the box instead of selling it, like private libraries?

> There is no standalone distribution of Apache Spark anywhere here.

We are not asking for a big thing. Why are you so reluctant to say you are not "Apache Spark 3.4.0", by simply saying "Apache Spark 3.4.0-databricks"? What is the marketing reason here?

Dongjoon

On Wed, Jun 7, 2023 at 9:27 AM Sean Owen <sro...@gmail.com> wrote:

Hi Dongjoon,

I think this conversation is not advancing anymore. I personally consider the matter closed unless you can find other support or respond with more specifics. While this perhaps should be on private@, I think it's not wrong as an instructive discussion on dev@.

I don't believe you've made a clear argument about the problem, or how it relates specifically to policy. Nevertheless, I will show you my logic.

You are asserting that a vendor cannot call a product Apache Spark 3.4.0 if it omits a patch updating a Scala maintenance version. This difference has no known impact on usage, as far as I can tell. Let's see what policy requires:

1/ All source code changes must meet at least one of the acceptable changes criteria set out below:

- The change has been accepted by the relevant Apache project community for inclusion in a future release. Note that the process used to accept changes and how that acceptance is documented varies between projects.
- A change is a fix for an undisclosed security issue; and the fix is not publicly disclosed as a security fix; and the Apache project has been notified of both the issue and the proposed fix; and the PMC has rejected neither the vulnerability report nor the proposed fix.
- A change is a fix for a bug; and the Apache project has been notified of both the bug and the proposed fix; and the PMC has rejected neither the bug report nor the proposed fix.
- Minor changes (e.g. alterations to the start-up and shutdown scripts, configuration files, file layout etc.) to integrate with the target platform providing the Apache project has not objected to those changes.

The change you cite meets the 4th point, minor change, made for integration reasons. There is no known technical objection; this was after all at one point the state of Apache Spark.

2/ A version number must be used that both clearly differentiates it from an Apache Software Foundation release and clearly identifies the Apache Software Foundation version on which the software is based.

Keep in mind the product here is not "Apache Spark", but "Databricks Runtime 13.1 (including Apache Spark 3.4.0)". That is, there is far more than a version number differentiating this product from Apache Spark. There is no standalone distribution of Apache Spark anywhere here. I believe that easily matches the intent.

3/ The documentation must clearly identify the Apache Software Foundation version on which the software is based.

Clearly, yes.
4/ The end user expects that the distribution channel will back-port fixes. It is not necessary to back-port all fixes. Selection of fixes to back-port must be consistent with the update policy of that distribution channel.

I think this is safe to say too. Indeed, this explicitly contemplates not back-porting a change.

Backing up, you can see from this document that its spirit is: don't include changes in your own Apache Foo x.y that aren't wanted by the project, and still call it Apache Foo x.y. I don't believe your case matches this spirit either.

I do think it's not crazy to suggest, hey vendor, would you call this "Apache Spark + patches" or ".vendor123". But that's at best a suggestion, and I think it does nothing in particular for users. You've made the suggestion, and I do not see that some police action from the PMC must follow. I think you're simply objecting to a vendor choice, but that is not on-topic here unless you can specifically rebut the reasoning above and show it's connected.

On Wed, Jun 7, 2023 at 11:02 AM Dongjoon Hyun <dongj...@apache.org> wrote:

Sean, it seems that you are confused here. We are not talking about your upper system (the notebook environment). We are talking about the submodule, "Apache Spark 3.4.0-databricks". Whatever you call it, both of us know "Apache Spark 3.4.0-databricks" is different from "Apache Spark 3.4.0". You should not use "3.4.0" for your subsystem.

> This also is aimed at distributions of "Apache Foo", not products that
> "include Apache Foo", which are clearly not Apache Foo.

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org