I'd go along with that tradeoff, if the tests aren't that important - are
they? I didn't see any discussion of what, if anything, we lose.
I don't find the 'literal' interpretation compelling; the vulnerability
argument is more compelling, though I don't think there is any evidence it
affects these .jars.

On Wed, Feb 26, 2025 at 10:50 AM Dongjoon Hyun <dongj...@apache.org> wrote:

> Thank you for your reply, Sean.
>
> I expected exactly that argument, which is why I started by quoting your
> sentence above.
>
> I understood the reasoning in 2018. However, there are two reasons why I
> brought this up again in 2025:
>
> First, the open source spirit is, technically and literally, "no compiled
> code in a source release", as the Apache Hadoop and Hive communities
> practice. Justin, Vlad, and Alex shared the same perspective with the
> Apache Spark PMC.
>
>   $ tar tvf apache-hive-4.0.1-src.tar.gz | grep 'jar$' | wc -l
>        0
>   $ tar tvfz hadoop-3.4.1-src.tar.gz | grep 'jar$' | wc -l
>        0
>
> Second, last year the open source community was hit worldwide by
> CVE-2024-3094 (the "XZ Utils backdoor"), where the backdoor was hidden in
> a test object. I believe most of us are aware of that; at the time, the
> project's GitHub repository was disabled. As a member of the Apache Spark
> PMC, I'm suggesting that we remove that risk from the Apache Spark
> repository in 2025. I've attached the following link to provide the XZ
> Utils history explicitly.
>
>
> https://www.akamai.com/blog/security-research/critical-linux-backdoor-xz-utils-discovered-what-to-know
>
> Although I agree that this test coverage is important, I don't think it's
> worth the Apache Spark community taking the risk of being shut down.
> That's the lesson I learned last year.
>
> Sincerely,
> Dongjoon.
>
> On 2025/02/26 13:31:56 Sean Owen wrote:
> > The gist of the initial 2018 thread was:
> > These are not source .jar files that users use, but .jar files used to
> > test loading from .jar files. These are test resources only.
> > I don't think this is what the spirit of the rule is speaking to: that
> > end-user code should always have source code, which is the right
> > principle.
> > Checking in the code somewhere is nice to have, though, and I think that
> > was the idea here.
> >
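> > For context, these fixtures can be listed from a checkout and inspected
> > with the standard jar tool; the path in the second command is only a
> > hypothetical example:
> >
> >   $ find . -path '*/test/resources/*.jar'
> >   $ jar tf core/src/test/resources/SomeTestFixture.jar
> >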
> > But, removing these and disabling potentially valuable tests seems like
> > a step too far. There is no actual 'problem' w.r.t. the principle that
> > users have source to the code they run.
> >
> > The 2025 thread just retreads the same ground as the 2018 thread.
> > But I don't see that we put this argument to the person who raised it
> > again. Why not do that first?
> > And, if possible, put the source for these jars in the source tree,
> > where available.
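> >
> > As a rough sketch of what that could look like, a fixture checked in as
> > source could be compiled and repackaged into a jar at build or test time
> > (the file names here are hypothetical):
> >
> >   $ javac -d target/fixture-classes src/test/resources/fixture/HelloFixture.java
> >   $ jar cf target/hello-fixture.jar -C target/fixture-classes .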
> >
> >
> > On Wed, Feb 26, 2025 at 1:08 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
> > wrote:
> >
> > > Hi, All.
> > >
> > > Unfortunately, the Apache Spark project seems to carry technical debt
> > > in its source code releases. This has been discussed at least twice, on
> > > both the dev@spark and legal-discuss mailing lists. (Thank you for the
> > > heads-up, Vlad.)
> > >
> > > 1. https://lists.apache.org/thread/3sxw9gwp51mrkzlo2xchq1g20gbgbnz8
> > > (2018-06-21, dev@spark)
> > > 2. https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k
> > > (2018-06-25, legal-discuss@)
> > > 3. https://lists.apache.org/thread/z3oq1db80vc8c7r6892hwjnq4h7hnwmd
> > > (2025-02-25, dev@spark)
> > >
> > > In short, per the conclusion reached in 2018, the Apache Spark
> > > community wanted to adhere to the ASF policy by removing those jar
> > > files from source code releases (although this has not been treated as
> > > a release blocker, either at that time or since).
> > >
> > > > it's important to be able to recreate these JARs somehow,
> > > > and I don't think we have the source in the repo for all of them
> > > > (at least, the ones that originate from Spark).
> > > > That much seems like a must-do. After that, seems worth figuring out
> > > > just how hard it is to build these artifacts from source.
> > > > If it's easy, great. If not, either the test can be removed or
> > > > we figure out just how hard a requirement this is.
> > >
> > > Given that the issue has remained unresolved for seven years, I
> > > proposed SPARK-51318 as a potential solution to comply with the ASF
> > > policy. After SPARK-51318, we can recover the test coverage later, one
> > > by one, by addressing the identified TODO items, without any legal
> > > concerns during release votes.
> > >
> > > https://issues.apache.org/jira/browse/SPARK-51318
> > > (Remove `jar` files from Apache Spark repository and disable affected
> > > tests)
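> > >
> > > As a sketch, listing and then removing the checked-in jars from a
> > > repository checkout could look like the following (the second command
> > > performs the actual removal):
> > >
> > >   $ git ls-files -- '*.jar'
> > >   $ git ls-files -z -- '*.jar' | xargs -0 git rm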
> > >
> > > WDYT?
> > >
> > > BTW, please note that I haven't marked SPARK-51318 as a blocker for
> > > any ongoing releases yet.
> > >
> > > Best regards,
> > > Dongjoon.
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
