If we decide to fix this, let's make sure we don't just disable the tests;
otherwise we will create another set of technical debt.


On Thu, 27 Feb 2025 at 09:11, Rozov, Vlad <vro...@amazon.com.invalid> wrote:

> I’ll look into the JIRA. Please assign it to me.
>
> Thank you,
>
> Vlad
>
> > On Feb 26, 2025, at 11:33 PM, Yang Jie <yangji...@apache.org> wrote:
> >
> > +1. I agree with removing the jar files from the Apache Spark repository
> > and disabling the affected tests.
> >
> > For the current test scenarios that use jar files, I believe we can
> > definitely find a more reasonable testing approach.
> >
> > Thanks,
> > Jie Yang
> >
> > On 2025/02/26 16:57:45 "Rozov, Vlad" wrote:
> >> +1 on fixing the test jars, though how they are fixed needs to be
> >> discussed, IMO. In the short term, removing the jars may still be the best
> >> option to satisfy ASF legal policy and avoid having the release removed.
> >>
> >> AFAIK, the ASF mandates that users and developers have the source code
> >> they build from (source release), not the code they run (binary release).
> >>
> >> Thank you,
> >>
> >> Vlad
> >>
> >>> On Feb 26, 2025, at 8:47 AM, Dongjoon Hyun <dongj...@apache.org> wrote:
> >>>
> >>> Thank you for your reply, Sean.
> >>>
> >>> I expected exactly that argument, which is why I started by quoting your
> >>> sentence below.
> >>>
> >>> I understood the reasoning in 2018. However, there are two reasons why
> >>> I am bringing this up again in 2025:
> >>>
> >>> First, the open source spirit is, technically and literally, "no compiled
> >>> code in a source release", as the Apache Hadoop and Hive communities
> >>> practice. Justin, Vlad, and Alex shared the same perspective with the
> >>> Apache Spark PMC.
> >>>
> >>> $ tar tvf apache-hive-4.0.1-src.tar.gz | grep 'jar$' | wc -l
> >>>      0
> >>> $ tar tvfz hadoop-3.4.1-src.tar.gz | grep 'jar$' | wc -l
> >>>      0
> >>>
> >>> Second, last year open source communities worldwide were hit by
> >>> CVE-2024-3094 (the "XZ Utils backdoor"), where the backdoor was hidden in
> >>> binary test files. I believe most of us are aware of it. At that time, the
> >>> GitHub repository was disabled. As a member of the Apache Spark PMC, I'm
> >>> proposing that we remove that risk from the Apache Spark repository in
> >>> 2025. I have attached the following link, which describes the XZ Utils
> >>> history explicitly.
> >>>
> >>>
> >>> https://www.akamai.com/blog/security-research/critical-linux-backdoor-xz-utils-discovered-what-to-know
> >>>
> >>> Although I agree that this test coverage is important, I don't think it
> >>> is worth the Apache Spark community taking the risk of being shut down.
> >>> That's the lesson I learned last year.
> >>>
> >>> Sincerely,
> >>> Dongjoon.
> >>>
> >>> On 2025/02/26 13:31:56 Sean Owen wrote:
> >>>> The gist of the initial 2018 thread was:
> >>>> These are not .jar files that users use, but .jar files used to test
> >>>> loading from .jar files. These are test resources only.
> >>>> I don't think this is what the spirit of the rule addresses, namely that
> >>>> end-user code should always have source code, which is the right
> >>>> principle. Checking in the code somewhere is nice to have, though, and I
> >>>> think that was the idea here.
> >>>>
> >>>> But removing these and disabling potentially valuable tests seems like
> >>>> a step too far. There is no actual 'problem' w.r.t. the principle that
> >>>> users have source for the code they run.
> >>>>
> >>>> The 2025 thread just retreads the same ground as the 2018 thread.
> >>>> But I don't see that we put this argument to the person who raised it
> >>>> again. Why not do that first?
> >>>> And, if possible, put the source for these jars in the source tree,
> >>>> where available.
> >>>>
> >>>>
> >>>> On Wed, Feb 26, 2025 at 1:08 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hi, All.
> >>>>>
> >>>>> Unfortunately, the Apache Spark project seems to have technical debt in
> >>>>> its source code releases. It has been discussed at least twice, on both
> >>>>> the dev@spark and legal-discuss mailing lists. (Thank you for the
> >>>>> heads-up, Vlad.)
> >>>>>
> >>>>> 1. https://lists.apache.org/thread/3sxw9gwp51mrkzlo2xchq1g20gbgbnz8
> >>>>> (2018-06-21, dev@spark)
> >>>>> 2. https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k
> >>>>> (2018-06-25, legal-discuss@)
> >>>>> 3. https://lists.apache.org/thread/z3oq1db80vc8c7r6892hwjnq4h7hnwmd
> >>>>> (2025-02-25, dev@spark)
> >>>>>
> >>>>> In short, according to the previous conclusion in 2018, the Apache
> >>>>> Spark community wanted to adhere to ASF policy by removing those jar
> >>>>> files from source code releases (although this was not considered a
> >>>>> release blocker at that time, nor has it been since).
> >>>>>
> >>>>>> it's important to be able to recreate these JARs somehow,
> >>>>>> and I don't think we have the source in the repo for all of them
> >>>>>> (at least, the ones that originate from Spark).
> >>>>>> That much seems like a must-do. After that, seems worth figuring out
> >>>>>> just how hard it is to build these artifacts from source.
> >>>>>> If it's easy, great. If not, either the test can be removed or
> >>>>>> we figure out just how hard a requirement this is.
> >>>>>
> >>>>> Given that the issue has remained unresolved for seven years, I
> >>>>> proposed SPARK-51318 as a potential solution to comply with ASF policy.
> >>>>> After SPARK-51318, we can recover the test coverage one by one later by
> >>>>> addressing the identified TODO items, without any legal concerns during
> >>>>> the votes.
> >>>>>
> >>>>> https://issues.apache.org/jira/browse/SPARK-51318
> >>>>> (Remove `jar` files from Apache Spark repository and disable affected
> >>>>> tests)
> >>>>>
> >>>>> WDYT?
> >>>>>
> >>>>> BTW, please note that I haven't marked SPARK-51318 as a blocker for any
> >>>>> ongoing releases yet.
> >>>>>
> >>>>> Best regards,
> >>>>> Dongjoon.
> >>>>>
> >>>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>>
> >>
> >>
> >>
> >>
> >
> >
>
>
