If we do fix this, let's make sure we don't just disable the tests; otherwise we will create another set of technical debt.
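The jar-count check Dongjoon quotes further down (`tar tvf ... | grep 'jar$' | wc -l`) can be wrapped in a small function that lists any offending entries and fails, for example when inspecting a source release candidate. This is a minimal sketch, not part of SPARK-51318. It assumes a gzip-compressed source tarball, and `check_no_jars` is a hypothetical helper name, not an existing Spark script.

```shell
#!/bin/sh
# Hypothetical helper (not an existing Spark script): fail if a
# gzip-compressed source tarball contains any .jar entries.
# Exit codes: 0 = no jars, 1 = jars found, 2 = tarball unreadable.
check_no_jars() {
  tarball="$1"
  # List archive member paths; a missing or corrupt file is an error,
  # not a pass.
  listing=$(tar tzf "$tarball") || {
    echo "cannot read $tarball" >&2
    return 2
  }
  # '\.jar$' is slightly stricter than 'jar$': it requires the dot.
  # '|| true' keeps grep's "no match" status from aborting under set -e.
  jars=$(printf '%s\n' "$listing" | grep '\.jar$' || true)
  if [ -n "$jars" ]; then
    count=$(printf '%s\n' "$jars" | grep -c '')
    echo "Found $count .jar file(s) in $tarball:" >&2
    printf '%s\n' "$jars" >&2
    return 1
  fi
  echo "OK: no .jar files in $tarball"
  return 0
}
```

Usage would look like `check_no_jars path/to/source-release.tgz`. Returning a distinct code for an unreadable tarball keeps a typo in the path from being reported as a clean release.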
On Thu, 27 Feb 2025 at 09:11, Rozov, Vlad <vro...@amazon.com.invalid> wrote:

> I’ll look into the JIRA. Please assign it to me.
>
> Thank you,
>
> Vlad
>
> > On Feb 26, 2025, at 11:33 PM, Yang Jie <yangji...@apache.org> wrote:
> >
> > +1. I agree with removing the jar files from the Apache Spark repository and disabling the affected tests.
> >
> > For the current test scenarios that use jar files, I believe we can definitely find a more reasonable testing approach.
> >
> > Thanks,
> > Jie Yang
> >
> > On 2025/02/26 16:57:45 "Rozov, Vlad" wrote:
> >> +1 on fixing the test jars, though how they are fixed needs to be discussed, IMO. In the short term, removing the jars may still be the best option to satisfy ASF legal policy and avoid having releases removed.
> >>
> >> AFAIK, the ASF mandates that users and developers have the source code they build from (the source release), not the code they run (the binary release).
> >>
> >> Thank you,
> >>
> >> Vlad
> >>
> >>> On Feb 26, 2025, at 8:47 AM, Dongjoon Hyun <dongj...@apache.org> wrote:
> >>>
> >>> Thank you for your reply, Sean.
> >>>
> >>> I expected exactly that argument, which is why I started by quoting your sentence above.
> >>>
> >>> I understood the reasoning in 2018. However, there are two reasons why I brought this up again in 2025:
> >>>
> >>> First, the open source spirit is, technically and literally, "no compiled code in a source release", as the Apache Hadoop and Hive communities practice. Justin, Vlad, and Alex shared the same perspective with the Apache Spark PMC.
> >>>
> >>> $ tar tvf apache-hive-4.0.1-src.tar.gz | grep 'jar$' | wc -l
> >>> 0
> >>> $ tar tvfz hadoop-3.4.1-src.tar.gz | grep 'jar$' | wc -l
> >>> 0
> >>>
> >>> Second, last year open source communities worldwide were hit by CVE-2024-3094 (the "XZ Utils backdoor"), where the backdoor was hidden in test files. I believe most of us are aware of that. At that time, the GitHub repository was disabled. As a member of the Apache Spark PMC, I'm suggesting that we remove that risk from the Apache Spark repository in 2025. I've included the following link, which describes the XZ Utils history explicitly.
> >>>
> >>> https://www.akamai.com/blog/security-research/critical-linux-backdoor-xz-utils-discovered-what-to-know
> >>>
> >>> Although I agree that this test coverage is important, I don't think it is worth the Apache Spark community taking the risk of being shut down. That's the lesson I learned last year.
> >>>
> >>> Sincerely,
> >>> Dongjoon.
> >>>
> >>> On 2025/02/26 13:31:56 Sean Owen wrote:
> >>>> The gist of the initial 2018 thread was:
> >>>> These are not .jar files that users use, but .jar files used to test loading from .jar files. These are test resources only.
> >>>> I don't think this is what the spirit of the rule addresses, which is that end-user code should always have source code; that is the right principle.
> >>>> Checking in the code somewhere is nice to have, though, and I think that was the idea here.
> >>>>
> >>>> But removing these and disabling potentially valuable tests seems like a step too far. There is no actual 'problem' with respect to the principle that users have the source for the code they run.
> >>>>
> >>>> The 2025 thread just retreads the same ground as the 2018 thread.
> >>>> But I don't see that we put this argument to the person who raised it again. Why not do that first?
> >>>> And, if possible, check the source for these jars into the source tree where it is available.
> >>>>
> >>>>
> >>>> On Wed, Feb 26, 2025 at 1:08 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> >>>>
> >>>>> Hi, All.
> >>>>>
> >>>>> Unfortunately, the Apache Spark project seems to carry technical debt in its source code releases. It has been discussed at least twice, on both the dev@spark and legal-discuss mailing lists. (Thank you for the heads-up, Vlad.)
> >>>>>
> >>>>> 1. https://lists.apache.org/thread/3sxw9gwp51mrkzlo2xchq1g20gbgbnz8
> >>>>> (2018-06-21, dev@spark)
> >>>>> 2. https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k
> >>>>> (2018-06-25, legal-discuss@)
> >>>>> 3. https://lists.apache.org/thread/z3oq1db80vc8c7r6892hwjnq4h7hnwmd
> >>>>> (2025-02-25, dev@spark)
> >>>>>
> >>>>> In short, according to the 2018 conclusion, the Apache Spark community wanted to adhere to ASF policy by removing those jar files from source code releases (although this has not been considered a release blocker, then or since).
> >>>>>
> >>>>>> it's important to be able to recreate these JARs somehow,
> >>>>>> and I don't think we have the source in the repo for all of them
> >>>>>> (at least, the ones that originate from Spark).
> >>>>>> That much seems like a must-do. After that, seems worth figuring out
> >>>>>> just how hard it is to build these artifacts from source.
> >>>>>> If it's easy, great. If not, either the test can be removed or
> >>>>>> we figure out just how hard a requirement this is.
> >>>>>
> >>>>> Given that the issue has been unresolved for seven years, I proposed SPARK-51318 as a potential solution to comply with ASF policy. After SPARK-51318, we can restore the test coverage one test at a time by addressing the identified TODO items, without any legal concerns during the votes.
> >>>>>
> >>>>> https://issues.apache.org/jira/browse/SPARK-51318
> >>>>> (Remove `jar` files from Apache Spark repository and disable affected tests)
> >>>>>
> >>>>> WDYT?
> >>>>>
> >>>>> BTW, please note that I haven't marked SPARK-51318 as a blocker for any ongoing releases yet.
> >>>>>
> >>>>> Best regards,
> >>>>> Dongjoon.
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org