So the issue is the source releases (https://github.com/apache/spark/tags) containing those jars, right? Can we add the removal of test jars as part of the release process?
They aren't included in binary releases in any event so removal on every source release should work.

On Tue, 25 Mar 2025 at 10:51, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

> Let's make this very clear - do we not have a source code to build a jar,
> or have no way to infer the source code being used for the jar?
>
> I understand the concern, but if this is a huge issue, why no one has
> looked into this and here we just debate whether the affected tests need to
> be dropped/disabled or not? Whenever we add some test resources like a
> golden file, we tend to leave the part of the code to build the golden
> file. Did we check and confirm these jars are not the case and we lost the
> source code to build?
>
> On Tue, Mar 25, 2025 at 9:35 AM Rozov, Vlad <vro...@amazon.com.invalid>
> wrote:
>
>> First of all I don’t think that conclusion on the
>> https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k is
>> correct. Jar files included into the source release are compiled from the
>> code and replacing them with dat or jpeg files won’t work. Including jar
>> files into the source release is against ASF policy and my -1 will stay as
>> long as jars are included into the source release. As this issue was raised
>> not for the first time and there was no action (actually more jars were
>> added), IMO, the issue should now be handled as the release blocker.
>>
>> I don’t see anything in the proposal that suggests that fix
>> for SPARK-51318 is or should be blocked by umbrella JIRA. The proposal was
>> to recover tests one by one. The PR that I have open will allow to
>> accomplish these tasks as all disabled tests refer to SPARK-51318.
>>
>> I can only help with SPARK-51318 at this point. Somebody else will have
>> to look into keeping tests enabled as it requires source code for the test
>> jars.
>>
>> Thank you,
>>
>> Vlad
>>
>>
>> On Mar 24, 2025, at 4:55 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>
>> I still disagree with just disabling tests and removing the jars without
>> making sure that we will enable them back.
>> I want to EITHER make sure we have a plan and someone to drive, and the
>> tests will be enabled back, OR have a one fix that does all.
>> Otherwise, my -1 stands if we can't be sure of that.
>>
>> On Tue, 25 Mar 2025 at 08:51, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>
>>> From what I read in the last discussion in the legal thread (
>>> https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k), we
>>> don't really need to rush and block the release.
>>> I don't think we should block the release, remove the CI, and just
>>> remove the jars.
>>>
>>> Rozov, the original proposal of this thread is 1. to first disable the
>>> tests, and 2. open an umbrella JIRA to enable individual tests.
>>> Since you're driving this, would you mind either making a proper fix in
>>> one go, or create an umbrella JIRA to drive this?
>>>
>>>
>>> On Mon, 24 Mar 2025 at 23:46, Rozov, Vlad <vro...@amazon.com.invalid>
>>> wrote:
>>>
>>>> Let’s open a formal vote on the subject. I have open WIP PR
>>>> https://github.com/apache/spark/pull/50231 that is currently blocked
>>>> by -1.
>>>>
>>>> Thank you,
>>>>
>>>> Vlad
>>>>
>>>> On Mar 24, 2025, at 7:05 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>
>>>>
>>>> It seems there’s no quick fix for this issue. Should we remove these
>>>> jars and disable the tests for now to comply with ASF policy? While this
>>>> would temporarily reduce test coverage until we refactor the tests to avoid
>>>> pre-compiled jars, we can encourage Spark vendors not to cherry-pick this
>>>> test-disabling commit so they can help report any test failures. That said,
>>>> since these tests are quite old and stable, failures are unlikely.
>>>>
>>>> Thanks,
>>>> Wenchen
>>>>
>>>> On Thu, Mar 13, 2025 at 12:15 AM Rozov, Vlad <vro...@amazon.com.invalid>
>>>> wrote:
>>>>
>>>>> There is a difference between technical debt and legal issue. ASF may
>>>>> request to pull out release that does not meet ASF policy (and having
>>>>> tests is not ASF policy). IMO, SPARK-51318 should be a blocker for the
>>>>> next release or handled like a blocker.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Vlad
>>>>>
>>>>> On Mar 10, 2025, at 6:02 PM, Jungtaek Lim <
>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>
>>>>> +1 to Hyukjin. If the test is effective, we should definitely retain
>>>>> the effectiveness of the test, unless we end up with the conclusion that
>>>>> there is no way to do that.
>>>>>
>>>>> On Tue, Mar 11, 2025 at 9:29 AM Hyukjin Kwon <gurwls...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> If we should fix, let's make sure we don't just disable the tests -
>>>>>> we will create another set of technical debt.
>>>>>>
>>>>>>
>>>>>> On Thu, 27 Feb 2025 at 09:11, Rozov, Vlad <vro...@amazon.com.invalid>
>>>>>> wrote:
>>>>>>
>>>>>>> I’ll look into the JIRA. Please assign it to me.
>>>>>>>
>>>>>>> Thank you,
>>>>>>>
>>>>>>> Vlad
>>>>>>>
>>>>>>> > On Feb 26, 2025, at 11:33 PM, Yang Jie <yangji...@apache.org> wrote:
>>>>>>> >
>>>>>>> > +1, Agree to remove the jar files from the Apache Spark repository
>>>>>>> > and disable the affected tests.
>>>>>>> >
>>>>>>> > For the current test scenarios that use jar files, I believe we can
>>>>>>> > definitely find a more reasonable testing approach.
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Jie Yang
>>>>>>> >
>>>>>>> > On 2025/02/26 16:57:45 "Rozov, Vlad" wrote:
>>>>>>> >> +1 on fixing test jars, though the way how it is fixed needs to be
>>>>>>> >> discussed, IMO. In the short term removing jars may still be the
>>>>>>> >> best option to satisfy ASF legal policy and avoid release removal.
>>>>>>> >>
>>>>>>> >> AFAIK, ASF mandates that users and developers have source code that
>>>>>>> >> they build from (source release), not that they run (binary release).
>>>>>>> >>
>>>>>>> >> Thank you,
>>>>>>> >>
>>>>>>> >> Vlad
>>>>>>> >>
>>>>>>> >>> On Feb 26, 2025, at 8:47 AM, Dongjoon Hyun <dongj...@apache.org> wrote:
>>>>>>> >>>
>>>>>>> >>> Thank you for your reply, Sean.
>>>>>>> >>>
>>>>>>> >>> I expected that argument exactly so that I started by quoting your
>>>>>>> >>> sentence in the above.
>>>>>>> >>>
>>>>>>> >>> I understood the reasoning in 2018. However, there are two reasons
>>>>>>> >>> why I brought this again in 2025:
>>>>>>> >>>
>>>>>>> >>> First, the open source spirit is technically and literally "no
>>>>>>> >>> compiled code in a source release" like Apache Hadoop and Hive
>>>>>>> >>> community does. Justin, Vlad, and Alex shared the same perspective
>>>>>>> >>> to the Apache Spark PMC.
>>>>>>> >>>
>>>>>>> >>> $ tar tvf apache-hive-4.0.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>> >>> 0
>>>>>>> >>> $ tar tvfz hadoop-3.4.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>> >>> 0
>>>>>>> >>>
>>>>>>> >>> Second, last year, the open source communities were hit by
>>>>>>> >>> CVE-2024-3094 ("XZ Utils Backdoor") in the world-wide manner where
>>>>>>> >>> the backdoor was hidden in the test object. I believe most of us are
>>>>>>> >>> aware of that. At that time, the GitHub repository was disabled. As
>>>>>>> >>> a member of Apache Spark PMC, I'm suggesting to remove that risk
>>>>>>> >>> from the Apache Spark repository in 2025. I attached the following
>>>>>>> >>> link to provide the XZ Utils history explicitly.
>>>>>>> >>>
>>>>>>> >>> https://www.akamai.com/blog/security-research/critical-linux-backdoor-xz-utils-discovered-what-to-know
>>>>>>> >>>
>>>>>>> >>> Although I agree that those test coverages are important, I don't
>>>>>>> >>> think that's worthy for Apache Spark community to take a risk to be
>>>>>>> >>> shut down. That's the lesson which I've learned last year.
>>>>>>> >>>
>>>>>>> >>> Sincerely,
>>>>>>> >>> Dongjoon.
>>>>>>> >>>
>>>>>>> >>> On 2025/02/26 13:31:56 Sean Owen wrote:
>>>>>>> >>>> The gist of the initial 2018 thread was:
>>>>>>> >>>> These are not source .jar files that users use, but .jar files used
>>>>>>> >>>> to test loading from .jar files. These are test resources only.
>>>>>>> >>>> I don't think this is what the spirit of the rule is speaking to,
>>>>>>> >>>> that the end-user code should always have source code, which is the
>>>>>>> >>>> right principle.
>>>>>>> >>>> Checking in the code somewhere is nice to have though and I think
>>>>>>> >>>> that was the idea here.
>>>>>>> >>>>
>>>>>>> >>>> But, removing these and disabling potentially valuable tests seems
>>>>>>> >>>> like a step too far. There is no actual 'problem' w.r.t. the
>>>>>>> >>>> principle that users have source to the code they run.
>>>>>>> >>>>
>>>>>>> >>>> The 2025 thread just retreads the same ground as the 2018 thread.
>>>>>>> >>>> But I don't see that we put this argument to the person who raised
>>>>>>> >>>> it again. Why not that first?
>>>>>>> >>>> And, if possible, go stick the source to these jars in the source
>>>>>>> >>>> tree, where available.
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> On Wed, Feb 26, 2025 at 1:08 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>>>>> >>>> wrote:
>>>>>>> >>>>
>>>>>>> >>>>> Hi, All.
>>>>>>> >>>>>
>>>>>>> >>>>> Unfortunately, the Apache Spark project seems to have a technical
>>>>>>> >>>>> debt in the source code releases. It happens to be discussed at
>>>>>>> >>>>> least twice on both dev@spark and legal-discuss mailing lists.
>>>>>>> >>>>> (Thank you for the heads-up, Vlad.)
>>>>>>> >>>>>
>>>>>>> >>>>> 1. https://lists.apache.org/thread/3sxw9gwp51mrkzlo2xchq1g20gbgbnz8
>>>>>>> >>>>>    (2018-06-21, dev@spark)
>>>>>>> >>>>> 2. https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k
>>>>>>> >>>>>    (2018-06-25, legal-discuss@)
>>>>>>> >>>>> 3. https://lists.apache.org/thread/z3oq1db80vc8c7r6892hwjnq4h7hnwmd
>>>>>>> >>>>>    (2025-02-25, dev@spark)
>>>>>>> >>>>>
>>>>>>> >>>>> To be short, according to the previous conclusion in 2018, the
>>>>>>> >>>>> Apache Spark community wanted to adhere to the ASF policy by
>>>>>>> >>>>> removing those jar files from source code releases (although it
>>>>>>> >>>>> was not considered as a release blocker at that time and until
>>>>>>> >>>>> now).
>>>>>>> >>>>>
>>>>>>> >>>>>> it's important to be able to recreate these JARs somehow,
>>>>>>> >>>>>> and I don't think we have the source in the repo for all of them
>>>>>>> >>>>>> (at least, the ones that originate from Spark).
>>>>>>> >>>>>> That much seems like a must-do. After that, seems worth figuring out
>>>>>>> >>>>>> just how hard it is to build these artifacts from source.
>>>>>>> >>>>>> If it's easy, great. If not, either the test can be removed or
>>>>>>> >>>>>> we figure out just how hard a requirement this is.
>>>>>>> >>>>>
>>>>>>> >>>>> Given the unresolved issue for seven years, I proposed SPARK-51318
>>>>>>> >>>>> as a potential solution to comply with ASF policy. After
>>>>>>> >>>>> SPARK-51318, we can recover the test coverage one by one later by
>>>>>>> >>>>> addressing IDed TODO items without any legal concerns during the
>>>>>>> >>>>> votes.
>>>>>>> >>>>>
>>>>>>> >>>>> https://issues.apache.org/jira/browse/SPARK-51318
>>>>>>> >>>>> (Remove `jar` files from Apache Spark repository and disable
>>>>>>> >>>>> affected tests)
>>>>>>> >>>>>
>>>>>>> >>>>> WDYT?
>>>>>>> >>>>>
>>>>>>> >>>>> BTW, please note that I didn't define SPARK-51318 as a blocker for
>>>>>>> >>>>> any on-going releases yet.
>>>>>>> >>>>>
>>>>>>> >>>>> Best regards,
>>>>>>> >>>>> Dongjoon.
>>>>>>> >>>
>>>>>>> >>> ---------------------------------------------------------------------
>>>>>>> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
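
For reference, the `tar tvf ... | grep 'jar$' | wc -l` check quoted in this thread, and the suggestion to strip test jars as part of the release process, could be automated along the following lines. This is only a minimal Python sketch and not anything taken from the Spark release scripts: `find_jars` and `build_demo_tarball` are hypothetical helper names, the demo archive is generated in memory, and unlike the `grep` it matches only regular files whose names end in `.jar`.

```python
import io
import tarfile


def find_jars(tarball):
    """Return the sorted names of regular-file members ending in .jar.

    `tarball` is a binary file object (an opened source tarball, for example).
    """
    with tarfile.open(fileobj=tarball, mode="r:*") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(".jar")
        )


def build_demo_tarball(paths):
    """Build an in-memory .tar.gz of empty files at `paths` (demo data only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path in paths:
            # TarInfo defaults to a regular file of size 0.
            archive.addfile(tarfile.TarInfo(name=path), io.BytesIO(b""))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Hypothetical layout; a real check would open the actual source tarball.
    demo = build_demo_tarball(
        ["demo-src/README.md", "demo-src/test/resources/example.jar"]
    )
    jars = find_jars(demo)
    print(f"{len(jars)} jar file(s) found: {jars}")
```

A release script could call `find_jars` on the generated source tarball and fail the build when the returned list is non-empty; whether the check should fail or merely warn is a policy choice this sketch does not settle.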