While I'd love to resolve this issue, I still don't understand why we would block the release for this.
On Tue, Mar 25, 2025 at 7:49 AM Rozov, Vlad <vro...@amazon.com.invalid> wrote:
> The difference is in how the tests are disabled.
>
> - the approach encourages keeping jar files in the Apache Spark repo
> - it is hard to identify which tests are impacted by the jars so that they can be properly fixed
> - the solution relies on whether the jar is present on the classpath. Tests may be skipped unintentionally. It is also very easy to introduce new tests that do not skip when the jar does not exist. Such a test will break only during the release.
>
> IMO, it is necessary to check whether the source code for the test jars is available or can be reconstructed. If not, it is necessary to see how the functionality can still be tested without the jar. If the source code is available, keeping the tests requires either building the jars during the tests or publishing the jars to Maven and pulling them in as test dependencies.
>
> Thank you,
>
> Vlad
>
> On Mar 24, 2025, at 11:52 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>
>> What's the difference between disabling tests for dev and release vs. only for release?
>>
>> On Tue, 25 Mar 2025 at 15:36, Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>>
>>> Overall, I don't buy the solution where tests are skipped based on the presence of a jar file. It looks too fragile to me. What if a bug prevents the jar from being added to the classpath? The test would be skipped, not because the jar was deleted, but because the classpath is incorrect.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>> On Mar 24, 2025, at 7:56 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>
>>>> Valid concern. Maybe, for now, we can mark the tests as ignored when the jars do not exist. That way, the tagged commit will skip those tests, while dev commits will still run them.
>>>>
>>>> On Tue, 25 Mar 2025 at 11:47, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>
>>>>> Maybe we should also check: is it mandatory for the source code distributed in a release to be able to pass the test suites?
>>>>> If this is mandatory, we can't just modify the release script to remove the jars, because that would break the tests in the source code distribution.
>>>>>
>>>>> My understanding is that we need to make sure the tests pass from the source code and that the same artifacts we release can be built from it, but I might be wrong.
>>>>>
>>>>> On Tue, Mar 25, 2025 at 11:32 AM Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>
>>>>>> Made a PR first (https://github.com/apache/spark/pull/50378).
>>>>>>
>>>>>> BTW, I agree that we should have the source code along with the jars, and ideally the dev branch should not contain them either. This is technical debt, and I hope we can improve it incrementally.
>>>>>>
>>>>>> I will also take a look and see if we can reject jars automatically in PRs or CI.
>>>>>>
>>>>>> On Tue, 25 Mar 2025 at 11:15, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>>
>>>>>>> So the issue is that the source releases (https://github.com/apache/spark/tags) contain those jars, right? Can we add the removal of the test jars as part of the release process?
>>>>>>>
>>>>>>> They aren't included in binary releases in any event, so removing them from every source release should work.
>>>>>>>
>>>>>>> On Tue, 25 Mar 2025 at 10:51, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Let's make this very clear - do we not have the source code to build a jar, or do we have no way to infer the source code used for the jar?
>>>>>>>>
>>>>>>>> I understand the concern, but if this is a huge issue, why has no one looked into it, and why are we just debating whether the affected tests need to be dropped/disabled? Whenever we add test resources like a golden file, we tend to leave in the code that builds the golden file. Did we check and confirm that these jars are not such a case, and that we really lost the source code to build them?
>>>>>>>>
>>>>>>>> On Tue, Mar 25, 2025 at 9:35 AM Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> First of all, I don't think the conclusion of https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k is correct. The jar files included in the source release are compiled from code, and replacing them with dat or jpeg files won't work. Including jar files in the source release is against ASF policy, and my -1 will stay as long as jars are included in the source release. As this issue has been raised before and no action was taken (actually, more jars were added), IMO the issue should now be handled as a release blocker.
>>>>>>>>>
>>>>>>>>> I don't see anything in the proposal suggesting that the fix for SPARK-51318 is or should be blocked by an umbrella JIRA. The proposal was to recover the tests one by one. The PR that I have open will make that possible, as all disabled tests refer to SPARK-51318.
>>>>>>>>>
>>>>>>>>> I can only help with SPARK-51318 at this point. Somebody else will have to look into keeping the tests enabled, as that requires the source code for the test jars.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>>
>>>>>>>>> Vlad
>>>>>>>>>
>>>>>>>>> On Mar 24, 2025, at 4:55 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> I still disagree with just disabling tests and removing the jars without making sure that we will enable them again.
>>>>>>>>>> I want EITHER to make sure we have a plan and someone to drive it, so that the tests will be re-enabled, OR to have one fix that does it all.
>>>>>>>>>> Otherwise, my -1 stands if we can't be sure of that.
>>>>>>>>>>
>>>>>>>>>> On Tue, 25 Mar 2025 at 08:51, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> From what I read in the last discussion in the legal thread (https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k), we don't really need to rush and block the release.
>>>>>>>>>>> I don't think we should block the release, remove the CI, and just remove the jars.
>>>>>>>>>>>
>>>>>>>>>>> Rozov, the original proposal of this thread was 1. to first disable the tests, and 2. to open an umbrella JIRA to re-enable individual tests.
>>>>>>>>>>> Since you're driving this, would you mind either making a proper fix in one go, or creating an umbrella JIRA to drive this?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 24 Mar 2025 at 23:46, Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Let's open a formal vote on the subject. I have an open WIP PR (https://github.com/apache/spark/pull/50231) that is currently blocked by the -1.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>
>>>>>>>>>>>> Vlad
>>>>>>>>>>>>
>>>>>>>>>>>> On Mar 24, 2025, at 7:05 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> It seems there's no quick fix for this issue. Should we remove these jars and disable the tests for now to comply with ASF policy? While this would temporarily reduce test coverage until we refactor the tests to avoid pre-compiled jars, we can encourage Spark vendors not to cherry-pick this test-disabling commit so they can help report any test failures. That said, since these tests are quite old and stable, failures are unlikely.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Wenchen
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Mar 13, 2025 at 12:15 AM Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> There is a difference between technical debt and a legal issue.
>>>>>>>>>>>>>> The ASF may request that a release that does not meet ASF policy be pulled (and having tests is not ASF policy). IMO, SPARK-51318 should be a blocker for the next release or handled like a blocker.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Vlad
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mar 10, 2025, at 6:02 PM, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1 to Hyukjin. If the test is effective, we should definitely retain its effectiveness, unless we conclude that there is no way to do that.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Mar 11, 2025 at 9:29 AM Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If we should fix this, let's make sure we don't just disable the tests - we would create another set of technical debt.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, 27 Feb 2025 at 09:11, Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'll look into the JIRA. Please assign it to me.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Vlad
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Feb 26, 2025, at 11:33 PM, Yang Jie <yangji...@apache.org> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> +1. I agree to remove the jar files from the Apache Spark repository and disable the affected tests.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For the current test scenarios that use jar files, I believe we can definitely find a more reasonable testing approach.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Jie Yang
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 2025/02/26 16:57:45 "Rozov, Vlad" wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> +1 on fixing the test jars, though how they are fixed needs to be discussed, IMO. In the short term, removing the jars may still be the best option to satisfy ASF legal policy and avoid the release being removed.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> AFAIK, the ASF mandates that users and developers have the source code they build from (source release), not the code they run (binary release).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Vlad
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Feb 26, 2025, at 8:47 AM, Dongjoon Hyun <dongj...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thank you for your reply, Sean.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I expected exactly that argument, which is why I started by quoting your sentence above.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I understood the reasoning in 2018. However, there are two reasons why I brought this up again in 2025:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> First, the open source spirit is technically and literally "no compiled code in a source release", as practiced by the Apache Hadoop and Hive communities. Justin, Vlad, and Alex shared the same perspective with the Apache Spark PMC.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> $ tar tvf apache-hive-4.0.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>>>> $ tar tvfz hadoop-3.4.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Second, last year the open source communities were hit worldwide by CVE-2024-3094 ("XZ Utils Backdoor"), where the backdoor was hidden in a test object. I believe most of us are aware of that. At that time, the GitHub repository was disabled. As a member of the Apache Spark PMC, I'm suggesting that we remove that risk from the Apache Spark repository in 2025. I attached the following link to provide the XZ Utils history explicitly.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> https://www.akamai.com/blog/security-research/critical-linux-backdoor-xz-utils-discovered-what-to-know
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Although I agree that the coverage from those tests is important, I don't think it's worth the Apache Spark community taking the risk of being shut down. That's the lesson I learned last year.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>>>>>>>>>> Dongjoon.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 2025/02/26 13:31:56 Sean Owen wrote:
>>>>>>>>>>>>>>>>>>>>> The gist of the initial 2018 thread was:
>>>>>>>>>>>>>>>>>>>>> These are not source .jar files that users use, but .jar files used to test loading from .jar files. These are test resources only.
>>>>>>>>>>>>>>>>>>>>> I don't think this is what the spirit of the rule is speaking to, that the end-user code should always have source code, which is the right principle.
>>>>>>>>>>>>>>>>>>>>> Checking in the code somewhere is nice to have though, and I think that was the idea here.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> But removing these and disabling potentially valuable tests seems like a step too far. There is no actual 'problem' w.r.t. the principle that users have source to the code they run.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The 2025 thread just retreads the same ground as the 2018 thread.
>>>>>>>>>>>>>>>>>>>>> But I don't see that we put this argument to the person who raised it again. Why not that first?
>>>>>>>>>>>>>>>>>>>>> And, if possible, go stick the source to these jars in the source tree, where available.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Feb 26, 2025 at 1:08 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi, All.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Unfortunately, the Apache Spark project seems to have technical debt in its source code releases. It has been discussed at least twice, on both the dev@spark and legal-discuss mailing lists. (Thank you for the heads-up, Vlad.)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 1. https://lists.apache.org/thread/3sxw9gwp51mrkzlo2xchq1g20gbgbnz8 (2018-06-21, dev@spark)
>>>>>>>>>>>>>>>>>>>>>> 2. https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k (2018-06-25, legal-discuss@)
>>>>>>>>>>>>>>>>>>>>>> 3. https://lists.apache.org/thread/z3oq1db80vc8c7r6892hwjnq4h7hnwmd (2025-02-25, dev@spark)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> In short, according to the previous conclusion in 2018, the Apache Spark community wanted to adhere to the ASF policy by removing those jar files from source code releases (although it was not considered a release blocker at that time or since).
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> it's important to be able to recreate these JARs somehow,
>>>>>>>>>>>>>>>>>>>>>>> and I don't think we have the source in the repo for all of them
>>>>>>>>>>>>>>>>>>>>>>> (at least, the ones that originate from Spark).
>>>>>>>>>>>>>>>>>>>>>>> That much seems like a must-do. After that, seems worth figuring out
>>>>>>>>>>>>>>>>>>>>>>> just how hard it is to build these artifacts from source.
>>>>>>>>>>>>>>>>>>>>>>> If it's easy, great.
>>>>>>>>>>>>>>>>>>>>>>> If not, either the test can be removed or
>>>>>>>>>>>>>>>>>>>>>>> we figure out just how hard a requirement this is.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Given that the issue has remained unresolved for seven years, I proposed SPARK-51318 as a potential solution to comply with ASF policy. After SPARK-51318, we can recover the test coverage one by one by addressing the IDed TODO items, without any legal concerns during the votes.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-51318
>>>>>>>>>>>>>>>>>>>>>> (Remove `jar` files from Apache Spark repository and disable affected tests)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> BTW, please note that I haven't marked SPARK-51318 as a blocker for any ongoing releases yet.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>> Dongjoon.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
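
Two messages in this thread discuss checking for jar files mechanically. One compares source tarballs with `tar ... | grep 'jar$' | wc -l` against the Hive and Hadoop releases. Another suggests rejecting jars automatically in PRs or CI. Below is a minimal sketch of such a guard. It assumes bash with `find`, `tar`, `grep` and `wc` on the PATH. `check_no_jars` is a hypothetical helper written for illustration, not an existing Apache Spark script. It only flags file names ending in `.jar` and does not inspect file contents.

```shell
#!/usr/bin/env bash
# Hypothetical CI guard (illustrative only, not part of Apache Spark):
# report any *.jar entries in a source tree or a source tarball.
# Returns 0 if none are found, 1 if jars are found, 2 on read errors.
check_no_jars() {
  local target="${1:-.}"
  local jars members count

  if [ -d "$target" ]; then
    # Working tree: list regular files named *.jar, skipping .git directories.
    jars="$(find "$target" -name .git -prune -o -type f -name '*.jar' -print)" || return 2
  else
    # Archive: list its members (GNU tar and bsdtar detect gzip when reading
    # from a file), mirroring the `tar tvf ... | grep 'jar$' | wc -l` check.
    members="$(tar -tf "$target")" || return 2
    # grep exits 1 when nothing matches, which is the success case here.
    jars="$(printf '%s\n' "$members" | grep '\.jar$')" || true
  fi

  if [ -n "$jars" ]; then
    # Count the offending paths; tr strips the padding some wc versions add.
    count="$(printf '%s\n' "$jars" | wc -l | tr -d ' ')"
    echo "FAIL: ${count} jar file(s) found in ${target}:" >&2
    printf '%s\n' "$jars" >&2
    return 1
  fi
  echo "OK: no jar files found in ${target}"
}
```

In a CI job one might call `check_no_jars .` on the checkout, or pass the path of a source tarball. A non-zero return would then fail the job. Matching on file names is a deliberate simplification: a jar renamed to another extension would not be caught.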