What's the difference between disabling tests for dev and release vs only for release?
On Tue, 25 Mar 2025 at 15:36, Rozov, Vlad <vro...@amazon.com.invalid> wrote:

> Overall I don’t buy the solution where tests are skipped based on the presence of a jar file. It looks too fragile to me. What if there is a bug that does not add jar to a classpath? The test would be skipped, but not because jar was deleted, but because classpath is incorrect.
>
> Thank you,
>
> Vlad
>
> On Mar 24, 2025, at 7:56 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>
> Valid concern. Maybe we can mark tests ignored when those tests do not exist for now. So tagged commit will skip those tests. Dev commits will still test them.
>
> On Tue, 25 Mar 2025 at 11:47, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>
>> Maybe we should also check that it is mandatory for source code being distributed under release to be able to pass the test suites? If this is mandatory, we can't just modify the release script to simply remove the jars, because this will break the tests in source code distribution.
>>
>> Actually this is my understanding to make sure tests pass from source code and could build the same artifacts we release from source code, but I might be wrong.
>>
>> On Tue, Mar 25, 2025 at 11:32 AM Hyukjin Kwon <gurwls...@apache.org> wrote:
>>
>>> Made a PR first (https://github.com/apache/spark/pull/50378).
>>>
>>> BTW, I agree that we should have the source code along with the jars, and ideally the dev branch should not contain them as well. This is a technical debt. For this, I hope we can improve this incrementally.
>>>
>>> I will also take a look and see if we can reject jars automatically in PRs or CI.
>>>
>>> On Tue, 25 Mar 2025 at 11:15, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>
>>>> So the issues are source releases (https://github.com/apache/spark/tags) containing those jars, right? Can we add the removal of test jars at the part of the release process.
>>>>
>>>> They aren't included in binary releases in any event so removal on every source release should work.
>>>>
>>>> On Tue, 25 Mar 2025 at 10:51, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>
>>>>> Let's make this very clear - do we not have a source code to build a jar, or have no way to infer the source code being used for the jar?
>>>>>
>>>>> I understand the concern, but if this is a huge issue, why no one has looked into this and here we just debate whether the affected tests need to be dropped/disabled or not? Whenever we add some test resources like a golden file, we tend to leave the part of the code to build the golden file. Did we check and confirm these jars are not the case and we lost the source code to build?
>>>>>
>>>>> On Tue, Mar 25, 2025 at 9:35 AM Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>>>>>
>>>>>> First of all I don’t think that conclusion on the https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k is correct. Jar files included into the source release are compiled from the code and replacing them with dat or jpeg files won’t work. Including jar files into the source release is against ASF policy and my -1 will stay as long as jars are included into the source release. As this issue was raised not for the first time and there was no action (actually more jars were added), IMO, the issue should now be handled as the release blocker.
>>>>>>
>>>>>> I don’t see anything in the proposal that suggests that fix for SPARK-51318 is or should be blocked by umbrella JIRA. The proposal was to recover tests one by one. The PR that I have open will allow to accomplish these tasks as all disabled tests refer to SPARK-51318.
>>>>>>
>>>>>> I can only help with SPARK-51318 at this point.
>>>>>> Somebody else will have to look into keeping tests enabled as it requires source code for the test jars.
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Vlad
>>>>>>
>>>>>> On Mar 24, 2025, at 4:55 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>>
>>>>>> I still disagree with just disabling tests and removing the jars without making sure that we will enable them back. I want to EITHER make sure we have a plan and someone to drive, and the tests will be enabled back, OR have a one fix that does all. Otherwise, my -1 stands if we can't be sure of that.
>>>>>>
>>>>>> On Tue, 25 Mar 2025 at 08:51, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>>
>>>>>>> From what I read in the last discussion in the legal thread (https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k), we don't really need to rush and block the release. I don't think we should block the release, remove the CI, and just remove the jars.
>>>>>>>
>>>>>>> Rozov, the original proposal of this thread is 1. to first disable the tests, and 2. open an umbrella JIRA to enable individual tests. Since you're driving this, would you mind either making a proper fix in one go, or create an umbrella JIRA to drive this?
>>>>>>>
>>>>>>> On Mon, 24 Mar 2025 at 23:46, Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>>>>>>>
>>>>>>>> Let’s open a formal vote on the subject. I have open WIP PR https://github.com/apache/spark/pull/50231 that is currently blocked by -1.
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>>
>>>>>>>> Vlad
>>>>>>>>
>>>>>>>> On Mar 24, 2025, at 7:05 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> It seems there’s no quick fix for this issue. Should we remove these jars and disable the tests for now to comply with ASF policy?
>>>>>>>> While this would temporarily reduce test coverage until we refactor the tests to avoid pre-compiled jars, we can encourage Spark vendors not to cherry-pick this test-disabling commit so they can help report any test failures. That said, since these tests are quite old and stable, failures are unlikely.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Wenchen
>>>>>>>>
>>>>>>>> On Thu, Mar 13, 2025 at 12:15 AM Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> There is a difference between technical debt and legal issue. ASF may request to pull out release that does not meet ASF policy (and having tests is not ASF policy). IMO, SPARK-51318 should be a blocker for the next release or handled like a blocker.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>>
>>>>>>>>> Vlad
>>>>>>>>>
>>>>>>>>> On Mar 10, 2025, at 6:02 PM, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> +1 to Hyukjin. If the test is effective, we should definitely retain the effectiveness of the test, unless we end up with the conclusion that there is no way to do that.
>>>>>>>>>
>>>>>>>>> On Tue, Mar 11, 2025 at 9:29 AM Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> If we should fix, let's make sure we don't just disable the tests - we will create another set of technical debt.
>>>>>>>>>>
>>>>>>>>>> On Thu, 27 Feb 2025 at 09:11, Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> I’ll look into the JIRA. Please assign it to me.
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>>
>>>>>>>>>>> Vlad
>>>>>>>>>>>
>>>>>>>>>>> > On Feb 26, 2025, at 11:33 PM, Yang Jie <yangji...@apache.org> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > +1, Agree to remove the jar files from the Apache Spark repository and disable the affected tests.
>>>>>>>>>>> >
>>>>>>>>>>> > For the current test scenarios that use jar files, I believe we can definitely find a more reasonable testing approach.
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks,
>>>>>>>>>>> > Jie Yang
>>>>>>>>>>> >
>>>>>>>>>>> > On 2025/02/26 16:57:45 "Rozov, Vlad" wrote:
>>>>>>>>>>> >> +1 on fixing test jars, though the way how it is fixed needs to be discussed, IMO. In the short term removing jars may still be the best option to satisfy ASF legal policy and avoid release removal.
>>>>>>>>>>> >>
>>>>>>>>>>> >> AFAIK, ASF mandates that users and developers have source code that they build from (source release), not that they run (binary release).
>>>>>>>>>>> >>
>>>>>>>>>>> >> Thank you,
>>>>>>>>>>> >>
>>>>>>>>>>> >> Vlad
>>>>>>>>>>> >>
>>>>>>>>>>> >>> On Feb 26, 2025, at 8:47 AM, Dongjoon Hyun <dongj...@apache.org> wrote:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Thank you for your reply, Sean.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> I expected that argument exactly so that I started by quoting your sentence in the above.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> I understood the reasoning in 2018. However, there are two reasons why I brought this again in 2025:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> First, the open source spirit is technically and literally "no compiled code in a source release" like Apache Hadoop and Hive community does. Justin, Vlad, and Alex shared the same perspective to the Apache Spark PMC.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> $ tar tvf apache-hive-4.0.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>>>>>> >>> 0
>>>>>>>>>>> >>> $ tar tvfz hadoop-3.4.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>>>>>> >>> 0
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Second, last year, the open source communities were hit by CVE-2024-3094 ("XZ Utils Backdoor") in the world-wide manner where the backdoor was hidden in the test object. I believe most of us are aware of that. At that time, the GitHub repository was disabled. As a member of Apache Spark PMC, I'm suggesting to remove that risk from the Apache Spark repository in 2025. I attached the following link to provide the XZ Utils history explicitly.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> https://www.akamai.com/blog/security-research/critical-linux-backdoor-xz-utils-discovered-what-to-know
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Although I agree that those test coverages are important, I don't think that's worthy for Apache Spark community to take a risk to be shutdown. That's the lesson which I've learned last year.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Sincerely,
>>>>>>>>>>> >>> Dongjoon.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> On 2025/02/26 13:31:56 Sean Owen wrote:
>>>>>>>>>>> >>>> The gist of the initial 2018 thread was:
>>>>>>>>>>> >>>> These are not source .jar files that users use, but .jar files used to test loading of from .jar files. These are test resources only.
>>>>>>>>>>> >>>> I don't think this is what the spirit of the rule is speaking to, that the end-user code should always have source code, which is the right principle.
>>>>>>>>>>> >>>> Checking in the code somewhere is nice to have though and I think that was the idea here.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> But, removing these and disabling potentially valuable tests seems like a step too far. There is no actual 'problem' w.r.t. the principle that users have source to the code they run.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> The 2025 thread just retreads the same ground as the 2018 thread.
>>>>>>>>>>> >>>> But I don't see that we put this argument to the person who raised it again. Why not that first?
>>>>>>>>>>> >>>> And, if possible, go stick the source to these jars in the source tree, where available.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> On Wed, Feb 26, 2025 at 1:08 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>> Hi, All.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Unfortunately, the Apache Spark project seems to have a technical debt in the source code releases. It happens to be discussed at least twice on both dev@spark and legal-discuss mailing lists. (Thank you for the heads-up, Vlad.)
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> 1. https://lists.apache.org/thread/3sxw9gwp51mrkzlo2xchq1g20gbgbnz8 (2018-06-21, dev@spark)
>>>>>>>>>>> >>>>> 2. https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k (2018-06-25, legal-discuss@)
>>>>>>>>>>> >>>>> 3. https://lists.apache.org/thread/z3oq1db80vc8c7r6892hwjnq4h7hnwmd (2025-02-25, dev@spark)
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> To be short, according to the previous conclusion in 2018, the Apache Spark community wanted to adhere to the ASF policy by removing those jar files from source code releases (although it was not considered as a release blocker at that time and until now).
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>>> it's important to be able to recreate these JARs somehow, and I don't think we have the source in the repo for all of them (at least, the ones that originate from Spark). That much seems like a must-do. After that, seems worth figuring out just how hard it is to build these artifacts from source. If it's easy, great. If not, either the test can be removed or we figure out just how hard a requirement this is.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Given the unresolved issue for seven years, I proposed SPARK-51318 as a potential solution to comply with ASF policy. After SPARK-51318, we can recover the test coverage one by one later by addressing IDed TODO items without any legal concerns during the votes.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> https://issues.apache.org/jira/browse/SPARK-51318 (Remove `jar` files from Apache Spark repository and disable affected tests)
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> WDYT?
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> BTW, please note that I didn't define SPARK-51318 as a blocker for any on-going releases yet.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Best regards,
>>>>>>>>>>> >>>>> Dongjoon.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> ---------------------------------------------------------------------
>>>>>>>>>>> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org