I still disagree with just disabling the tests and removing the jars without making sure that we will re-enable them. I want EITHER a plan with someone to drive it, so that the tests will be re-enabled, OR a single fix that does it all. Otherwise, if we can't be sure of that, my -1 stands.
On Tue, 25 Mar 2025 at 08:51, Hyukjin Kwon <gurwls...@apache.org> wrote:
> From what I read in the last discussion in the legal thread (https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k), we don't really need to rush and block the release.
> I don't think we should block the release, remove the CI, and just remove the jars.
>
> Rozov, the original proposal of this thread is 1. to first disable the tests, and 2. to open an umbrella JIRA to re-enable individual tests.
> Since you're driving this, would you mind either making a proper fix in one go, or creating an umbrella JIRA to drive this?
>
> On Mon, 24 Mar 2025 at 23:46, Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>
>> Let’s open a formal vote on the subject. I have an open WIP PR, https://github.com/apache/spark/pull/50231, that is currently blocked by a -1.
>>
>> Thank you,
>>
>> Vlad
>>
>> On Mar 24, 2025, at 7:05 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>
>> It seems there’s no quick fix for this issue. Should we remove these jars and disable the tests for now to comply with ASF policy? This would temporarily reduce test coverage until we refactor the tests to avoid pre-compiled jars. To offset that, we can encourage Spark vendors not to cherry-pick this test-disabling commit, so they can help report any test failures. That said, since these tests are quite old and stable, failures are unlikely.
>>
>> Thanks,
>> Wenchen
>>
>> On Thu, Mar 13, 2025 at 12:15 AM Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>>
>>> There is a difference between technical debt and a legal issue. The ASF may request that a release that does not meet ASF policy be pulled (and having tests is not ASF policy). IMO, SPARK-51318 should be a blocker for the next release, or handled like one.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>> On Mar 10, 2025, at 6:02 PM, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>
>>> +1 to Hyukjin.
>>> If the test is effective, we should definitely retain the effectiveness of the test, unless we conclude that there is no way to do that.
>>>
>>> On Tue, Mar 11, 2025 at 9:29 AM Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>
>>>> If we do fix this, let's make sure we don't just disable the tests, or we will create another set of technical debt.
>>>>
>>>> On Thu, 27 Feb 2025 at 09:11, Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>>>>
>>>>> I’ll look into the JIRA. Please assign it to me.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Vlad
>>>>>
>>>>> > On Feb 26, 2025, at 11:33 PM, Yang Jie <yangji...@apache.org> wrote:
>>>>> >
>>>>> > +1. I agree with removing the jar files from the Apache Spark repository and disabling the affected tests.
>>>>> >
>>>>> > For the current test scenarios that use jar files, I believe we can definitely find a more reasonable testing approach.
>>>>> >
>>>>> > Thanks,
>>>>> > Jie Yang
>>>>> >
>>>>> > On 2025/02/26 16:57:45 "Rozov, Vlad" wrote:
>>>>> >> +1 on fixing the test jars, though how to fix them needs to be discussed, IMO. In the short term, removing the jars may still be the best option to satisfy ASF legal policy and avoid having a release removed.
>>>>> >>
>>>>> >> AFAIK, the ASF mandates that users and developers have the source code they build from (source release), not the code they run (binary release).
>>>>> >>
>>>>> >> Thank you,
>>>>> >>
>>>>> >> Vlad
>>>>> >>
>>>>> >>> On Feb 26, 2025, at 8:47 AM, Dongjoon Hyun <dongj...@apache.org> wrote:
>>>>> >>>
>>>>> >>> Thank you for your reply, Sean.
>>>>> >>>
>>>>> >>> I expected exactly that argument, which is why I started by quoting your sentence above.
>>>>> >>>
>>>>> >>> I understood the reasoning in 2018.
>>>>> >>> However, there are two reasons why I brought this up again in 2025:
>>>>> >>>
>>>>> >>> First, the open source spirit is, technically and literally, "no compiled code in a source release", as the Apache Hadoop and Hive communities practice. Justin, Vlad, and Alex shared the same perspective with the Apache Spark PMC.
>>>>> >>>
>>>>> >>> $ tar tvf apache-hive-4.0.1-src.tar.gz | grep 'jar$' | wc -l
>>>>> >>> 0
>>>>> >>> $ tar tvfz hadoop-3.4.1-src.tar.gz | grep 'jar$' | wc -l
>>>>> >>> 0
>>>>> >>>
>>>>> >>> Second, last year open source communities worldwide were hit by CVE-2024-3094 ("XZ Utils Backdoor"), in which the backdoor was hidden in test objects. I believe most of us are aware of that. At that time, the xz GitHub repository was disabled. As a member of the Apache Spark PMC, I'm suggesting that we remove that risk from the Apache Spark repository in 2025. I attached the following link to document the XZ Utils history explicitly.
>>>>> >>>
>>>>> >>> https://www.akamai.com/blog/security-research/critical-linux-backdoor-xz-utils-discovered-what-to-know
>>>>> >>>
>>>>> >>> Although I agree that the coverage from those tests is important, I don't think it is worth the Apache Spark community taking the risk of being shut down. That's the lesson I learned last year.
>>>>> >>>
>>>>> >>> Sincerely,
>>>>> >>> Dongjoon.
>>>>> >>>
>>>>> >>> On 2025/02/26 13:31:56 Sean Owen wrote:
>>>>> >>>> The gist of the initial 2018 thread was:
>>>>> >>>> These are not source .jar files that users use, but .jar files used to test loading from .jar files. These are test resources only.
>>>>> >>>> I don't think this is what the spirit of the rule is speaking to, which is that end-user code should always have source code. That is the right principle.
>>>>> >>>> Checking in the code somewhere is nice to have, though, and I think that was the idea here.
>>>>> >>>>
>>>>> >>>> But removing these and disabling potentially valuable tests seems like a step too far. There is no actual 'problem' w.r.t. the principle that users have source for the code they run.
>>>>> >>>>
>>>>> >>>> The 2025 thread just retreads the same ground as the 2018 thread.
>>>>> >>>> But I don't see that we put this argument to the person who raised it again. Why not do that first?
>>>>> >>>> And, where possible, put the source for these jars in the source tree.
>>>>> >>>>
>>>>> >>>> On Wed, Feb 26, 2025 at 1:08 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>> >>>>
>>>>> >>>>> Hi, All.
>>>>> >>>>>
>>>>> >>>>> Unfortunately, the Apache Spark project seems to have technical debt in its source code releases. It has been discussed at least twice, on both the dev@spark and legal-discuss mailing lists. (Thank you for the heads-up, Vlad.)
>>>>> >>>>>
>>>>> >>>>> 1. https://lists.apache.org/thread/3sxw9gwp51mrkzlo2xchq1g20gbgbnz8 (2018-06-21, dev@spark)
>>>>> >>>>> 2. https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k (2018-06-25, legal-discuss@)
>>>>> >>>>> 3. https://lists.apache.org/thread/z3oq1db80vc8c7r6892hwjnq4h7hnwmd (2025-02-25, dev@spark)
>>>>> >>>>>
>>>>> >>>>> In short, according to the previous conclusion in 2018, the Apache Spark community wanted to adhere to the ASF policy by removing those jar files from source code releases (although this has not been considered a release blocker, then or since).
>>>>> >>>>>
>>>>> >>>>>> it's important to be able to recreate these JARs somehow,
>>>>> >>>>>> and I don't think we have the source in the repo for all of them
>>>>> >>>>>> (at least, the ones that originate from Spark).
>>>>> >>>>>> That much seems like a must-do. After that, seems worth figuring out
>>>>> >>>>>> just how hard it is to build these artifacts from source.
>>>>> >>>>>> If it's easy, great. If not, either the test can be removed or
>>>>> >>>>>> we figure out just how hard a requirement this is.
>>>>> >>>>>
>>>>> >>>>> Given that the issue has been unresolved for seven years, I proposed SPARK-51318 as a potential solution to comply with ASF policy. After SPARK-51318, we can recover the test coverage one by one by addressing the identified TODO items, without any legal concerns during the votes.
>>>>> >>>>>
>>>>> >>>>> https://issues.apache.org/jira/browse/SPARK-51318
>>>>> >>>>> (Remove `jar` files from Apache Spark repository and disable affected tests)
>>>>> >>>>>
>>>>> >>>>> WDYT?
>>>>> >>>>>
>>>>> >>>>> BTW, please note that I haven't marked SPARK-51318 as a blocker for any ongoing releases yet.
>>>>> >>>>>
>>>>> >>>>> Best regards,
>>>>> >>>>> Dongjoon.
>>>>> >>>
>>>>> >>> ---------------------------------------------------------------------
>>>>> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org