I still disagree with just disabling the tests and removing the jars without
making sure that we will re-enable them.
I want EITHER a plan, with someone to drive it, that ensures the
tests will be re-enabled, OR a single fix that does it all.
Otherwise, my -1 stands.

On Tue, 25 Mar 2025 at 08:51, Hyukjin Kwon <gurwls...@apache.org> wrote:

> From what I read in the last discussion in the legal thread (
> https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k), we
> don't really need to rush and block the release.
> I don't think we should block the release, remove the CI, and just remove
> the jars.
>
> Rozov, the original proposal of this thread is 1. to first disable the
> tests, and 2. open an umbrella JIRA to enable individual tests.
> Since you're driving this, would you mind either making a proper fix in
> one go, or creating an umbrella JIRA to drive this?
>
>
> On Mon, 24 Mar 2025 at 23:46, Rozov, Vlad <vro...@amazon.com.invalid>
> wrote:
>
>> Let’s open a formal vote on the subject. I have an open WIP PR
>> https://github.com/apache/spark/pull/50231 that is currently blocked by
>> a -1.
>>
>> Thank you,
>>
>> Vlad
>>
>> On Mar 24, 2025, at 7:05 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>
>>
>> It seems there’s no quick fix for this issue. Should we remove these jars
>> and disable the tests for now to comply with ASF policy? While this would
>> temporarily reduce test coverage until we refactor the tests to avoid
>> pre-compiled jars, we can encourage Spark vendors not to cherry-pick this
>> test-disabling commit so they can help report any test failures. That said,
>> since these tests are quite old and stable, failures are unlikely.
>>
>> Thanks,
>> Wenchen
>>
>> On Thu, Mar 13, 2025 at 12:15 AM Rozov, Vlad <vro...@amazon.com.invalid>
>> wrote:
>>
>>> There is a difference between technical debt and a legal issue. The ASF may
>>> request that a release that does not meet ASF policy be pulled (and having
>>> tests is not ASF policy). IMO, SPARK-51318 should be a blocker for the next
>>> release or handled like a blocker.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>> On Mar 10, 2025, at 6:02 PM, Jungtaek Lim <kabhwan.opensou...@gmail.com>
>>> wrote:
>>>
>>> +1 to Hyukjin. If the test is effective, we should definitely retain the
>>> effectiveness of the test, unless we end up with the conclusion that there
>>> is no way to do that.
>>>
>>> On Tue, Mar 11, 2025 at 9:29 AM Hyukjin Kwon <gurwls...@apache.org>
>>> wrote:
>>>
>>>> If we do fix this, let's make sure we don't just disable the tests;
>>>> otherwise we will create another set of technical debt.
>>>>
>>>>
>>>> On Thu, 27 Feb 2025 at 09:11, Rozov, Vlad <vro...@amazon.com.invalid>
>>>> wrote:
>>>>
>>>>> I’ll look into the JIRA. Please assign it to me.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Vlad
>>>>>
>>>>> > On Feb 26, 2025, at 11:33 PM, Yang Jie <yangji...@apache.org> wrote:
>>>>> >
>>>>> > +1, Agree to remove the jar files from the Apache Spark repository
>>>>> and disable the affected tests.
>>>>> >
>>>>> > For the current test scenarios that use jar files, I believe we can
>>>>> definitely find a more reasonable testing approach.
>>>>> >
>>>>> > Thanks,
>>>>> > Jie Yang
>>>>> >
>>>>> > On 2025/02/26 16:57:45 "Rozov, Vlad" wrote:
>>>>> >> +1 on fixing test jars, though how it is fixed needs to be
>>>>> discussed, IMO. In the short term, removing the jars may still be the best
>>>>> option to satisfy ASF legal policy and avoid release removal.
>>>>> >>
>>>>> >> AFAIK, ASF mandates that users and developers have source code that
>>>>> they build from (source release), not that they run (binary release).
>>>>> >>
>>>>> >> Thank you,
>>>>> >>
>>>>> >> Vlad
>>>>> >>
>>>>> >>> On Feb 26, 2025, at 8:47 AM, Dongjoon Hyun <dongj...@apache.org>
>>>>> wrote:
>>>>> >>>
>>>>> >>> Thank you for your reply, Sean.
>>>>> >>>
>>>>> >>> I expected exactly that argument, which is why I started by quoting your
>>>>> sentence above.
>>>>> >>>
>>>>> >>> I understood the reasoning in 2018. However, there are two reasons
>>>>> why I brought this up again in 2025:
>>>>> >>>
>>>>> >>> First, the open source spirit is technically and literally "no
>>>>> compiled code in a source release", as the Apache Hadoop and Hive communities
>>>>> practice. Justin, Vlad, and Alex shared the same perspective with the Apache
>>>>> Spark PMC.
>>>>> >>>
>>>>> >>> $ tar tvf apache-hive-4.0.1-src.tar.gz | grep 'jar$' | wc -l
>>>>> >>>      0
>>>>> >>> $ tar tvfz hadoop-3.4.1-src.tar.gz | grep 'jar$' | wc -l
>>>>> >>>      0
>>>>> >>>
>>>>> >>> Second, last year, open source communities worldwide were hit by
>>>>> CVE-2024-3094 ("XZ Utils Backdoor"), where the
>>>>> backdoor was hidden in a test object. I believe most of us are aware of
>>>>> that. At that time, the GitHub repository was disabled. As a member of the
>>>>> Apache Spark PMC, I'm suggesting that we remove that risk from the Apache Spark
>>>>> repository in 2025. I attached the following link to provide the XZ Utils
>>>>> history explicitly.
>>>>> >>>
>>>>> >>>
>>>>> https://www.akamai.com/blog/security-research/critical-linux-backdoor-xz-utils-discovered-what-to-know
>>>>> >>>
>>>>> >>> Although I agree that this test coverage is important, I don't
>>>>> think it is worth the Apache Spark community risking being
>>>>> shut down. That's the lesson I learned last year.
>>>>> >>>
>>>>> >>> Sincerely,
>>>>> >>> Dongjoon.
>>>>> >>>
>>>>> >>> On 2025/02/26 13:31:56 Sean Owen wrote:
>>>>> >>>> The gist of the initial 2018 thread was:
>>>>> >>>> These are not source .jar files that users use, but .jar files
>>>>> used to test
>>>>> >>>> loading from .jar files. These are test resources only.
>>>>> >>>> I don't think this is what the spirit of the rule is speaking to,
>>>>> that the
>>>>> >>>> end-user code should always have source code, which is the right
>>>>> principle.
>>>>> >>>> Checking in the code somewhere is nice to have though and I think
>>>>> that was
>>>>> >>>> the idea here.
>>>>> >>>>
>>>>> >>>> But, removing these and disabling potentially valuable tests
>>>>> seems like a
>>>>> >>>> step too far. There is no actual 'problem' w.r.t. the principle
>>>>> that users
>>>>> >>>> have source to the code they run.
>>>>> >>>>
>>>>> >>>> The 2025 thread just retreads the same ground as the 2018 thread.
>>>>> >>>> But I don't see that we put this argument to the person who
>>>>> raised it
>>>>> >>>> again. Why not that first?
>>>>> >>>> And, if possible, go stick the source to these jars in the source
>>>>> tree,
>>>>> >>>> where available.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On Wed, Feb 26, 2025 at 1:08 AM Dongjoon Hyun <
>>>>> dongjoon.h...@gmail.com>
>>>>> >>>> wrote:
>>>>> >>>>
>>>>> >>>>> Hi, All.
>>>>> >>>>>
>>>>> >>>>> Unfortunately, the Apache Spark project seems to have
>>>>> technical debt in
>>>>> >>>>> the source code releases. It has been discussed at least
>>>>> twice on both
>>>>> >>>>> dev@spark and legal-discuss mailing lists. (Thank you for the
>>>>> heads-up,
>>>>> >>>>> Vlad.)
>>>>> >>>>>
>>>>> >>>>> 1.
>>>>> https://lists.apache.org/thread/3sxw9gwp51mrkzlo2xchq1g20gbgbnz8
>>>>> >>>>> (2018-06-21, dev@spark)
>>>>> >>>>> 2.
>>>>> https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k
>>>>> >>>>> (2018-06-25, legal-discuss@)
>>>>> >>>>> 3.
>>>>> https://lists.apache.org/thread/z3oq1db80vc8c7r6892hwjnq4h7hnwmd
>>>>> >>>>> (2025-02-25, dev@spark)
>>>>> >>>>>
>>>>> >>>>> In short, according to the previous conclusion in 2018, the
>>>>> Apache
>>>>> >>>>> Spark community wanted to adhere to the ASF policy by removing
>>>>> those jar
>>>>> >>>>> files from source code releases (although it was not considered
>>>>> as a
>>>>> >>>>> release blocker at that time and until now).
>>>>> >>>>>
>>>>> >>>>>> it's important to be able to recreate these JARs somehow,
>>>>> >>>>>> and I don't think we have the source in the repo for all of them
>>>>> >>>>>> (at least, the ones that originate from Spark).
>>>>> >>>>>> That much seems like a must-do. After that, seems worth
>>>>> figuring out
>>>>> >>>>>> just how hard it is to build these artifacts from source.
>>>>> >>>>>> If it's easy, great. If not, either the test can be removed or
>>>>> >>>>>> we figure out just how hard a requirement this is.
>>>>> >>>>>
>>>>> >>>>> Given the unresolved issue for seven years, I proposed
>>>>> SPARK-51318 as a
>>>>> >>>>> potential solution to comply with ASF policy. After SPARK-51318,
>>>>> we can
>>>>> >>>>> recover the test coverage one by one later by addressing IDed
>>>>> TODO items
>>>>> >>>>> without any legal concerns during the votes.
>>>>> >>>>>
>>>>> >>>>> https://issues.apache.org/jira/browse/SPARK-51318
>>>>> >>>>> (Remove `jar` files from Apache Spark repository and disable
>>>>> affected
>>>>> >>>>> tests)
>>>>> >>>>>
>>>>> >>>>> WDYT?
>>>>> >>>>>
>>>>> >>>>> BTW, please note that I didn't define SPARK-51318 as a blocker
>>>>> for any
>>>>> >>>>> on-going releases yet.
>>>>> >>>>>
>>>>> >>>>> Best regards,
>>>>> >>>>> Dongjoon.
>>>>> >>>>>
>>>>> >>>>
>>>>> >>>
>>>>> >>>
>>>>> ---------------------------------------------------------------------
>>>>> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>> >>>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>
>>
