While I'd love to resolve this issue, I still don't understand why we would
block the release for this.



On Tue, Mar 25, 2025 at 7:49 AM Rozov, Vlad <vro...@amazon.com.invalid>
wrote:

> The difference is in how the tests are disabled.
>
> - the approach encourages keeping jar files in the Apache Spark repo
> - it is hard to identify which tests are impacted by the jars so they can be
> properly fixed
> - the solution relies on a jar being present or not present on the
> classpath. Tests may be skipped unintentionally. It is also very easy to
> introduce new tests that do not skip if the jar does not exist. Such tests will
> break only during release.
>
> IMO, it is necessary to see if the source code for test jars is available
> or can be reconstructed. If not, it is necessary to see how the
> functionality still can be tested even if jar is not available. If the
> source code is available, to keep the tests it is necessary to build jars
> during tests or publish jars to maven and pull them as the test dependency.
>
> Thank you,
>
> Vlad
>
> On Mar 24, 2025, at 11:52 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>
> What's the difference between disabling tests for dev and release vs only
> for release?
>
> On Tue, 25 Mar 2025 at 15:36, Rozov, Vlad <vro...@amazon.com.invalid>
> wrote:
>
>> Overall I don’t buy the solution where tests are skipped based on the
>> presence of a jar file. It looks too fragile to me. What if there is a bug
>> that does not add jar to a classpath? The test would be skipped, but not
>> because jar was deleted, but because classpath is incorrect.
>>
>> Thank you,
>>
>> Vlad
>>
>> On Mar 24, 2025, at 7:56 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>
>> Valid concern. Maybe we can mark tests ignored when those jars do not
>> exist for now. So a tagged commit will skip those tests. Dev commits will
>> still test them.
>>
>> On Tue, 25 Mar 2025 at 11:47, Jungtaek Lim <kabhwan.opensou...@gmail.com>
>> wrote:
>>
>>> Maybe we should also check whether it is mandatory that the source code
>>> distributed in a release be able to pass the test suites? If this is
>>> mandatory, we can't just modify the release script to simply remove the
>>> jars, because this will break the tests in the source code distribution.
>>>
>>> Actually, my understanding is that tests must pass from the source
>>> code and that the same artifacts we release can be built from it, but I
>>> might be wrong.
>>>
>>> On Tue, Mar 25, 2025 at 11:32 AM Hyukjin Kwon <gurwls...@apache.org>
>>> wrote:
>>>
>>>> Made a PR first (https://github.com/apache/spark/pull/50378).
>>>>
>>>> BTW, I agree that we should have the source code along with the jars,
>>>> and ideally the dev branch should not contain them as well. This is
>>>> technical debt.
>>>> For this, I hope we can improve this incrementally.
>>>>
>>>> I will also take a look and see if we can reject jars automatically in
>>>> PRs or CI.
>>>>
>>>>
>>>> On Tue, 25 Mar 2025 at 11:15, Hyukjin Kwon <gurwls...@apache.org>
>>>> wrote:
>>>>
>>>>> So the issues are source releases (
>>>>> https://github.com/apache/spark/tags) containing those jars, right?
>>>>> Can we add the removal of test jars as part of the release process?
>>>>>
>>>>> They aren't included in binary releases in any event so removal on
>>>>> every source release should work.
>>>>>
>>>>> On Tue, 25 Mar 2025 at 10:51, Jungtaek Lim <
>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>
>>>>>> Let's make this very clear - do we not have the source code to build a
>>>>>> jar, or do we have no way to infer the source code used for the jar?
>>>>>>
>>>>>> I understand the concern, but if this is a huge issue, why has no one
>>>>>> looked into this, and why are we just debating whether the affected tests
>>>>>> need to be dropped/disabled or not? Whenever we add some test resources like a
>>>>>> golden file, we tend to leave the part of the code to build the golden
>>>>>> file. Did we check and confirm that this is not the case for these jars and
>>>>>> that we lost the source code to build them?
>>>>>>
>>>>>> On Tue, Mar 25, 2025 at 9:35 AM Rozov, Vlad <vro...@amazon.com.invalid>
>>>>>> wrote:
>>>>>>
>>>>>>> First of all I don’t think that conclusion on the
>>>>>>> https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k is
>>>>>>> correct. Jar files included into the source release are compiled from
>>>>>>> the code and replacing them with dat or jpeg files won’t work. Including jar
>>>>>>> files into the source release is against ASF policy and my -1 will stay
>>>>>>> as long as jars are included into the source release. As this issue was
>>>>>>> raised not for the first time and there was no action (actually more jars were
>>>>>>> added), IMO, the issue should now be handled as the release blocker.
>>>>>>>
>>>>>>> I don’t see anything in the proposal that suggests that the fix
>>>>>>> for SPARK-51318 is or should be blocked by an umbrella JIRA. The proposal
>>>>>>> was to recover tests one by one. The PR that I have open will allow us to
>>>>>>> accomplish these tasks, as all disabled tests refer to SPARK-51318.
>>>>>>>
>>>>>>> I can only help with SPARK-51318 at this point. Somebody else will
>>>>>>> have to look into keeping tests enabled as it requires source code for
>>>>>>> the test jars.
>>>>>>>
>>>>>>> Thank you,
>>>>>>>
>>>>>>> Vlad
>>>>>>>
>>>>>>>
>>>>>>> On Mar 24, 2025, at 4:55 PM, Hyukjin Kwon <gurwls...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>> I still disagree with just disabling tests and removing the jars
>>>>>>> without making sure that we will enable them back.
>>>>>>> I want to EITHER make sure we have a plan and someone to drive it, and
>>>>>>> that the tests will be enabled back, OR have one fix that does it all.
>>>>>>> Otherwise, my -1 stands if we can't be sure of that.
>>>>>>>
>>>>>>> On Tue, 25 Mar 2025 at 08:51, Hyukjin Kwon <gurwls...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> From what I read in the last discussion in the legal thread (
>>>>>>>> https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k),
>>>>>>>> we don't really need to rush and block the release.
>>>>>>>> I don't think we should block the release, remove the CI, and just
>>>>>>>> remove the jars.
>>>>>>>>
>>>>>>>> Rozov, the original proposal of this thread is 1. to first disable
>>>>>>>> the tests, and 2. open an umbrella JIRA to enable individual tests.
>>>>>>>> Since you're driving this, would you mind either making a proper
>>>>>>>> fix in one go, or creating an umbrella JIRA to drive this?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, 24 Mar 2025 at 23:46, Rozov, Vlad <vro...@amazon.com.invalid>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Let’s open a formal vote on the subject. I have an open WIP PR
>>>>>>>>> https://github.com/apache/spark/pull/50231 that is currently
>>>>>>>>> blocked by -1.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>>
>>>>>>>>> Vlad
>>>>>>>>>
>>>>>>>>> On Mar 24, 2025, at 7:05 AM, Wenchen Fan <cloud0...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It seems there’s no quick fix for this issue. Should we remove
>>>>>>>>> these jars and disable the tests for now to comply with ASF policy?
>>>>>>>>> While this would temporarily reduce test coverage until we refactor the
>>>>>>>>> tests to avoid pre-compiled jars, we can encourage Spark vendors not to
>>>>>>>>> cherry-pick this test-disabling commit so they can help report any test
>>>>>>>>> failures. That said, since these tests are quite old and stable, failures
>>>>>>>>> are unlikely.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Wenchen
>>>>>>>>>
>>>>>>>>> On Thu, Mar 13, 2025 at 12:15 AM Rozov, Vlad
>>>>>>>>> <vro...@amazon.com.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> There is a difference between technical debt and a legal issue. ASF
>>>>>>>>>> may request that a release that does not meet ASF policy be pulled (and
>>>>>>>>>> having tests is not ASF policy). IMO, SPARK-51318 should be a blocker for
>>>>>>>>>> the next release or handled like a blocker.
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>>
>>>>>>>>>> Vlad
>>>>>>>>>>
>>>>>>>>>> On Mar 10, 2025, at 6:02 PM, Jungtaek Lim <
>>>>>>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> +1 to Hyukjin. If the test is effective, we should definitely
>>>>>>>>>> retain the effectiveness of the test, unless we end up with the
>>>>>>>>>> conclusion that there is no way to do that.
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 11, 2025 at 9:29 AM Hyukjin Kwon <
>>>>>>>>>> gurwls...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> If we should fix, let's make sure we don't just disable the
>>>>>>>>>>> tests - we will create another set of technical debt.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 27 Feb 2025 at 09:11, Rozov, Vlad
>>>>>>>>>>> <vro...@amazon.com.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I’ll look into the JIRA. Please assign it to me.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>
>>>>>>>>>>>> Vlad
>>>>>>>>>>>>
>>>>>>>>>>>> > On Feb 26, 2025, at 11:33 PM, Yang Jie <yangji...@apache.org> wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > +1, Agree to remove the jar files from the Apache Spark
>>>>>>>>>>>> > repository and disable the affected tests.
>>>>>>>>>>>> >
>>>>>>>>>>>> > For the current test scenarios that use jar files, I believe
>>>>>>>>>>>> > we can definitely find a more reasonable testing approach.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>> > Jie Yang
>>>>>>>>>>>> >
>>>>>>>>>>>> > On 2025/02/26 16:57:45 "Rozov, Vlad" wrote:
>>>>>>>>>>>> >> +1 on fixing test jars, though how it is fixed needs
>>>>>>>>>>>> >> to be discussed, IMO. In the short term removing jars may still be
>>>>>>>>>>>> >> the best option to satisfy ASF legal policy and avoid release removal.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> AFAIK, ASF mandates that users and developers have source
>>>>>>>>>>>> >> code that they build from (source release), not that they run
>>>>>>>>>>>> >> (binary release).
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Thank you,
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Vlad
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>> On Feb 26, 2025, at 8:47 AM, Dongjoon Hyun <dongj...@apache.org> wrote:
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Thank you for your reply, Sean.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> I expected exactly that argument, which is why I started by
>>>>>>>>>>>> >>> quoting your sentence above.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> I understood the reasoning in 2018. However, there are two
>>>>>>>>>>>> >>> reasons why I brought this up again in 2025:
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> First, the open source spirit is technically and literally
>>>>>>>>>>>> >>> "no compiled code in a source release", as the Apache Hadoop and Hive
>>>>>>>>>>>> >>> communities do. Justin, Vlad, and Alex shared the same perspective
>>>>>>>>>>>> >>> with the Apache Spark PMC.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> $ tar tvf apache-hive-4.0.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>>>>>>> >>>      0
>>>>>>>>>>>> >>> $ tar tvfz hadoop-3.4.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>>>>>>> >>>      0
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Second, last year, the open source communities were hit
>>>>>>>>>>>> >>> worldwide by CVE-2024-3094 ("XZ Utils Backdoor"), where the
>>>>>>>>>>>> >>> backdoor was hidden in a test object. I believe most of us are
>>>>>>>>>>>> >>> aware of that. At that time, the GitHub repository was disabled. As a
>>>>>>>>>>>> >>> member of the Apache Spark PMC, I'm suggesting that we remove that risk
>>>>>>>>>>>> >>> from the Apache Spark repository in 2025. I attached the following link
>>>>>>>>>>>> >>> to provide the XZ Utils history explicitly.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> https://www.akamai.com/blog/security-research/critical-linux-backdoor-xz-utils-discovered-what-to-know
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Although I agree that the test coverage is important, I
>>>>>>>>>>>> >>> don't think it is worth it for the Apache Spark community to take the
>>>>>>>>>>>> >>> risk of being shut down. That's the lesson I learned last year.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Sincerely,
>>>>>>>>>>>> >>> Dongjoon.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> On 2025/02/26 13:31:56 Sean Owen wrote:
>>>>>>>>>>>> >>>> The gist of the initial 2018 thread was:
>>>>>>>>>>>> >>>> These are not source .jar files that users use, but .jar
>>>>>>>>>>>> >>>> files used to test loading from .jar files. These are test resources only.
>>>>>>>>>>>> >>>> I don't think this is what the spirit of the rule is
>>>>>>>>>>>> >>>> speaking to, that the end-user code should always have source code,
>>>>>>>>>>>> >>>> which is the right principle.
>>>>>>>>>>>> >>>> Checking in the code somewhere is nice to have though and
>>>>>>>>>>>> >>>> I think that was the idea here.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> But, removing these and disabling potentially valuable
>>>>>>>>>>>> >>>> tests seems like a step too far. There is no actual 'problem' w.r.t. the
>>>>>>>>>>>> >>>> principle that users have source to the code they run.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> The 2025 thread just retreads the same ground as the 2018
>>>>>>>>>>>> >>>> thread.
>>>>>>>>>>>> >>>> But I don't see that we put this argument to the person
>>>>>>>>>>>> >>>> who raised it again. Why not that first?
>>>>>>>>>>>> >>>> And, if possible, go stick the source to these jars in the
>>>>>>>>>>>> >>>> source tree, where available.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> On Wed, Feb 26, 2025 at 1:08 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>>>>>>>>>> >>>> wrote:
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>>> Hi, All.
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> Unfortunately, the Apache Spark project seems to have
>>>>>>>>>>>> >>>>> technical debt in the source code releases. It has been discussed at
>>>>>>>>>>>> >>>>> least twice, on both the dev@spark and legal-discuss mailing lists.
>>>>>>>>>>>> >>>>> (Thank you for the heads-up, Vlad.)
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> 1. https://lists.apache.org/thread/3sxw9gwp51mrkzlo2xchq1g20gbgbnz8
>>>>>>>>>>>> >>>>> (2018-06-21, dev@spark)
>>>>>>>>>>>> >>>>> 2. https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k
>>>>>>>>>>>> >>>>> (2018-06-25, legal-discuss@)
>>>>>>>>>>>> >>>>> 3. https://lists.apache.org/thread/z3oq1db80vc8c7r6892hwjnq4h7hnwmd
>>>>>>>>>>>> >>>>> (2025-02-25, dev@spark)
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> In short, according to the previous conclusion in
>>>>>>>>>>>> >>>>> 2018, the Apache Spark community wanted to adhere to the ASF policy by
>>>>>>>>>>>> >>>>> removing those jar files from source code releases (although it has not
>>>>>>>>>>>> >>>>> been considered a release blocker, then or since).
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>>> it's important to be able to recreate these JARs somehow,
>>>>>>>>>>>> >>>>>> and I don't think we have the source in the repo for all
>>>>>>>>>>>> >>>>>> of them (at least, the ones that originate from Spark).
>>>>>>>>>>>> >>>>>> That much seems like a must-do. After that, seems worth
>>>>>>>>>>>> >>>>>> figuring out just how hard it is to build these artifacts from source.
>>>>>>>>>>>> >>>>>> If it's easy, great. If not, either the test can be
>>>>>>>>>>>> >>>>>> removed or we figure out just how hard a requirement this is.
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> Given that the issue has been unresolved for seven years, I proposed
>>>>>>>>>>>> >>>>> SPARK-51318 as a potential solution to comply with ASF policy. After
>>>>>>>>>>>> >>>>> SPARK-51318, we can recover the test coverage one by one later by
>>>>>>>>>>>> >>>>> addressing IDed TODO items without any legal concerns during the votes.
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> https://issues.apache.org/jira/browse/SPARK-51318
>>>>>>>>>>>> >>>>> (Remove `jar` files from Apache Spark repository and
>>>>>>>>>>>> >>>>> disable affected tests)
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> WDYT?
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> BTW, please note that I didn't define SPARK-51318 as a
>>>>>>>>>>>> >>>>> blocker for any on-going releases yet.
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> Best regards,
>>>>>>>>>>>> >>>>> Dongjoon.
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>>
>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>>>>>>> >>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>
>
