What's the difference between disabling tests for dev and release vs only
for release?
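
To make the comparison concrete, here is a minimal Python sketch of the two behaviors under discussion. It is an illustration only, not code from either PR: the function names, the `is_source_release` flag, and the "run"/"skip"/"fail" outcomes are all hypothetical. A presence-only check skips whenever the jar is missing, so a classpath bug looks the same as a deliberately removed jar. A release-aware check skips only when the build is known to be a source release, and fails otherwise.

```python
import tempfile
from pathlib import Path


def decide_by_presence(jar: Path) -> str:
    """Hypothetical presence-only rule: run if the jar exists, else skip."""
    return "run" if jar.is_file() else "skip"


def decide_release_aware(jar: Path, is_source_release: bool) -> str:
    """Hypothetical release-aware rule.

    A missing jar is tolerated (skip) only in a source release, where the
    jars were removed on purpose. Anywhere else a missing jar is an error.
    """
    if jar.is_file():
        return "run"
    return "skip" if is_source_release else "fail"


def demo() -> dict:
    """Compare both rules for a jar that is missing on a dev commit."""
    with tempfile.TemporaryDirectory() as tmp:
        missing = Path(tmp) / "missing-test.jar"  # never created
        return {
            "presence_only": decide_by_presence(missing),
            "release_aware_dev": decide_release_aware(missing, False),
            "release_aware_release": decide_release_aware(missing, True),
        }


if __name__ == "__main__":
    for rule, outcome in demo().items():
        print(f"{rule}: {outcome}")
```

Under these assumptions, the only case where the two rules differ is a dev commit with a missing jar: the presence-only rule silently skips, while the release-aware rule reports a failure.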

On Tue, 25 Mar 2025 at 15:36, Rozov, Vlad <vro...@amazon.com.invalid> wrote:

> Overall I don’t buy the solution where tests are skipped based on the
> presence of a jar file. It looks too fragile to me. What if there is a bug
> that fails to add the jar to the classpath? The test would be skipped, not
> because the jar was deleted, but because the classpath is incorrect.
>
> Thank you,
>
> Vlad
>
> On Mar 24, 2025, at 7:56 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>
> Valid concern. Maybe we can mark the tests as ignored when those jars do
> not exist, for now. Tagged commits will then skip those tests, while dev
> commits will still run them.
>
> On Tue, 25 Mar 2025 at 11:47, Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> Maybe we should also check whether it is mandatory for the source code
>> distributed in a release to be able to pass the test suites? If it is
>> mandatory, we can't just modify the release script to simply remove the
>> jars, because that will break the tests in the source code distribution.
>>
>> My understanding is that tests should pass from the source code, and that
>> the source code should be able to build the same artifacts we release, but
>> I might be wrong.
>>
>> On Tue, Mar 25, 2025 at 11:32 AM Hyukjin Kwon <gurwls...@apache.org>
>> wrote:
>>
>>> Made a PR first (https://github.com/apache/spark/pull/50378).
>>>
>>> BTW, I agree that we should have the source code along with the jars,
>>> and ideally the dev branch should not contain them either. This is
>>> technical debt.
>>> For this, I hope we can improve this incrementally.
>>>
>>> I will also take a look and see if we can reject jars automatically in
>>> PRs or CI.
>>>
>>>
>>> On Tue, 25 Mar 2025 at 11:15, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>
>>>> So the issue is source releases (https://github.com/apache/spark/tags)
>>>> containing those jars, right? Can we add the removal of test jars as
>>>> part of the release process?
>>>>
>>>> They aren't included in binary releases in any event so removal on
>>>> every source release should work.
>>>>
>>>> On Tue, 25 Mar 2025 at 10:51, Jungtaek Lim <
>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>
>>>>> Let's make this very clear - do we not have a source code to build a
>>>>> jar, or have no way to infer the source code being used for the jar?
>>>>>
>>>>> I understand the concern, but if this is a huge issue, why has no one
>>>>> looked into this, and why are we just debating here whether the affected
>>>>> tests need to be dropped/disabled or not? Whenever we add some test
>>>>> resources like a golden file, we tend to leave the part of the code that
>>>>> builds the golden file. Did we check and confirm that this is not the
>>>>> case for these jars, and that we lost the source code to build them?
>>>>>
>>>>> On Tue, Mar 25, 2025 at 9:35 AM Rozov, Vlad <vro...@amazon.com.invalid>
>>>>> wrote:
>>>>>
>>>>>> First of all, I don’t think that the conclusion on
>>>>>> https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k is
>>>>>> correct. Jar files included in the source release are compiled from
>>>>>> the code, and replacing them with dat or jpeg files won’t work.
>>>>>> Including jar files in the source release is against ASF policy, and my
>>>>>> -1 will stay as long as jars are included in the source release. As
>>>>>> this is not the first time the issue has been raised and there was no
>>>>>> action (actually, more jars were added), IMO, the issue should now be
>>>>>> handled as a release blocker.
>>>>>>
>>>>>> I don’t see anything in the proposal that suggests that the fix for
>>>>>> SPARK-51318 is or should be blocked by an umbrella JIRA. The proposal
>>>>>> was to recover tests one by one. The PR that I have open will allow
>>>>>> these tasks to be accomplished, as all disabled tests refer to
>>>>>> SPARK-51318.
>>>>>>
>>>>>> I can only help with SPARK-51318 at this point. Somebody else will
>>>>>> have to look into keeping tests enabled, as it requires source code
>>>>>> for the test jars.
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Vlad
>>>>>>
>>>>>>
>>>>>> On Mar 24, 2025, at 4:55 PM, Hyukjin Kwon <gurwls...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>> I still disagree with just disabling tests and removing the jars
>>>>>> without making sure that we will re-enable them.
>>>>>> I want to EITHER make sure we have a plan and someone to drive it, so
>>>>>> that the tests will be re-enabled, OR have one fix that does it all.
>>>>>> Otherwise, my -1 stands if we can't be sure of that.
>>>>>>
>>>>>> On Tue, 25 Mar 2025 at 08:51, Hyukjin Kwon <gurwls...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> From what I read in the last discussion in the legal thread (
>>>>>>> https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k),
>>>>>>> we don't really need to rush and block the release.
>>>>>>> I don't think we should block the release, remove the CI, and just
>>>>>>> remove the jars.
>>>>>>>
>>>>>>> Rozov, the original proposal of this thread is 1. to first disable
>>>>>>> the tests, and 2. to open an umbrella JIRA to enable individual tests.
>>>>>>> Since you're driving this, would you mind either making a proper fix
>>>>>>> in one go, or creating an umbrella JIRA to drive this?
>>>>>>>
>>>>>>>
>>>>>>> On Mon, 24 Mar 2025 at 23:46, Rozov, Vlad <vro...@amazon.com.invalid>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Let’s open a formal vote on the subject. I have an open WIP PR,
>>>>>>>> https://github.com/apache/spark/pull/50231, that is currently
>>>>>>>> blocked by a -1.
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>>
>>>>>>>> Vlad
>>>>>>>>
>>>>>>>> On Mar 24, 2025, at 7:05 AM, Wenchen Fan <cloud0...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> It seems there’s no quick fix for this issue. Should we remove
>>>>>>>> these jars and disable the tests for now to comply with ASF policy?
>>>>>>>> While this would temporarily reduce test coverage until we refactor
>>>>>>>> the tests to avoid pre-compiled jars, we can encourage Spark vendors
>>>>>>>> not to cherry-pick this test-disabling commit so they can help report
>>>>>>>> any test failures. That said, since these tests are quite old and
>>>>>>>> stable, failures are unlikely.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Wenchen
>>>>>>>>
>>>>>>>> On Thu, Mar 13, 2025 at 12:15 AM Rozov, Vlad
>>>>>>>> <vro...@amazon.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> There is a difference between technical debt and a legal issue. ASF
>>>>>>>>> may request that a release that does not meet ASF policy be pulled
>>>>>>>>> (and having tests is not ASF policy). IMO, SPARK-51318 should be a
>>>>>>>>> blocker for the next release or handled like a blocker.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>>
>>>>>>>>> Vlad
>>>>>>>>>
>>>>>>>>> On Mar 10, 2025, at 6:02 PM, Jungtaek Lim <
>>>>>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> +1 to Hyukjin. If the test is effective, we should definitely
>>>>>>>>> retain the effectiveness of the test, unless we end up with the
>>>>>>>>> conclusion that there is no way to do that.
>>>>>>>>>
>>>>>>>>> On Tue, Mar 11, 2025 at 9:29 AM Hyukjin Kwon <gurwls...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> If we should fix this, let's make sure we don't just disable the
>>>>>>>>>> tests - that would create another set of technical debt.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, 27 Feb 2025 at 09:11, Rozov, Vlad
>>>>>>>>>> <vro...@amazon.com.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> I’ll look into the JIRA. Please assign it to me.
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>>
>>>>>>>>>>> Vlad
>>>>>>>>>>>
>>>>>>>>>>> > On Feb 26, 2025, at 11:33 PM, Yang Jie <yangji...@apache.org> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > +1, Agree to remove the jar files from the Apache Spark repository
>>>>>>>>>>> > and disable the affected tests.
>>>>>>>>>>> >
>>>>>>>>>>> > For the current test scenarios that use jar files, I believe we can
>>>>>>>>>>> > definitely find a more reasonable testing approach.
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks,
>>>>>>>>>>> > Jie Yang
>>>>>>>>>>> >
>>>>>>>>>>> > On 2025/02/26 16:57:45 "Rozov, Vlad" wrote:
>>>>>>>>>>> >> +1 on fixing test jars, though how they are fixed needs to be
>>>>>>>>>>> >> discussed, IMO. In the short term removing jars may still be the best
>>>>>>>>>>> >> option to satisfy ASF legal policy and avoid release removal.
>>>>>>>>>>> >>
>>>>>>>>>>> >> AFAIK, ASF mandates that users and developers have source code that
>>>>>>>>>>> >> they build from (source release), not that they run (binary release).
>>>>>>>>>>> >>
>>>>>>>>>>> >> Thank you,
>>>>>>>>>>> >>
>>>>>>>>>>> >> Vlad
>>>>>>>>>>> >>
>>>>>>>>>>> >>> On Feb 26, 2025, at 8:47 AM, Dongjoon Hyun <dongj...@apache.org> wrote:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Thank you for your reply, Sean.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> I expected exactly that argument, which is why I started by quoting
>>>>>>>>>>> >>> your sentence above.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> I understood the reasoning in 2018. However, there are two reasons
>>>>>>>>>>> >>> why I brought this up again in 2025:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> First, the open source spirit is technically and literally "no
>>>>>>>>>>> >>> compiled code in a source release", as practiced by the Apache Hadoop
>>>>>>>>>>> >>> and Hive communities. Justin, Vlad, and Alex shared the same
>>>>>>>>>>> >>> perspective with the Apache Spark PMC.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> $ tar tvf apache-hive-4.0.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>>>>>> >>>      0
>>>>>>>>>>> >>> $ tar tvfz hadoop-3.4.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>>>>>> >>>      0
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Second, last year, open source communities worldwide were hit by
>>>>>>>>>>> >>> CVE-2024-3094 ("XZ Utils Backdoor"), where the backdoor was hidden in
>>>>>>>>>>> >>> the test object. I believe most of us are aware of that. At that
>>>>>>>>>>> >>> time, the GitHub repository was disabled. As a member of the Apache
>>>>>>>>>>> >>> Spark PMC, I'm suggesting that we remove that risk from the Apache
>>>>>>>>>>> >>> Spark repository in 2025. I attached the following link to provide
>>>>>>>>>>> >>> the XZ Utils history explicitly.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> https://www.akamai.com/blog/security-research/critical-linux-backdoor-xz-utils-discovered-what-to-know
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Although I agree that the test coverage is important, I don't think
>>>>>>>>>>> >>> it is worth the Apache Spark community taking the risk of being shut
>>>>>>>>>>> >>> down. That's the lesson I learned last year.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Sincerely,
>>>>>>>>>>> >>> Dongjoon.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> On 2025/02/26 13:31:56 Sean Owen wrote:
>>>>>>>>>>> >>>> The gist of the initial 2018 thread was:
>>>>>>>>>>> >>>> These are not source .jar files that users use, but .jar files used
>>>>>>>>>>> >>>> to test loading from .jar files. These are test resources only.
>>>>>>>>>>> >>>> I don't think this is what the spirit of the rule is speaking to,
>>>>>>>>>>> >>>> that the end-user code should always have source code, which is the
>>>>>>>>>>> >>>> right principle.
>>>>>>>>>>> >>>> Checking in the code somewhere is nice to have though and I think
>>>>>>>>>>> >>>> that was the idea here.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> But, removing these and disabling potentially valuable tests seems
>>>>>>>>>>> >>>> like a step too far. There is no actual 'problem' w.r.t. the
>>>>>>>>>>> >>>> principle that users have source to the code they run.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> The 2025 thread just retreads the same ground as the 2018 thread.
>>>>>>>>>>> >>>> But I don't see that we put this argument to the person who raised
>>>>>>>>>>> >>>> it again. Why not that first?
>>>>>>>>>>> >>>> And, if possible, go stick the source to these jars in the source
>>>>>>>>>>> >>>> tree, where available.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> On Wed, Feb 26, 2025 at 1:08 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>>>>>>>>> >>>> wrote:
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>> Hi, All.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Unfortunately, the Apache Spark project seems to have technical debt
>>>>>>>>>>> >>>>> in its source code releases. It has been discussed at least twice, on
>>>>>>>>>>> >>>>> both the dev@spark and legal-discuss mailing lists. (Thank you for
>>>>>>>>>>> >>>>> the heads-up, Vlad.)
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> 1. https://lists.apache.org/thread/3sxw9gwp51mrkzlo2xchq1g20gbgbnz8
>>>>>>>>>>> >>>>>    (2018-06-21, dev@spark)
>>>>>>>>>>> >>>>> 2. https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k
>>>>>>>>>>> >>>>>    (2018-06-25, legal-discuss@)
>>>>>>>>>>> >>>>> 3. https://lists.apache.org/thread/z3oq1db80vc8c7r6892hwjnq4h7hnwmd
>>>>>>>>>>> >>>>>    (2025-02-25, dev@spark)
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> In short, according to the previous conclusion in 2018, the Apache
>>>>>>>>>>> >>>>> Spark community wanted to adhere to the ASF policy by removing those
>>>>>>>>>>> >>>>> jar files from source code releases (although it was not considered
>>>>>>>>>>> >>>>> a release blocker at that time, and has not been until now).
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>>> it's important to be able to recreate these JARs somehow,
>>>>>>>>>>> >>>>>> and I don't think we have the source in the repo for all of them
>>>>>>>>>>> >>>>>> (at least, the ones that originate from Spark).
>>>>>>>>>>> >>>>>> That much seems like a must-do. After that, seems worth figuring out
>>>>>>>>>>> >>>>>> just how hard it is to build these artifacts from source.
>>>>>>>>>>> >>>>>> If it's easy, great. If not, either the test can be removed or
>>>>>>>>>>> >>>>>> we figure out just how hard a requirement this is.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Given that the issue has been unresolved for seven years, I proposed
>>>>>>>>>>> >>>>> SPARK-51318 as a potential solution to comply with ASF policy. After
>>>>>>>>>>> >>>>> SPARK-51318, we can recover the test coverage one by one later by
>>>>>>>>>>> >>>>> addressing IDed TODO items without any legal concerns during the
>>>>>>>>>>> >>>>> votes.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> https://issues.apache.org/jira/browse/SPARK-51318
>>>>>>>>>>> >>>>> (Remove `jar` files from Apache Spark repository and disable affected
>>>>>>>>>>> >>>>> tests)
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> WDYT?
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> BTW, please note that I didn't define SPARK-51318 as a blocker for
>>>>>>>>>>> >>>>> any on-going releases yet.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Best regards,
>>>>>>>>>>> >>>>> Dongjoon.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> ---------------------------------------------------------------------
>>>>>>>>>>> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>>>>>> >>>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>
