I’ll look into the JIRA. Please assign it to me.

Thank you,

Vlad

> On Feb 26, 2025, at 11:33 PM, Yang Jie <yangji...@apache.org> wrote:
> 
> +1, Agree to remove the jar files from the Apache Spark repository and 
> disable the affected tests.
> 
> For the current test scenarios that use jar files, I believe we can 
> definitely find a more reasonable testing approach.
> 
> Thanks,
> Jie Yang
> 
> On 2025/02/26 16:57:45 "Rozov, Vlad" wrote:
>> +1 on fixing test jars, though how it is fixed needs to be 
>> discussed, IMO. In the short term, removing the jars may still be the best 
>> option to satisfy ASF legal policy and avoid release removal.
>> 
>> AFAIK, ASF mandates that users and developers have source code that they 
>> build from (source release), not that they run (binary release).
>> 
>> Thank you,
>> 
>> Vlad
>> 
>>> On Feb 26, 2025, at 8:47 AM, Dongjoon Hyun <dongj...@apache.org> wrote:
>>> 
>>> Thank you for your reply, Sean.
>>> 
>>> I expected exactly that argument, which is why I started by quoting your 
>>> sentence above.
>>> 
>>> I understood the reasoning in 2018. However, there are two reasons why I 
>>> brought this up again in 2025:
>>> 
>>> First, the open source spirit is technically and literally "no compiled code 
>>> in a source release", as the Apache Hadoop and Hive communities practice. 
>>> Justin, Vlad, and Alex shared the same perspective with the Apache Spark PMC.
>>> 
>>> $ tar tvf apache-hive-4.0.1-src.tar.gz | grep 'jar$' | wc -l
>>>      0
>>> $ tar tvfz hadoop-3.4.1-src.tar.gz | grep 'jar$' | wc -l
>>>      0
>>> 
>>> Second, last year, open source communities worldwide were hit by 
>>> CVE-2024-3094 ("XZ Utils Backdoor"), where the backdoor was hidden in a 
>>> test object. I believe most of us are aware of that. At that time, the 
>>> GitHub repository was disabled. As a member of the Apache Spark PMC, I 
>>> suggest removing that risk from the Apache Spark repository in 2025. I have 
>>> attached the following link to provide the XZ Utils history explicitly.
>>> 
>>>   
>>> https://www.akamai.com/blog/security-research/critical-linux-backdoor-xz-utils-discovered-what-to-know
>>> 
>>> Although I agree that this test coverage is important, I don't think it is 
>>> worth the Apache Spark community taking the risk of being shut down. 
>>> That's the lesson I learned last year.
>>> 
>>> Sincerely,
>>> Dongjoon.
>>> 
>>> On 2025/02/26 13:31:56 Sean Owen wrote:
>>>> The gist of the initial 2018 thread was:
>>>> These are not source .jar files that users use, but .jar files used to test
>>>> loading from .jar files. These are test resources only.
>>>> I don't think this is what the spirit of the rule is speaking to, that the
>>>> end-user code should always have source code, which is the right principle.
>>>> Checking in the code somewhere is nice to have though and I think that was
>>>> the idea here.
>>>> 
>>>> But, removing these and disabling potentially valuable tests seems like a
>>>> step too far. There is no actual 'problem' w.r.t. the principle that users
>>>> have source to the code they run.
>>>> 
>>>> The 2025 thread just retreads the same ground as the 2018 thread.
>>>> But I don't see that we put this argument to the person who raised it
>>>> again. Why not that first?
>>>> And, if possible, go stick the source to these jars in the source tree,
>>>> where available.
>>>> 
>>>> 
>>>> On Wed, Feb 26, 2025 at 1:08 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hi, All.
>>>>> 
>>>>> Unfortunately, the Apache Spark project seems to have some technical debt
>>>>> in its source code releases. It has been discussed at least twice on both
>>>>> the dev@spark and legal-discuss mailing lists. (Thank you for the heads-up,
>>>>> Vlad.)
>>>>> 
>>>>> 1. https://lists.apache.org/thread/3sxw9gwp51mrkzlo2xchq1g20gbgbnz8
>>>>> (2018-06-21, dev@spark)
>>>>> 2. https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k
>>>>> (2018-06-25, legal-discuss@)
>>>>> 3. https://lists.apache.org/thread/z3oq1db80vc8c7r6892hwjnq4h7hnwmd
>>>>> (2025-02-25, dev@spark)
>>>>> 
>>>>> In short, according to the previous conclusion in 2018, the Apache
>>>>> Spark community wanted to adhere to the ASF policy by removing those jar
>>>>> files from source code releases (although it was not considered a
>>>>> release blocker at that time, nor has it been since).
>>>>> 
>>>>>> it's important to be able to recreate these JARs somehow,
>>>>>> and I don't think we have the source in the repo for all of them
>>>>>> (at least, the ones that originate from Spark).
>>>>>> That much seems like a must-do. After that, seems worth figuring out
>>>>>> just how hard it is to build these artifacts from source.
>>>>>> If it's easy, great. If not, either the test can be removed or
>>>>>> we figure out just how hard a requirement this is.
>>>>> 
>>>>> Given that the issue has been unresolved for seven years, I proposed
>>>>> SPARK-51318 as a potential solution to comply with ASF policy. After
>>>>> SPARK-51318, we can recover the test coverage one by one later by
>>>>> addressing IDed TODO items without any legal concerns during the votes.
>>>>> 
>>>>> https://issues.apache.org/jira/browse/SPARK-51318
>>>>> (Remove `jar` files from Apache Spark repository and disable affected
>>>>> tests)
>>>>> 
>>>>> WDYT?
>>>>> 
>>>>> BTW, please note that I have not yet defined SPARK-51318 as a blocker for
>>>>> any ongoing releases.
>>>>> 
>>>>> Best regards,
>>>>> Dongjoon.
>>>>> 
>>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> 
>> 
>> 
>> 
>> 
> 
> 
