+1 on fixing the test jars, though how they are fixed needs to be discussed, 
IMO. In the short term, removing the jars may still be the best option to 
satisfy ASF legal policy and avoid having the release removed.

AFAIK, the ASF mandates that users and developers have the source code they 
build from (the source release), not the code they run (a binary release).
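
For anyone who wants to repeat the check Dongjoon ran on the Hive and Hadoop 
tarballs against a Spark source release candidate, here is a minimal Python 
sketch. It lists the `.jar` members of a source tarball, roughly like 
`tar tvf <tarball> | grep 'jar$'`, but it only matches names ending in 
`.jar`. The script name `check_jars.py` and the usage line are hypothetical, 
just for illustration.

```python
import sys
import tarfile


def jar_entries(path: str) -> list[str]:
    """Return names of archive members ending in ".jar".

    Roughly the Python equivalent of `tar tvf <path> | grep 'jar$'`.
    """
    # "r:*" lets tarfile detect gzip/bzip2/xz compression automatically.
    with tarfile.open(path, "r:*") as tar:
        return [m.name for m in tar.getmembers() if m.name.endswith(".jar")]


def count_jar_entries(path: str) -> int:
    """Count .jar members, like piping the listing into `wc -l`."""
    return len(jar_entries(path))


if __name__ == "__main__":
    # Hypothetical usage: python check_jars.py <source-release>.tgz [...]
    failed = False
    for archive in sys.argv[1:]:
        names = jar_entries(archive)
        print(f"{archive}: {len(names)} jar file(s)")
        for name in names:
            print(f"  {name}")
        failed = failed or bool(names)
    # A non-zero exit status lets a release script fail fast.
    sys.exit(1 if failed else 0)
```

Exiting non-zero when any jar is found would let a release manager wire this 
into a pre-vote checklist instead of eyeballing the listing.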

Thank you,

Vlad

> On Feb 26, 2025, at 8:47 AM, Dongjoon Hyun <dongj...@apache.org> wrote:
> 
> Thank you for your reply, Sean.
> 
> I expected exactly that argument, which is why I started by quoting your 
> sentence above.
> 
> I understood the reasoning in 2018. However, there are two reasons why I 
> brought this up again in 2025:
> 
> First, the open source spirit is, technically and literally, "no compiled 
> code in a source release", as the Apache Hadoop and Hive communities 
> practice. Justin, Vlad, and Alex shared the same perspective with the Apache 
> Spark PMC.
> 
>  $ tar tvf apache-hive-4.0.1-src.tar.gz | grep 'jar$' | wc -l
>       0
>  $ tar tvfz hadoop-3.4.1-src.tar.gz | grep 'jar$' | wc -l
>       0
> 
> Second, last year open source communities worldwide were hit by 
> CVE-2024-3094 (the "XZ Utils backdoor"), in which the backdoor was hidden in 
> binary test files. I believe most of us are aware of that. At that time, the 
> project's GitHub repository was disabled. As a member of the Apache Spark 
> PMC, I'm suggesting that we remove that risk from the Apache Spark 
> repository in 2025. I've attached the following link for background on the 
> XZ Utils incident.
> 
>    https://www.akamai.com/blog/security-research/critical-linux-backdoor-xz-utils-discovered-what-to-know
> 
> Although I agree that this test coverage is important, I don't think it is 
> worth the Apache Spark community risking a shutdown. That's the lesson I 
> learned last year.
> 
> Sincerely,
> Dongjoon.
> 
> On 2025/02/26 13:31:56 Sean Owen wrote:
>> The gist of the initial 2018 thread was:
>> These are not source .jar files that users use, but .jar files used to test
>> loading from .jar files. These are test resources only.
>> I don't think this is what the spirit of the rule speaks to, namely that
>> end-user code should always have source code, which is the right principle.
>> Checking in the code somewhere is nice to have, though, and I think that
>> was the idea here.
>> 
>> But, removing these and disabling potentially valuable tests seems like a
>> step too far. There is no actual 'problem' w.r.t. the principle that users
>> have source to the code they run.
>> 
>> The 2025 thread just retreads the same ground as the 2018 thread.
>> But I don't see that we put this argument to the person who raised it
>> again. Why not that first?
>> And, if possible, go stick the source to these jars in the source tree,
>> where available.
>> 
>> 
>> On Wed, Feb 26, 2025 at 1:08 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>> 
>>> Hi, All.
>>> 
>>> Unfortunately, the Apache Spark project seems to have technical debt in
>>> its source code releases. It has been discussed at least twice, on both the
>>> dev@spark and legal-discuss mailing lists. (Thank you for the heads-up,
>>> Vlad.)
>>> 
>>> 1. https://lists.apache.org/thread/3sxw9gwp51mrkzlo2xchq1g20gbgbnz8
>>> (2018-06-21, dev@spark)
>>> 2. https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k
>>> (2018-06-25, legal-discuss@)
>>> 3. https://lists.apache.org/thread/z3oq1db80vc8c7r6892hwjnq4h7hnwmd
>>> (2025-02-25, dev@spark)
>>> 
>>> In short, according to the 2018 conclusion, the Apache Spark community
>>> wanted to adhere to the ASF policy by removing those jar files from source
>>> code releases (although this has not been considered a release blocker,
>>> then or since).
>>> 
>>>> it's important to be able to recreate these JARs somehow,
>>>> and I don't think we have the source in the repo for all of them
>>>> (at least, the ones that originate from Spark).
>>>> That much seems like a must-do. After that, seems worth figuring out
>>>> just how hard it is to build these artifacts from source.
>>>> If it's easy, great. If not, either the test can be removed or
>>>> we figure out just how hard a requirement this is.
>>> 
>>> Given that the issue has remained unresolved for seven years, I proposed
>>> SPARK-51318 as a potential way to comply with ASF policy. After SPARK-51318,
>>> we can restore the test coverage one test at a time by addressing the
>>> identified TODO items, without any legal concerns during the votes.
>>> 
>>> https://issues.apache.org/jira/browse/SPARK-51318
>>> (Remove `jar` files from Apache Spark repository and disable affected
>>> tests)
>>> 
>>> WDYT?
>>> 
>>> BTW, please note that I haven't marked SPARK-51318 as a blocker for any
>>> ongoing releases yet.
>>> 
>>> Best regards,
>>> Dongjoon.
>>> 
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> 