Sure, it's a good idea, but I'm not sure Spark can enforce it? Even a documented suggestion probably isn't going to be noticed. FooBar can put code under org.apache.spark.foobar, ideally, I guess.
On Wed, Sep 23, 2020 at 8:01 AM Steve Loughran <ste...@cloudera.com.invalid> wrote:
>
> The issue is that sometimes people explicitly want to put stuff into the
> spark package tree just to get at things which Spark scoped as
> org.apache.spark. Unless/until the relevant APIs/classes are rescoped to be
> public, putting your classes under the package hierarchy lets your own code
> at them. It just confuses stack trace analysis, as it's not immediately
> obvious whose code is playing up.
>
>
> On Tue, 22 Sep 2020 at 04:03, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>
>> Hi, Steve.
>>
>> Sure, you can suggest, but I'm wondering how the suggested namespaces
>> would satisfy the existing visibility rules. Could you give us some
>> specific examples?
>>
>> > Can I suggest some common prefix for third-party classes put into the
>> > spark package tree, just to make clear that they are external
>> > contributions?
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Mon, Sep 21, 2020 at 6:29 AM Steve Loughran <ste...@cloudera.com.invalid> wrote:
>>>
>>> I've just been stack-trace-chasing the 404-in-task-commit code:
>>>
>>> https://issues.apache.org/jira/browse/HADOOP-17216
>>>
>>> Although it has an org.apache.spark. prefix, it's actually
>>> org.apache.spark.sql.delta, which lives on GitHub, so the code/issue
>>> tracker lives elsewhere.
>>>
>>> I understand why they've done this - I've done it myself - it's to get at
>>> classes package-scoped to Spark
>>> (https://github.com/hortonworks-spark/cloud-integration/blob/master/spark-cloud-integration/src/main/scala/org/apache/spark/cloudera/ParallelizedWithLocalityRDD.scala).
>>>
>>> However, it can be confusing and time-wasting.
>>>
>>> Can I suggest some common prefix for third-party classes put into the
>>> spark package tree, just to make clear that they are external
>>> contributions? It would set expectations up all round.
>>>
>>> -Steve
>>>
>>> (*) Side note: could whoever maintains that code add retries, which have
>>> to have sleeps of >10-15s? We ended up having to do exponential backoff
>>> of > 90s to make sure the load balancers were clean. The time for a 404
>>> to clear is not "time since file was added", it is "time since last
>>> HEAD/GET/COPY request". Thx
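
For anyone reading the archive later, here is a minimal, self-contained two-file sketch of the scoping trick being discussed. The names (InternalUtils, FooBarCommitter, the foobar sub-package) are hypothetical, not actual Spark or Delta source; it just shows why a file placed under org.apache.spark can reach private[spark] members, and how a vendor sub-package like the suggested org.apache.spark.foobar at least makes external ownership obvious in a stack trace.

    // File 1: stand-in for a Spark internal (not actual Spark source)
    package org.apache.spark.util

    // Visible only to code whose package path sits under org.apache.spark.
    private[spark] object InternalUtils {
      def tempDir(): String = System.getProperty("java.io.tmpdir")
    }

    // File 2: hypothetical third-party code, following the suggested convention
    package org.apache.spark.foobar

    import org.apache.spark.util.InternalUtils

    object FooBarCommitter {
      // Compiles because this file lives inside the org.apache.spark package
      // tree; from any other package, InternalUtils is simply not accessible.
      def stagingDir(): String = InternalUtils.tempDir() + "/foobar-staging"
    }

Note that Dongjoon's point still stands: Scala's visibility rules cannot tell Spark's own sub-packages apart from a vendor's, so a prefix like this can only ever be a convention, not something the compiler or Spark itself enforces.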