Sure, it's a good idea, but I'm not sure Spark can enforce it? Even a documented suggestion probably isn't going to be noticed. FooBar can put code under org.apache.spark.foobar, ideally, I guess.
On Wed, Sep 23, 2020 at 8:01 AM Steve Loughran <ste...@cloudera.com.invalid> wrote:
>
> The issue is that sometimes people explicitly want to put stuff into the
> spark package tree just to get at things which Spark scoped as
> org.apache.spark. Unless/until the relevant APIs/classes are rescoped to be
> public, putting your classes under the package hierarchy lets your own code
> at them. It just confuses stack trace analysis, as it's not immediately
> obvious whose code is playing up.
>
>
> On Tue, 22 Sep 2020 at 04:03, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>
>> Hi, Steve.
>>
>> Sure, you can suggest, but I'm wondering how the suggested namespaces
>> would satisfy the existing visibility rules. Could you give us some
>> specific examples?
>>
>> > Can I suggest some common prefix for third-party classes put into the
>> > spark package tree, just to make clear that they are external
>> > contributions?
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Mon, Sep 21, 2020 at 6:29 AM Steve Loughran <ste...@cloudera.com.invalid> wrote:
>>>
>>> I've just been stack-trace-chasing the 404-in-task-commit code:
>>>
>>> https://issues.apache.org/jira/browse/HADOOP-17216
>>>
>>> Although it has an org.apache.spark. prefix, it's actually
>>> org.apache.spark.sql.delta, which lives on GitHub, so the code/issue
>>> tracker lives elsewhere.
>>>
>>> I understand why they've done this - I've done it myself - it's to get at
>>> classes package-scoped to Spark
>>> (https://github.com/hortonworks-spark/cloud-integration/blob/master/spark-cloud-integration/src/main/scala/org/apache/spark/cloudera/ParallelizedWithLocalityRDD.scala).
>>>
>>> However, it can be confusing and time-wasting.
>>>
>>> Can I suggest some common prefix for third-party classes put into the
>>> spark package tree, just to make clear that they are external
>>> contributions? It would set expectations up all round.
>>>
>>> -Steve
>>>
>>> (*) Side note: could whoever maintains that code add retries, which have
>>> to have sleeps of >10-15s? We ended up having to do exponential backoff
>>> of > 90s to make sure the load balancers were clean. The time for a 404
>>> to clear is not "time since file was added", it is "time since last
>>> HEAD/GET/COPY request". Thx
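
For anyone reading the archive later, here is a minimal, self-contained two-file sketch of the scoping trick being discussed. The names (InternalUtils, FooBarCommitter, the foobar sub-package) are hypothetical, not actual Spark or Delta source; it just shows why a file placed under org.apache.spark can reach private[spark] members, and how a vendor sub-package like the suggested org.apache.spark.foobar at least makes external ownership obvious in a stack trace.

    // File 1: stand-in for a Spark internal (not actual Spark source)
    package org.apache.spark.util

    // Visible only to code whose package path sits under org.apache.spark.
    private[spark] object InternalUtils {
      def tempDir(): String = System.getProperty("java.io.tmpdir")
    }

    // File 2: hypothetical third-party code, following the suggested convention
    package org.apache.spark.foobar

    import org.apache.spark.util.InternalUtils

    object FooBarCommitter {
      // Compiles because this file lives inside the org.apache.spark package
      // tree; from any other package, InternalUtils is simply not accessible.
      def stagingDir(): String = InternalUtils.tempDir() + "/foobar-staging"
    }

Note that Dongjoon's point still stands: Scala's visibility rules cannot tell Spark's own sub-packages apart from a vendor's, so a prefix like this can only ever be a convention, not something the compiler or Spark itself enforces.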