Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-20 Thread Sean Owen
PS: pull request at https://github.com/apache/spark/pull/23098. Not going to merge it until there's clear agreement. On Tue, Nov 20, 2018 at 10:16 AM Ryan Blue wrote: > > +1 to removing 2.11 support for 3.0 and a PR. > > It sounds like having multiple Scala builds is just not feasible and I don't

Re: Maven

2018-11-20 Thread Sean Owen
Sure, if you published Spark artifacts in a local repo (even your local file system) as com.foo:spark-core_2.12, etc, just depend on those artifacts, not the org.apache ones. On Tue, Nov 20, 2018 at 3:21 PM Jack Kolokasis wrote: > > Hello, > > is there any way to use my local custom - Spark as
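A minimal sketch of the dependency block Sean describes, assuming the custom build was installed to a local repository (e.g. via `mvn install`) under a relocated groupId. The `com.foo` groupId and the version string here are made up for illustration:

```xml
<!-- Hypothetical coordinates: assumes you republished your custom Spark
     build under com.foo instead of org.apache.spark -->
<dependency>
  <groupId>com.foo</groupId>
  <artifactId>spark-core_2.12</artifactId>
  <version>3.0.0-custom</version>
</dependency>
```

Because the groupId differs from `org.apache.spark`, Maven resolves your local artifacts instead of the published ones, with no exclusions needed.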

Maven

2018-11-20 Thread Jack Kolokasis
Hello, is there any way to use my local custom Spark build as a dependency while I am using Maven to compile my applications? Thanks for your reply, --Iacovos - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: Can I update the "2.12" PR builder to 2.11?

2018-11-20 Thread shane knapp
rock on! disabling the old/explicit 2.12 build now. On Tue, Nov 20, 2018 at 10:30 AM Sean Owen wrote: > That's the ticket! yes I'll figure out the build error. > On Tue, Nov 20, 2018 at 11:16 AM shane knapp wrote: > > > > how about this? > > > > > https://amplab.cs.berkeley.edu/jenkins/job/spa

Re: Can I update the "2.12" PR builder to 2.11?

2018-11-20 Thread Sean Owen
That's the ticket! yes I'll figure out the build error. On Tue, Nov 20, 2018 at 11:16 AM shane knapp wrote: > > how about this? > > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.11/ > > (log in w/your jenkins creds and look at the build config) > > ba

Numpy memory not being released in executor map-partition function (memory leak)

2018-11-20 Thread joshlk_
I believe I have uncovered a strange interaction between pySpark, Numpy and Python which produces a memory leak. I wonder if anyone has any ideas of what the issue could be? I have the following minimal working example ( gist of code
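The gist linked above is truncated, so as an illustration only, the pattern being described (a mapPartitions function that allocates large NumPy arrays per partition) looks roughly like the sketch below. The function name and sizes are hypothetical, not the code from the thread:

```python
import numpy as np

def process_partition(rows):
    """Illustrative per-partition function that allocates a large
    NumPy buffer -- the allocation pattern suspected of leaking
    memory in the Python worker processes described in the thread."""
    buf = np.ones((1000, 1000), dtype=np.float64)  # ~8 MB allocation
    for row in rows:
        yield float(buf.sum()) + row

# With Spark this would be: rdd.mapPartitions(process_partition)
# Called directly on a plain iterator here for demonstration:
result = list(process_partition(iter([0.0, 1.0])))
```

In Spark, each executor runs such a function inside a long-lived Python worker process, which is why memory that Python/NumPy does not return to the OS accumulates across tasks.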

Re: Can I update the "2.12" PR builder to 2.11?

2018-11-20 Thread shane knapp
how about this? https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.11/ (log in w/your jenkins creds and look at the build config) basically, it runs 'dev/change-scala-version.sh 2.11' and builds w/mvn and '-Pscala-2.11' i'll also disable the spark-maste
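Condensed from shane's description, the 2.11 build amounts to roughly these two steps, run from the root of a Spark checkout (a sketch only; any flags beyond `-Pscala-2.11` are illustrative):

```shell
# Rewrite the poms to target Scala 2.11...
./dev/change-scala-version.sh 2.11
# ...then build with the matching profile (skipping tests is illustrative)
./build/mvn -Pscala-2.11 -DskipTests package
```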

Re: Can I update the "2.12" PR builder to 2.11?

2018-11-20 Thread Sean Owen
Ah right yes not a PR builder. So would you have to update that? If possible to get that in soon it would help detect 2.11 failures. On Tue, Nov 20, 2018, 10:23 AM shane knapp wrote: > oh, the master builds are in the jenkins job builder configs in that databricks repo (that's near the top of my TODO li

Re: Can I update the "2.12" PR builder to 2.11?

2018-11-20 Thread shane knapp
oh, the master builds are in the jenkins job builder configs in that databricks repo (that's near the top of my TODO list to move in to the main spark repo). and btw, the spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12 is *not* a PR builder... ;) On Tue, Nov 20, 2018 at 8:20 AM Sean Owen w

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-20 Thread shane knapp
ok, i think the "how do we, and how many, builds for different versions of scala" thing is getting folks confused: 1) we can easily have more than one non-pull-request builder to test against N versions of scala 2) we have one pull request builder which will test against the root pom, which is n

Re: Can I update the "2.12" PR builder to 2.11?

2018-11-20 Thread Sean Owen
The one you set up to test 2.12 separately, spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12 Now master is on 2.12 by default. OK will try to change it. On Tue, Nov 20, 2018 at 10:15 AM shane knapp wrote: > > which build are you referring to as "the 2.12 PR builder"? > > but yes, it should jus

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-20 Thread Ryan Blue
+1 to removing 2.11 support for 3.0 and a PR. It sounds like having multiple Scala builds is just not feasible and I don't think this will be too disruptive for users since it is already a breaking change. On Tue, Nov 20, 2018 at 7:05 AM Sean Owen wrote: > One more data point -- from looking at

Re: Can I update the "2.12" PR builder to 2.11?

2018-11-20 Thread shane knapp
which build are you referring to as "the 2.12 PR builder"? but yes, it should just be a simple dev/change-scala-version.sh call in the build step. shane On Tue, Nov 20, 2018 at 7:06 AM Sean Owen wrote: > Shane, on your long list of TODOs, we still need to update the 2.12 PR > builder to instea

Can I update the "2.12" PR builder to 2.11?

2018-11-20 Thread Sean Owen
Shane, on your long list of TODOs, we still need to update the 2.12 PR builder to instead test 2.11. Is that just a matter of editing Jenkins configuration that I can see and change? If so I'll just do it. Sean

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-20 Thread Sean Owen
One more data point -- from looking at the SBT build yesterday, it seems like most plugin updates require SBT 1.x. And both they and SBT 1.x seem to need Scala 2.12. And the new zinc also does. Now, the current SBT and zinc and plugins all appear to work OK with 2.12 now, but updating will pretty m

Re: Array indexing functions

2018-11-20 Thread Petar Zečević
Hi, yes, these are implemented just like native functions in sql.functions, with code generation, so whole-stage codegen should apply. Regarding plan optimization, I am not sure how these would be taken into account in the existing rules, except maybe for filter pushdown. Petar Alessandro So

Re: Array indexing functions

2018-11-20 Thread Alessandro Solimando
Hi Petar, I have implemented similar functions a few times through ad-hoc UDFs in the past, so +1 from me. Can you elaborate a bit more on how you practically implement those functions? Are they UDF or "native" functions like those in sql.functions package? I am asking because I wonder if/how Cat

Array indexing functions

2018-11-20 Thread Petar Zečević
Hi, I implemented two array functions that are useful to us and I wonder if you think it would be useful to add them to the distribution. The functions are used for filtering arrays based on indexes: array_allpositions (named after array_position) - takes a column and a value and returns an a
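The message is truncated before the full semantics are given, but a plain-Python sketch of what `array_allpositions` plausibly does, by analogy with Spark SQL's `array_position` (which is 1-based and null-safe), might look like this. The 1-based indexing and null handling are assumptions, not the author's implementation:

```python
def array_allpositions(arr, value):
    """Sketch of the proposed function: return the 1-based positions of
    every element equal to `value` (array_position in Spark SQL is
    1-based, so the same convention is assumed here)."""
    if arr is None:
        return None
    return [i + 1 for i, elem in enumerate(arr) if elem == value]

positions = array_allpositions(["a", "b", "a", "c"], "a")  # -> [1, 3]
```

The proposed Spark version would be a native Catalyst expression rather than Python; this is just a reference for the described behavior.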