Re: [VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-04 Thread Krishna Sankar
Excellent & Thanks Davies. Yep, now runs fine and takes 1/2 the time ! This was exactly why I had put in the elapsed time calculations. And thanks for the new pyspark.sql.functions. +1 from my side for 1.5.0 RC3. Cheers On Fri, Sep 4, 2015 at 9:57 PM, Davies Liu wrote: > Could you update the n

Re: [VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-04 Thread Davies Liu
Could you update the notebook to use the built-in SQL functions month and year instead of Python UDFs? (They were introduced in 1.5.) Once those two UDFs are removed, it runs successfully and much faster. On Fri, Sep 4, 2015 at 2:22 PM, Krishna Sankar wrote: > Yin, > It is the > https://github.com/xsan
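
The suggestion above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the helper name `add_year_month` and the output column names `year` and `month` are assumptions made for the example. It relies only on `pyspark.sql.functions.year` and `pyspark.sql.functions.month`, which are available from Spark 1.5 onward.

```python
def add_year_month(df, date_col):
    """Return df with 'year' and 'month' columns computed by built-in SQL functions.

    Hypothetical helper: the name and the output column names are assumptions.
    """
    # Imported inside the function so the sketch can be defined without Spark installed.
    from pyspark.sql import functions as F

    # year() and month() are evaluated inside the JVM, so rows are not
    # serialized to a Python worker as they would be with a Python UDF.
    return (df
            .withColumn("year", F.year(df[date_col]))
            .withColumn("month", F.month(df[date_col])))
```

With an existing DataFrame, usage would look like `orders = add_year_month(orders, "OrderDate")`, where both `orders` and `OrderDate` are placeholder names.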

Re: (Spark SQL) partition-scoped UDF

2015-09-04 Thread Reynold Xin
Can you say more about your transformer? This is a good idea, and indeed we are doing it for R already (the latest way to run UDFs in R is to pass the entire partition as a local R dataframe for users to run on). However, what works for R for simple data processing might not work for your high per

Re: Flaky test in DAGSchedulerSuite?

2015-09-04 Thread Andrew Or
(Merged into master; thanks for the quick fix, Pete.) 2015-09-04 15:58 GMT-07:00 Cheolsoo Park : > Thank you Pete! > > On Fri, Sep 4, 2015 at 1:40 PM, Pete Robbins wrote: > >> raised https://issues.apache.org/jira/browse/SPARK-10454 and PR >> >> On 4 September 2015 at 21:24, Pete Robbins wrote: >

Re: Flaky test in DAGSchedulerSuite?

2015-09-04 Thread Cheolsoo Park
Thank you Pete! On Fri, Sep 4, 2015 at 1:40 PM, Pete Robbins wrote: > raised https://issues.apache.org/jira/browse/SPARK-10454 and PR > > On 4 September 2015 at 21:24, Pete Robbins wrote: > >> I've also just hit this and was about to raise a JIRA for this if there >> isn't one already. I have a

[build system] java package updates on the amplab jenkins workers

2015-09-04 Thread shane knapp
i've installed the latest java 7 and 8 packages on all of the jenkins workers! i haven't updated the /usr/java/latest and /usr/java/default symlinks to point to the new java 7 package, as i'd like to wait for downtime when no builds are running. switching java versions mid-build might be fun, but

Re: [VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-04 Thread Krishna Sankar
Yin, It is the https://github.com/xsankar/global-bd-conf/blob/master/004-Orders.ipynb. Cheers On Fri, Sep 4, 2015 at 9:58 AM, Yin Huai wrote: > Hi Krishna, > > Can you share your code to reproduce the memory allocation issue? > > Thanks, > > Yin > > On Fri, Sep 4, 2015 at 8:00 AM, Krishna Sa

Re: Flaky test in DAGSchedulerSuite?

2015-09-04 Thread Pete Robbins
raised https://issues.apache.org/jira/browse/SPARK-10454 and PR On 4 September 2015 at 21:24, Pete Robbins wrote: > I've also just hit this and was about to raise a JIRA for this if there > isn't one already. I have a simple fix. > > On 4 September 2015 at 19:09, Cheolsoo Park wrote: > >> Hi de

Re: Flaky test in DAGSchedulerSuite?

2015-09-04 Thread Pete Robbins
I've also just hit this and was about to raise a JIRA for this if there isn't one already. I have a simple fix. On 4 September 2015 at 19:09, Cheolsoo Park wrote: > Hi devs, > > I noticed this test case fails intermittently in Jenkins. > > For eg, see the following builds- > https://amplab.cs.be

Re: [VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-04 Thread Reynold Xin
Krishna - I think the rename actually happened before RC1. It was done a couple of months ago. On Fri, Sep 4, 2015 at 5:00 AM, Krishna Sankar wrote: > Thanks Tom. Interestingly it happened between RC2 and RC3. > Now my vote is +1/2 unless the memory error is known and has a workaround. > > Cheers > >

Flaky test in DAGSchedulerSuite?

2015-09-04 Thread Cheolsoo Park
Hi devs, I noticed this test case fails intermittently in Jenkins. For example, see the following builds: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41991/ https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41999/ The test failed in different PRs, and the failu

(Spark SQL) partition-scoped UDF

2015-09-04 Thread Eron Wright
Transformers in Spark ML typically operate on a per-row basis, based on callUDF. For a new transformer that I'm developing, I have a need to transform an entire partition with a function, as opposed to transforming each row separately. The reason is that, in my case, rows must be transformed i
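
As a rough illustration of the per-partition shape described above, a partition-scoped function takes an iterator over all rows of one partition and yields the transformed rows. The sketch below is plain Python with no Spark dependency. The mean-centring transform and the `map_partitions` helper are assumptions chosen for the example, not the transformer from this thread. In Spark, a function of the same shape could be passed to `RDD.mapPartitions`.

```python
from typing import Callable, Iterable, Iterator, List


def transform_partition(rows: Iterable[float]) -> Iterator[float]:
    """Hypothetical partition-scoped transform: centre values on the partition mean."""
    batch = list(rows)  # materialize the whole partition once
    if not batch:
        return  # empty partition: yield nothing
    mean = sum(batch) / len(batch)
    for value in batch:
        yield value - mean


def map_partitions(
    partitions: List[List[float]],
    func: Callable[[Iterable[float]], Iterator[float]],
) -> List[List[float]]:
    """Local stand-in (an assumption) that applies func once per partition."""
    return [list(func(iter(part))) for part in partitions]


# Two simulated partitions; each is transformed as a unit, not row by row.
partitions = [[1.0, 2.0, 3.0], [10.0, 20.0]]
result = map_partitions(partitions, transform_partition)
print(result)
```

The point of the design is that any per-partition setup or cross-row state (here, the mean) is computed once per partition rather than once per row.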

Re: [VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-04 Thread Yin Huai
Hi Krishna, Can you share your code to reproduce the memory allocation issue? Thanks, Yin On Fri, Sep 4, 2015 at 8:00 AM, Krishna Sankar wrote: > Thanks Tom. Interestingly it happened between RC2 and RC3. > Now my vote is +1/2 unless the memory error is known and has a workaround. > > Cheers

Re: [VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-04 Thread Krishna Sankar
Thanks Tom. Interestingly it happened between RC2 and RC3. Now my vote is +1/2 unless the memory error is known and has a workaround. Cheers On Fri, Sep 4, 2015 at 7:30 AM, Tom Graves wrote: > The upper/lower case thing is known. > https://issues.apache.org/jira/browse/SPARK-9550 > I assume

Re: [VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-04 Thread Tom Graves
The upper/lower case thing is known. https://issues.apache.org/jira/browse/SPARK-9550 I assume it was decided to be OK and it's going to be in the release notes, but Reynold or Josh can probably speak to it more. Tom On Thursday, September 3, 2015 10:21 PM, Krishna Sankar wrote: +

Re: OOM in spark driver

2015-09-04 Thread Akhil Das
Or you can increase the driver heap space (export _JAVA_OPTIONS="-Xmx5g") Thanks Best Regards On Wed, Sep 2, 2015 at 11:57 PM, Mike Hynes <91m...@gmail.com> wrote: > Just a thought; this has worked for me before on standalone client > with a similar OOM error in a driver thread. Try setting: > e