Re: [VOTE] Release Apache Spark 1.4.0 (RC2)

2015-05-27 Thread Patrick Wendell
Hi James, As I said before, that is not a blocker issue for this release, thanks. Separately, there are some comments in this code review that indicate you may be facing a bug in your own code rather than with Spark: https://github.com/apache/spark/pull/5688#issuecomment-104491410 Please follow u

Re: SparkR and RDDs

2015-05-27 Thread Andrew Psaltis
Hi Shivaram, Thanks for the details, it is greatly appreciated. Thanks On Wed, May 27, 2015 at 7:25 PM, Shivaram Venkataraman < shiva...@eecs.berkeley.edu> wrote: > Sorry for the delay in getting back on this. So the RDD interface is > private in the 1.4 release but as Alek mentioned you can sti

Re: Available Functions in SparkR

2015-05-27 Thread Shivaram Venkataraman
For the 1.4 release the DataFrame API will be publicly available and the documentation at http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-docs/sql-programming-guide.html (Click on the R tab) provides a good summary of the available functions. As I described in the other email to

Re: SparkR and RDDs

2015-05-27 Thread Shivaram Venkataraman
Sorry for the delay in getting back on this. So the RDD interface is private in the 1.4 release but as Alek mentioned you can still use it by prefixing `SparkR:::`. Regarding design direction -- there are two JIRAs which cover major features we plan to work on for 1.5. SPARK-6805 tracks porting hi

Re: [VOTE] Release Apache Spark 1.4.0 (RC2)

2015-05-27 Thread jameszhouyi
-1, SPARK-7119 blocker issue

[build system] jenkins downtime tomorrow morning ~730am PDT

2015-05-27 Thread shane knapp
I'm going to be performing system, Jenkins, and plugin updates tomorrow morning beginning at 730am PDT. 0700: pause build queue; 0800: kill off any errant jobs (retrigger when everything comes back up); 0800-0900: system and plugin updates; 0900-1000: final debugging, roll back versions of plugin

RE: Spark 1.4.0 pyspark and pylint breaking

2015-05-27 Thread Michael Nazario
I've done some investigation into what work needed to be done to keep the _types module named types. This isn't a relative / absolute path problem, but actually a problem with the way the tests were run. I've filed a JIRA ticket on it here: https://issues.apache.org/jira/browse/SPARK-7899. I a

Re: ClosureCleaner slowing down Spark SQL queries

2015-05-27 Thread Nitin Goyal
Hi Ted, Thanks a lot for replying. First of all, moving to 1.4.0 RC2 is not easy for us, as the migration cost is big since a lot has changed in Spark SQL since 1.2. Regarding SPARK-7233, I had already looked at it a few hours back and it solves the problem for concurrent queries, but my problem is just fo

Re: ClosureCleaner slowing down Spark SQL queries

2015-05-27 Thread Ted Yu
Can you try your query using Spark 1.4.0 RC2? There have been some fixes since 1.2.0, e.g. SPARK-7233 (ClosureCleaner#clean blocks concurrent job submitter threads). Cheers On Wed, May 27, 2015 at 10:38 AM, Nitin Goyal wrote: > Hi All, > > I am running a SQL query (spark version 1.2) on a table c

ClosureCleaner slowing down Spark SQL queries

2015-05-27 Thread Nitin Goyal
Hi All, I am running a SQL query (Spark version 1.2) on a table created from a unionAll of 3 schema RDDs, which gets executed in roughly 400ms (200ms at the driver and roughly 200ms at the executors). If I run the same query on a table created from a unionAll of 27 schema RDDs, I see that the executor time is the same (b