Re: PySpark on PyPi

2015-06-05 Thread Olivier Girardot
Ok, I get it. Now what can we do to improve the current situation? Because right now, if I want to set up a CI env for PySpark, I have to: 1- download a pre-built version of pyspark and unzip it somewhere on every agent 2- define the SPARK_HOME env variable 3- symlink this distribution's pyspark dir inside th
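
A minimal sketch of what steps 2 and 3 amount to on a CI agent, assuming SPARK_HOME points at an unpacked pre-built distribution (the /opt/spark default and the py4j zip glob are illustrative and vary by Spark version):

    import glob
    import os
    import sys

    # Assumes a pre-built Spark distribution has already been unpacked on the agent.
    spark_home = os.environ.setdefault("SPARK_HOME", "/opt/spark")

    # Instead of symlinking the distribution's pyspark dir into site-packages,
    # put it (and the bundled py4j zip) on sys.path for the test run.
    sys.path.insert(0, os.path.join(spark_home, "python"))
    sys.path.extend(glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip")))

    from pyspark import SparkContext  # now importable without a pip-installed package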

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Bobby Chowdary
Thanks Yin! Everything else works great! +1 (non-binding) On Fri, Jun 5, 2015 at 2:11 PM, Yin Huai wrote: > Hi Bobby, > > sqlContext.table("test.test1") is not officially supported in 1.3. For > now, please use the "use database" as a workaround. We will add it. > > Thanks, > > Yin > > On Fr

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Yin Huai
Hi Bobby, sqlContext.table("test.test1") is not officially supported in 1.3. For now, please use the "use database" as a workaround. We will add it. Thanks, Yin On Fri, Jun 5, 2015 at 12:18 PM, Bobby Chowdary wrote: > Not sure if its a blocker but there might be a minor issue with hive > cont

Re: PySpark on PyPi

2015-06-05 Thread Jey Kottalam
Couldn't we have a pip installable "pyspark" package that just serves as a shim to an existing Spark installation? Or it could even download the latest Spark binary if SPARK_HOME isn't set during installation. Right now, Spark doesn't play very well with the usual Python ecosystem. For example, why
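
A rough sketch of the install-time half of that idea, under stated assumptions: the mirror URL, version, and paths are placeholders rather than a real published layout, and it targets the Python 2 era (hence urllib2):

    import os
    import tarfile
    import urllib2

    SPARK_TGZ_URL = "http://example.org/spark-1.4.0-bin-hadoop2.6.tgz"  # placeholder URL

    def ensure_spark(dest="/opt"):
        # Return SPARK_HOME, downloading a binary distribution if it is not already set.
        spark_home = os.environ.get("SPARK_HOME")
        if spark_home and os.path.isdir(spark_home):
            return spark_home
        tgz_path = os.path.join(dest, "spark.tgz")
        with open(tgz_path, "wb") as f:
            f.write(urllib2.urlopen(SPARK_TGZ_URL).read())
        tarfile.open(tgz_path).extractall(dest)
        return os.path.join(dest, "spark-1.4.0-bin-hadoop2.6")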

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Marcelo Vanzin
+1 (non-binding) Ran some of our internal test suite (yarn + standalone) against the hadoop-2.6 and without-hadoop binaries. On Tue, Jun 2, 2015 at 8:53 PM, Patrick Wendell wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.4.0! > > The tag to be voted on is v

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Bobby Chowdary
Not sure if it's a blocker, but there might be a minor issue with HiveContext; there is also a workaround. *Works:* from pyspark.sql import HiveContext sqlContext = HiveContext(sc) df = sqlContext.sql("select * from test.test1") *Does not Work:* df = sqlContext.table("test.test1") Py4JJavaErr
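
Untangled, the preview above amounts to the following, assuming sc is the SparkContext from the pyspark shell and test.test1 is an existing Hive table; the last two lines follow Yin's "use database" workaround from the reply further up:

    from pyspark.sql import HiveContext

    sqlContext = HiveContext(sc)

    # Works:
    df = sqlContext.sql("select * from test.test1")

    # Reported to fail on 1.4.0-RC4 with a Py4JJavaError:
    # df = sqlContext.table("test.test1")

    # Workaround per Yin's reply: select the database first, then use the bare table name.
    sqlContext.sql("use test")
    df = sqlContext.table("test1")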

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Kousuke Saruta
+1 Built on Mac OS X with -Dhadoop.version=2.4.0 -Pyarn -Phive -Phive-thriftserver. Tested on YARN (cluster/client) on CentOS 7. Also the WebUI, including DAG and Timeline View, works. On 2015/06/05 15:01, Burak Yavuz wrote: +1 Tested on Mac OS X Burak On Thu, Jun 4, 2015 at 6:35 PM, Calvin Jia

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Sandy Ryza
+1 (non-binding) Built from source and ran some jobs against a pseudo-distributed YARN cluster. -Sandy On Fri, Jun 5, 2015 at 11:05 AM, Ram Sriharsha wrote: > +1 , tested with hadoop 2.6/ yarn on centos 6.5 after building w/ -Pyarn > -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Phive-thriftse

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Ram Sriharsha
+1, tested with Hadoop 2.6 / YARN on CentOS 6.5 after building w/ -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver, and ran a few SQL tests and the ML examples. On Fri, Jun 5, 2015 at 10:55 AM, Hari Shreedharan wrote: > +1. Build looks good, ran a couple apps on YARN > > > T

Re: PySpark on PyPi

2015-06-05 Thread Josh Rosen
This has been proposed before: https://issues.apache.org/jira/browse/SPARK-1267 There's currently tighter coupling between the Python and Java halves of PySpark than just requiring SPARK_HOME to be set; if we did this, I bet we'd run into tons of issues when users try to run a newer version of the
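
On the version-coupling point, a hypothetical packaged pyspark could at least fail fast when the Python half and the JVM half disagree; SHIM_SPARK_VERSION is an illustrative constant such a package might carry, and sc.version reports the JVM-side version:

    from pyspark import SparkContext

    SHIM_SPARK_VERSION = "1.4.0"  # hypothetical: the version the pip package was built against

    sc = SparkContext(appName="version-check")
    if not sc.version.startswith(SHIM_SPARK_VERSION):
        raise RuntimeError("Python-side pyspark %s does not match JVM Spark %s under SPARK_HOME"
                           % (SHIM_SPARK_VERSION, sc.version))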

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Hari Shreedharan
+1. Build looks good, ran a couple apps on YARN Thanks, Hari On Fri, Jun 5, 2015 at 10:52 AM, Yin Huai wrote: > Sean, > > Can you add "-Phive -Phive-thriftserver" and try those Hive tests? > > Thanks, > > Yin > > On Fri, Jun 5, 2015 at 5:19 AM, Sean Owen wrote: > >> Everything checks out agai

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Yin Huai
Sean, Can you add "-Phive -Phive-thriftserver" and try those Hive tests? Thanks, Yin On Fri, Jun 5, 2015 at 5:19 AM, Sean Owen wrote: > Everything checks out again, and the tests pass for me on Ubuntu + > Java 7 with '-Pyarn -Phadoop-2.6', except that I always get > SparkSubmitSuite errors li

Scheduler question: stages with non-arithmetic numbering

2015-06-05 Thread Mike Hynes
Hi folks, When I look at the output logs for an iterative Spark program, I see that the stage IDs are not arithmetically numbered, that is, there are gaps between stages and I might find log information about Stages 0, 1, 2, 5, but not 3 or 4. As an example, the output from the Spark logs below sh
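
As a rough illustration of the kind of program in question (not Mike's actual code): each action below submits a new job, and stages whose shuffle output already exists are planned and numbered but skipped, which is one way the log of executed stages can show IDs with gaps:

    from pyspark import SparkContext

    sc = SparkContext(appName="stage-numbering-demo")

    pairs = sc.parallelize(range(1000)).map(lambda x: (x % 10, x))
    summed = pairs.reduceByKey(lambda a, b: a + b)

    # Each iteration is a separate job; after the first one, the shuffle map stage
    # is typically skipped, so watch the "Submitting ... Stage" lines in the logs.
    for i in range(3):
        summed.values().sum()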

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Marcelo Vanzin
On Fri, Jun 5, 2015 at 5:19 AM, Sean Owen wrote: > - success sanity check *** FAILED *** > java.lang.RuntimeException: [download failed: > org.jboss.netty#netty;3.2.2.Final!netty.jar(bundle), download failed: > commons-net#commons-net;3.1!commons-net.jar] > at > org.apache.spark.deploy.SparkS

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Sean Owen
Everything checks out again, and the tests pass for me on Ubuntu + Java 7 with '-Pyarn -Phadoop-2.6', except that I always get SparkSubmitSuite errors like ... - success sanity check *** FAILED *** java.lang.RuntimeException: [download failed: org.jboss.netty#netty;3.2.2.Final!netty.jar(bundle),

Re: Regarding "Connecting spark to Mesos" documentation

2015-06-05 Thread François Garillot
The make-distribution script will indeed take Maven options. If you want to add this to the documentation, one possibility is to supplement the information in that file: https://github.com/apache/spark/blob/master/docs/running-on-mesos.md with a pull request. You'll also find contri

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Reynold Xin
Enjoy your new shiny mbp. On Fri, Jun 5, 2015 at 12:10 AM, Krishna Sankar wrote: > +1 (non-binding, of course) > > 1. Compiled OSX 10.10 (Yosemite) OK Total time: 25:42 min (My brand new > shiny MacBookPro12,1 : 16GB. Inaugurated the machine with compile & test > 1.4.0-RC4 !) > mvn clean pa

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Krishna Sankar
+1 (non-binding, of course) 1. Compiled on OS X 10.10 (Yosemite) OK. Total time: 25:42 min (my brand new shiny MacBookPro12,1: 16GB. Inaugurated the machine with compile & test of 1.4.0-RC4!) mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -DskipTests 2. Tested pys