StreamingContextSuite fails with NoSuchMethodError

2015-05-29 Thread Ted Yu
Hi, I ran the following command on 1.4.0 RC3: mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive package I saw the following failure: StreamingContextSuite: - from no conf constructor - from no conf + spark home - from no conf + spark home + env

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-05-29 Thread Taka Shinagawa
Mike, the broken Configuration link can be fixed by adding the missing dash '-' on the first line of docs/configuration.md and running 'jekyll build'. https://github.com/apache/spark/pull/6513 On Fri, May 29, 2015 at 6:38 PM, Mike Ringenburg wrote: > The Configuration link on the docs appears to b

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-05-29 Thread Mike Ringenburg
The Configuration link on the docs appears to be broken. Mike On May 29, 2015, at 4:41 PM, Patrick Wendell <pwend...@gmail.com> wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.0! The tag to be voted on is v1.4.0-rc3 (commit dd109a8): https://git-wip

[RESULT] [VOTE] Release Apache Spark 1.4.0 (RC2)

2015-05-29 Thread Patrick Wendell
Thanks for all the discussion on the vote thread. I am canceling this vote in favor of RC3. On Sun, May 24, 2015 at 12:22 AM, Patrick Wendell wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.4.0! > > The tag to be voted on is v1.4.0-rc2 (commit 03fb26a3): > h

[VOTE] Release Apache Spark 1.4.0 (RC3)

2015-05-29 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.4.0! The tag to be voted on is v1.4.0-rc3 (commit dd109a8): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=dd109a8746ec07c7c83995890fc2c0cd7a693730 The release files, including signatures, digests, etc. can

Saving DataFrame in Tachyon

2015-05-29 Thread sara mustafa
Hi All, I have Spark-1.3.0 and Tachyon-0.5.0. When I try to save an RDD in Tachyon, it succeeds, but saving a DataFrame fails with the following error: java.lang.IllegalArgumentException: Wrong FS: tachyon://localhost:19998/myres, expected: hdfs://localhost:54310 at org.apache.had
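
For context, a minimal sketch of the two code paths being compared (Spark 1.3-era API; the paths and input file here are illustrative, not from the original report):

    // Saving a plain RDD under a tachyon:// URI is reported to work:
    sc.parallelize(1 to 100).saveAsTextFile("tachyon://localhost:19998/myrdd")

    // Saving a DataFrame to the same filesystem is what throws
    // "Wrong FS: tachyon://..., expected: hdfs://...":
    val df = sqlContext.jsonFile("hdfs://localhost:54310/input.json")
    df.saveAsParquetFile("tachyon://localhost:19998/myres")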

Using UDFs in Java without registration

2015-05-29 Thread Justin Uang
I would like to define a UDF in Java via a closure and then use it without registration. In Scala, I believe there are two ways to do this: myUdf = functions.udf({ _ + 5}) myDf.select(myUdf(myDf("age"))) or myDf.select(functions.callUDF({_ + 5}, DataTypes.IntegerType, myDf("age")))
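
For reference, a compiling version of the unregistered-UDF pattern in Scala (Spark 1.3/1.4-era API; myDf and the column name are assumptions for illustration):

    import org.apache.spark.sql.functions.udf

    // Define a UDF from a closure without registering it by name,
    val addFive = udf((age: Int) => age + 5)

    // then apply it directly inside a select.
    val result = myDf.select(addFive(myDf("age")))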

Re: ClosureCleaner slowing down Spark SQL queries

2015-05-29 Thread Yin Huai
For Spark SQL internal operations, probably we can just create MapPartitionsRDD directly (like https://github.com/apache/spark/commit/5287eec5a6948c0c6e0baaebf35f512324c0679a ). On Fri, May 29, 2015 at 11:04 AM, Josh Rosen wrote: > Hey, want to file a JIRA for this? This will make it easier to
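
A rough sketch of the distinction being drawn (MapPartitionsRDD is private[spark], so the direct construction compiles only inside Spark itself; the function is illustrative):

    // The public API routes every closure through sc.clean(), i.e. the ClosureCleaner:
    val viaPublicApi = rdd.mapPartitions(iter => iter.map(_ + 1))

    // Internal operators can build the RDD directly and skip that per-query cost:
    val direct = new MapPartitionsRDD[Int, Int](
      rdd, (context, index, iter) => iter.map(_ + 1))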

Re: ClosureCleaner slowing down Spark SQL queries

2015-05-29 Thread Josh Rosen
Hey, want to file a JIRA for this? This will make it easier to track progress on this issue. Definitely upload the profiler screenshots there, too, since that's helpful information. https://issues.apache.org/jira/browse/SPARK On Wed, May 27, 2015 at 11:12 AM, Nitin Goyal wrote: > Hi Ted, >

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-29 Thread Peter Rudenko
Hi Yin, I'm using the spark-hive dependency, and my app's tests work with Spark 1.3.1; it seems to be something with Hive & sbt. Running the following statement from spark-shell works, but from the sbt console in RC3 I get this error: scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) 15/05/29 16
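
A build.sbt sketch of the setup described (the version string and module list are assumptions; RC artifacts come from a staging repository). Note that unlike spark-shell, 'sbt console' does not pre-create sc, which is one place the two environments differ:

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.4.0",
      "org.apache.spark" %% "spark-hive" % "1.4.0"
    )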

Re: Dataframe's .drop in PySpark doesn't accept Column

2015-05-29 Thread Olivier Girardot
Actually, the Scala API too only accepts a column name. On Fri, May 29, 2015 at 11:23 AM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote: > Hi, > Testing 1.4 a bit more, it seems that the .drop() method in PySpark > doesn't accept a Column as input datatype: > > > .join(on
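
For illustration, the name-based form that both APIs did accept at the time (a sketch in Scala; the DataFrame and column names mirror the snippet quoted below):

    // drop() in Spark 1.4 takes a column name string, not a Column expression:
    val trimmed = df.drop("pol_no")

    // Passing a Column, e.g. df("pol_no"), is what the PySpark snippet attempts
    // and what neither API supported at this point.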

Dataframe's .drop in PySpark doesn't accept Column

2015-05-29 Thread Olivier Girardot
Hi, testing 1.4 a bit more, it seems that the .drop() method in PySpark doesn't accept a Column as input datatype: .join(only_the_best, only_the_best.pol_no == df.pol_no, "inner").drop(only_the_best.pol_no) File "/usr/local/lib/python2.7/site-packages/pyspark/sql/dataframe.py", li