Re: GraphX: New graph operator

2015-06-01 Thread Ankur Dave
I think it would be good to have more basic operators like union or difference, as long as they have an efficient distributed implementation and are plausibly useful. If they can be written in terms of the existing GraphX API, it would be best to put them into GraphOps to keep the core GraphX impl
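The union/difference semantics under discussion can be illustrated with a small sketch on plain Scala collections. The types and method names below are stand-ins, not the GraphX API; a real GraphOps version would be expressed over distributed VertexRDD/EdgeRDD instead:

```scala
// Sketch of graph union/difference set semantics on plain Scala collections.
// SimpleGraph, union, and difference are illustrative names, not GraphX API.
object GraphSetOps {
  type VertexId = Long
  case class Edge(src: VertexId, dst: VertexId)
  case class SimpleGraph(vertices: Set[VertexId], edges: Set[Edge])

  // Union: keep every vertex and edge appearing in either graph.
  def union(a: SimpleGraph, b: SimpleGraph): SimpleGraph =
    SimpleGraph(a.vertices ++ b.vertices, a.edges ++ b.edges)

  // Difference: drop b's vertices from a, plus any edge touching a dropped vertex.
  def difference(a: SimpleGraph, b: SimpleGraph): SimpleGraph = {
    val keep = a.vertices -- b.vertices
    SimpleGraph(keep, a.edges.filter(e => keep(e.src) && keep(e.dst)))
  }
}
```

Both operations reduce to set unions/filters over the vertex and edge collections, which is why they fit naturally as GraphOps helpers built on the existing API.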

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Bobby Chowdary
Hi Patrick, Thanks for clarifying. No issues with functionality. +1 (non-binding) Thanks Bobby

On Mon, Jun 1, 2015 at 9:41 PM, Patrick Wendell wrote:
> Hey Bobby,
>
> Those are generic warnings that the hadoop libraries throw. If you are
> using MapRFS they shouldn't matter si

Re: [SQL] Write parquet files under partition directories?

2015-06-01 Thread Reynold Xin
There will be in 1.4. df.write.partitionBy("year", "month", "day").parquet("/path/to/output")

On Mon, Jun 1, 2015 at 10:21 PM, Matt Cheah wrote:
> Hi there,
>
> I noticed in the latest Spark SQL programming guide, there
> is su
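The Hive-style directory layout such a partitioned write produces can be sketched with a toy path builder in plain Scala (the base path and column names are illustrative, and this is not Spark's actual writer code):

```scala
// Toy sketch of the year=.../month=.../day=... partition paths that a
// partitionBy(...).parquet(...) write lays out on disk.
object PartitionPaths {
  def partitionPath(base: String, cols: Seq[(String, Any)]): String =
    (base +: cols.map { case (k, v) => s"$k=$v" }).mkString("/")
}
```

A read of the base path can then recover the partition columns from the directory names, which is what enables the optimized partitioned reads mentioned in the programming guide.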

[SQL] Write parquet files under partition directories?

2015-06-01 Thread Matt Cheah
Hi there, I noticed in the latest Spark SQL programming guide, there is support for optimized reading of partitioned Parquet files that have a particular directory structure (year=1/month=10/day=3, for example). However, I see no a

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Patrick Wendell
Hey Bobby, Those are generic warnings that the hadoop libraries throw. If you are using MapRFS they shouldn't matter since you are using the MapR client and not the default hadoop client. Do you have any issues with functionality... or was it just seeing the warnings that was the concern? Thanks

Re: [Streaming] Configure executor logging on Mesos

2015-06-01 Thread Gerard Maas
Hi Tim, (added dev, removed user) I've created https://issues.apache.org/jira/browse/SPARK-8009 to track this. -kr, Gerard.

On Sat, May 30, 2015 at 7:10 PM, Tim Chen wrote:
> So sounds like some generic downloadable uris support can solve this
> problem, that Mesos automatically places in your

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Bobby Chowdary
Hive Context works on RC3 for MapR after adding spark.sql.hive.metastore.sharedPrefixes as suggested in SPARK-7819. However, there still seem to be some other issues with native libraries; I get the below warning: WARN NativeCodeLoader: Unable to load
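For context, spark.sql.hive.metastore.sharedPrefixes is a comma-separated list of class-name prefixes loaded from the classloader shared between Spark SQL and the Hive metastore client, rather than the isolated metastore classloader. A hypothetical spark-defaults.conf fragment is below; the exact prefixes depend on which client jars conflict in a given deployment, so treat the values as placeholders:

```
# Hypothetical example only -- adjust prefixes to the conflicting client jars
spark.sql.hive.metastore.sharedPrefixes  com.mysql.jdbc,org.postgresql,com.mapr
```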

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Sean Owen
I get a bunch of failures in VersionSuite with build/test params "-Pyarn -Phive -Phadoop-2.6":

- success sanity check *** FAILED ***
  java.lang.RuntimeException: [download failed: org.jboss.netty#netty;3.2.2.Final!netty.jar(bundle), download failed: commons-net#commons-net;3.1!commons-net.jar]

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Michael Armbrust
It's no longer valid to start more than one instance of HiveContext in a single JVM, as one of the goals of this refactoring was to allow connection to more than one metastore from a single context. For tests I suggest you use TestHive as we do in our unit tests. It has a reset() method you can us

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Peter Rudenko
Thanks Yin, tried on a clean VM - works now. But tests in my app still fail:

[info] Cause: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool

Re: spark 1.4 - test-loading 1786 mysql tables / a few TB

2015-06-01 Thread Reynold Xin
Thanks, René. I actually added a warning to the new JDBC reader/writer interface for 1.4.0. Even with that, I think we should support throttling JDBC; otherwise it's too convenient for our users to DOS their production database servers!

/**
 * Construct a [[DataFrame]] representing the datab

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Andrew Or
+1 (binding) Tested the standalone cluster mode REST submission gateway - submit / status / kill Tested simple applications on YARN client / cluster modes with and without --jars Tested python applications on YARN client / cluster modes with and without --py-files* Tested dynamic allocation on YAR

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Yin Huai
Hi Peter, Based on your error message, it seems you were not using RC3. For the error thrown at HiveContext's line 206, we have changed the message to this one just befor

Re: please use SparkFunSuite instead of ScalaTest's FunSuite from now on

2015-06-01 Thread Andrew Or
It will be within the next few days

2015-06-01 9:17 GMT-07:00 Reynold Xin:
> I don't think so.
>
> On Monday, June 1, 2015, Steve Loughran wrote:
>> Is this backported to branch 1.3?
>>
>> On 31 May 2015, at 00:44, Reynold Xin wrote:
>> FYI we merged a patch that improves unit test l

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Peter Rudenko
Still have a problem using HiveContext from sbt. Here’s an example of dependencies:

val sparkVersion = "1.4.0-rc3"
lazy val root = Project(id = "spark-hive", base = file("."),
  settings = Project.defaultSettings ++ Seq(
    name := "spark-1.4-hive",
    scalaVersion := "2.10.5",
    scalaBinaryVersion := "

Re: please use SparkFunSuite instead of ScalaTest's FunSuite from now on

2015-06-01 Thread Reynold Xin
I don't think so.

On Monday, June 1, 2015, Steve Loughran wrote:
> Is this backported to branch 1.3?
>
> On 31 May 2015, at 00:44, Reynold Xin wrote:
> FYI we merged a patch that improves unit test log debugging. In order
> for that to work, all test suites have been changed to extend Sp

GraphX: New graph operator

2015-06-01 Thread Tarek Auel
Hello, Someone proposed in a Jira issue to implement new graph operations. Sean Owen recommended checking with the mailing list first, to see whether this is of interest. So I would like to know if it is interesting for GraphX to implement operators like: http://en.wikipedia.org/wiki/Graph_operat

Re: please use SparkFunSuite instead of ScalaTest's FunSuite from now on

2015-06-01 Thread Steve Loughran
Is this backported to branch 1.3?

On 31 May 2015, at 00:44, Reynold Xin wrote:
> FYI we merged a patch that improves unit test log debugging. In order for
> that to work, all test suites have been changed to extend SparkFunSuite
> instead of ScalaTest's FunSuite. We also

Re: spark 1.4 - test-loading 1786 mysql tables / a few TB

2015-06-01 Thread René Treffer
Hi, I'm using sqlContext.jdbc(uri, table, where).map(_ => 1).aggregate(0)(_+_,_+_) on an interactive shell (where "where" is an Array[String] of 32 to 48 elements). (The code is tailored to our db, specifically through the where conditions; I'd otherwise have posted it.) That should be the DataFram
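A where-clause array like the one described can be built by slicing a numeric key range into non-overlapping predicates, one per partition. A minimal pure-Scala sketch follows; the column name "id" and the helper's name are hypothetical, not anything from the Spark API:

```scala
// Sketch: build non-overlapping WHERE predicates over an id range, one per
// partition, suitable for passing as the where/predicates Array[String]
// of a JDBC read. Column name "id" is illustrative.
object JdbcPartitions {
  def wherePredicates(minId: Long, maxId: Long, numPartitions: Int): Array[String] = {
    val span = maxId - minId + 1
    (0 until numPartitions).map { i =>
      val lo = minId + (span * i) / numPartitions
      val hi = minId + (span * (i + 1)) / numPartitions - 1
      s"id >= $lo AND id <= $hi"
    }.toArray
  }
}
```

Each predicate becomes one partition's query, so the array's length also caps how many concurrent connections hit the database, which is one way to keep a bulk load from overwhelming the server.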

Re: spark 1.4 - test-loading 1786 mysql tables / a few TB

2015-06-01 Thread Reynold Xin
Never mind my comment about 3. You were talking about the read side, while I was thinking about the write side. Your workaround actually is a pretty good idea. Can you create a JIRA for that as well?

On Monday, June 1, 2015, Reynold Xin wrote:
> René,
>
> Thanks for sharing your experience. Are

Re: spark 1.4 - test-loading 1786 mysql tables / a few TB

2015-06-01 Thread Reynold Xin
René, Thanks for sharing your experience. Are you using the DataFrame API or SQL?

(1) Any recommendations on what we do w.r.t. out-of-range values? Should we silently turn them into a null? Maybe based on an option?

(2) Looks like a good idea to always quote column names. The small tricky thing

spark 1.4 - test-loading 1786 mysql tables / a few TB

2015-06-01 Thread René Treffer
Hi *, I used to run into a few problems with the jdbc/mysql integration and thought it would be nice to load our whole db, doing nothing but .map(_ => 1).aggregate(0)(_+_,_+_) on the DataFrames. SparkSQL has to load all columns and process them, so this should reveal type errors like SPARK-7897 Col