[SparkSQL ] What is Exchange in physical plan for ?

2015-06-08 Thread invkrh
Hi, DataFrame.explain() shows the physical plan of a query. I noticed there are a lot of `Exchange`s in it, like below:
Project [period#20L,categoryName#0,regionName#10,action#15,list_id#16L]
 ShuffledHashJoin [region#18], [regionCode#9], BuildRight
  Exchange (HashPartitioning [region#18], 12)

RE: [SparkSQL ] What is Exchange in physical plan for ?

2015-06-08 Thread Cheng, Hao
It denotes a data shuffle, and its arguments show the partitioning strategy. -Original Message- From: invkrh [mailto:inv...@gmail.com] Sent: Monday, June 8, 2015 9:34 PM To: dev@spark.apache.org Subject: [SparkSQL ] What is Exchange in physical plan for ? Hi, DataFrame.explain()
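
To make the reply above concrete, here is a minimal plain-Python sketch of what hash partitioning into 12 partitions does to the rows on each side of a join. It does not use Spark: the `hash_partition` helper, the sample rows, and the use of Python's built-in `hash()` are all assumptions for illustration, and Spark's own hash function and data layout differ.

```python
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Group rows into buckets by hash(row[key]) modulo num_partitions.

    Hypothetical helper: it mimics the idea behind
    Exchange (HashPartitioning [region#18], 12), not Spark's implementation.
    """
    partitions = defaultdict(list)
    for row in rows:
        # Rows with equal keys always land in the same bucket.
        partitions[hash(row[key]) % num_partitions].append(row)
    return dict(partitions)


# Made-up sample data standing in for the two join inputs.
left = [
    {"region": 1, "action": "view"},
    {"region": 2, "action": "click"},
    {"region": 1, "action": "buy"},
]
right = [
    {"regionCode": 1, "regionName": "North"},
    {"regionCode": 2, "regionName": "South"},
]

# Shuffle both sides with the same partitioning so that matching keys
# end up in the partition with the same index on both sides.
left_parts = hash_partition(left, "region", 12)
right_parts = hash_partition(right, "regionCode", 12)

for pid in sorted(left_parts):
    print(pid, left_parts[pid], right_parts.get(pid, []))
```

Because both sides use the same key hash and the same partition count, each partition can be joined locally without looking at any other partition, which is why a shuffled hash join is preceded by an `Exchange` on its inputs.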

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-08 Thread Patrick Wendell
Hi All, Thanks for the continued voting! I'm going to leave this thread open for another few days to continue to collect feedback. - Patrick On Tue, Jun 2, 2015 at 8:53 PM, Patrick Wendell wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.4.0! > > The tag to

[ml] Why all model classes are final?

2015-06-08 Thread Peter Rudenko
Hi, previously all the models in the ml package were package-private, so when I needed to customize a model I subclassed it inside the org.apache.spark.ml package in my own project. But now new models (https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/GBTClass

[sample code] deeplearning4j for Spark ML (@DeveloperAPI)

2015-06-08 Thread Eron Wright
The deeplearning4j framework provides a variety of distributed, neural network-based learning algorithms, including convolutional nets, deep auto-encoders, deep-belief nets, and recurrent nets. We’re working on integration with the Spark ML pipeline, leveraging the developer API. This a

Re: Stages with non-arithmetic numbering & Timing metrics in event logs

2015-06-08 Thread Imran Rashid
Hi Mike, all good questions, let me take a stab at answering them: 1. Event Logs + Stages: It's normal for stages to get skipped if they are shuffle map stages, which get read multiple times. E.g., here's a little example program I wrote earlier to demonstrate this: "d3" doesn't need to be re-shu

SparkR Reading Tables from Hive

2015-06-08 Thread Eskilson,Aleksander
Hi there, I’m testing out the new SparkR-Hive interop right now. I’m noticing an apparent disconnect between the Hive store where my data is loaded and the store that sparkRHive.init() connects to. For example, in beeline: 0: jdbc:hive2://quickstart.cloudera:1> show databases; +--

Re: SparkR Reading Tables from Hive

2015-06-08 Thread Eskilson,Aleksander
Resolved: my hive-site.xml wasn’t in the conf folder. I can load tables into DataFrames as expected. Thanks, Alek From: Aleksander Eskilson <alek.eskil...@cerner.com> Date: Monday, June 8, 2015 at 3:38 PM To: "dev@spark.apache.org" mailto:dev@spark.apache.

Re: SparkR Reading Tables from Hive

2015-06-08 Thread Shivaram Venkataraman
Thanks for the confirmation - I was just going to send a pointer to the documentation that talks about hive-site.xml. http://people.apache.org/~pwendell/spark-releases/latest/sql-programming-guide.html#hive-tables Thanks Shivaram On Mon, Jun 8, 2015 at 1:57 PM, Eskilson,Aleksander < alek.eskil...

RE: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-08 Thread Wang, Daoyuan
+1 -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Wednesday, June 03, 2015 1:47 PM To: dev@spark.apache.org Subject: Re: [VOTE] Release Apache Spark 1.4.0 (RC4) Hey all - a tiny nit from the last e-mail. The tag is v1.4.0-rc4. The exact commit and all other in

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-08 Thread Denny Lee
+1 On Mon, Jun 8, 2015 at 17:51 Wang, Daoyuan wrote: > +1 > > -Original Message- > From: Patrick Wendell [mailto:pwend...@gmail.com] > Sent: Wednesday, June 03, 2015 1:47 PM > To: dev@spark.apache.org > Subject: Re: [VOTE] Release Apache Spark 1.4.0 (RC4) > > Hey all - a tiny nit from the

Fwd: pull requests no longer closing by commit messages with "closes #xxxx"

2015-06-08 Thread Reynold Xin
FYI. -- Forwarded message -- From: John Greet (GitHub Staff) Date: Mon, Jun 8, 2015 at 5:50 PM Subject: Re: pull requests no longer closing by commit messages with "closes #" To: Reynold Xin Hi Reynold, The problem here is that the commits closing those pull requests were

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-08 Thread saurfang
+1 Built for Hadoop 2.4. Ran a few jobs on YARN and tested spark.sql.unsafe, whose performance seems great!