Re: Coding style question (about extra anonymous closure within functional transformations)

2016-04-13 Thread Reynold Xin
We prefer the latter. I don't think there are performance differences though. It depends on how big the change is -- massive style updates can make backports harder. On Wed, Apr 13, 2016 at 7:46 PM, Hyukjin Kwon wrote: > Hi all, > > I recently noticed that actually there are some usages of fun

Coding style question (about extra anonymous closure within functional transformations)

2016-04-13 Thread Hyukjin Kwon
Hi all, I recently noticed that there are some usages of functional transformations (e.g. map, foreach, etc.) with an extra anonymous closure. For example, ...map(item => { ... }) which can simply be written as below: ...map { item => ... } I wrote a regex to find all of them and cor
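
A minimal sketch of the two styles being compared, assuming a hypothetical `items` sequence and a two-line body (neither appears in the thread). It only illustrates the syntax; both forms pass the same anonymous function to `map` and evaluate to the same result.

```scala
// Hypothetical example data and body; not taken from the Spark codebase.
object ClosureStyle {
  val items: Seq[Int] = Seq(1, 2, 3)

  // Style 1: parentheses around an anonymous function with a braced body.
  val withParens: Seq[Int] = items.map(item => {
    val doubled = item * 2
    doubled + 1
  })

  // Style 2: braces only, the form described as "simpler" in the message.
  val withBraces: Seq[Int] = items.map { item =>
    val doubled = item * 2
    doubled + 1
  }
}
```

The second form drops one level of nesting, which is the whole of the stylistic difference.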

Dataset.explain, ExplainCommand and sqlContext.executePlan twice?

2016-04-13 Thread Jacek Laskowski
Hi, While reviewing explain(extended: Boolean) - https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L408 - I've noticed that: 1. It first creates ExplainCommand that does sqlContext.executePlan(logicalPlan) in run https://github.com/apache/spark

Re: Different maxBins value for categorical and continuous features in RandomForest implementation.

2016-04-13 Thread Rahul Tanwani
Added https://issues.apache.org/jira/browse/SPARK-14606

Re: DynamoDB data source questions

2016-04-13 Thread Reynold Xin
Responses inline On Wed, Apr 13, 2016 at 7:45 AM, Travis Crawford wrote: > Hi Spark gurus, > > At Medium we're using Spark for an ETL job that scans DynamoDB tables and > loads into Redshift. Currently I use a parallel scanner implementation that > writes files to local disk, then have Spark rea

DynamoDB data source questions

2016-04-13 Thread Travis Crawford
Hi Spark gurus, At Medium we're using Spark for an ETL job that scans DynamoDB tables and loads into Redshift. Currently I use a parallel scanner implementation that writes files to local disk, then have Spark read them as a DataFrame. Ideally we could read the DynamoDB table directly as a DataFr

Should localProperties be inheritable? Should we change that or document it?

2016-04-13 Thread Marcin Tustin
TL;DR: SparkContext.setLocalProperty is implemented with InheritableThreadLocal. This has unexpected consequences, not least because the method documentation doesn't say anything about it: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L605 I
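
A minimal sketch of the general JVM behaviour the message refers to, using plain `java.lang.ThreadLocal` and `java.lang.InheritableThreadLocal` rather than any Spark code. The object name, the helper `childSees`, and the string values are hypothetical. It shows that a thread created after the parent sets a value sees that value only with the inheritable variant.

```scala
// Illustration of JVM thread-local inheritance; this is not Spark's implementation.
object InheritedLocalDemo {
  // Plain thread-local: each thread starts from the initial value.
  val plain: ThreadLocal[String] = new ThreadLocal[String] {
    override def initialValue(): String = "unset"
  }

  // Inheritable thread-local: a new thread starts with a copy of its parent's value.
  val inheritable: ThreadLocal[String] = new InheritableThreadLocal[String] {
    override def initialValue(): String = "unset"
  }

  // Sets a value in the calling thread, then reports what a freshly created child thread reads.
  def childSees(local: ThreadLocal[String]): String = {
    local.set("parent-value")
    var seen = ""
    val child = new Thread(new Runnable {
      override def run(): Unit = { seen = local.get() }
    })
    child.start()
    child.join() // join makes the child's write to `seen` visible here
    seen
  }
}
```

Because the value is copied when the child thread is constructed, threads reused from a pool keep whatever they inherited at creation time, which is one way this kind of inheritance can surprise callers.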

Re: Spark on Mesos 0.28 issue

2016-04-13 Thread Yang Lei
I looked at the JIRA. I do not think it is related, as without using a docker image for the spark task, the framework still fails. After my workaround, both scenarios, w/ docker and w/o, worked. About logs, the only thing that caught my eye is the line I pasted. It is from the master mesos log. The slave

Re: Spark on Mesos 0.28 issue

2016-04-13 Thread Adrian Bridgett
I think you may be hitting https://issues.apache.org/jira/browse/MESOS-4878 which was fixed in Mesos 0.28.1 On 13/04/2016 02:34, Timothy Chen wrote: Hi Yang, Can you share the master log/slave log? Tim On Apr 12, 2016, at 2:05 PM, Yang Lei wrote: I have been a

Re: Code freeze?

2016-04-13 Thread Reynold Xin
I think the main things are API things that we need to get right. - Implement essential DDLs https://issues.apache.org/jira/browse/SPARK-14118 this blocks the next one - Merge HiveContext and SQLContext and create SparkSession https://issues.apache.org/jira/browse/SPARK-13485 - Separate out loc

Code freeze?

2016-04-13 Thread Sean Owen
I've heard several people refer to a code freeze for 2.0. Unless I missed it, nobody has discussed a particular date for this: https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage I'd like to start with a review of JIRAs before anyone decides a freeze is appropriate. There are hundreds