jdbc/save DataFrameWriter implementation change

2016-04-12 Thread Justin.Pihony
Hi, I have a ticket open on how save should delegate to the jdbc method; however, when I went to implement this it just didn't seem clean. Please take a look at my comment on https://issues.apache.org/jira/browse/SPARK-14525 and let me know if you agree with the second approach or not. Thanks, Just

Accessing Secure Hadoop from Mesos cluster

2016-04-12 Thread Tony Kinsley
I have been working towards getting some Spark streaming jobs to run in Mesos cluster mode (using docker containers) and write data periodically to a secure HDFS cluster. Unfortunately this does not seem to be well supported in Spark currently ( https://issues.apache.org/jira/browse/SPARK-12909). T

Re: Different maxBins value for categorical and continuous features in RandomForest implementation.

2016-04-12 Thread Joseph Bradley
That sounds useful. Would you mind creating a JIRA for it? Thanks! Joseph On Mon, Apr 11, 2016 at 2:06 AM, Rahul Tanwani wrote: > Hi, > > Currently the RandomForest algo takes a single maxBins value to decide the > number of splits to take. This sometimes causes training time to go very > high
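The request above hinges on maxBins playing two different roles. The following is a minimal pure-Python sketch (not Spark's actual implementation) of why a single maxBins value matters much more for continuous features than for categorical ones: for a continuous feature it caps the number of candidate split thresholds searched, while a categorical feature only ever needs one bin per category.

```python
# Sketch (not Spark's code) of how maxBins affects split-candidate counts.

def candidate_bins(feature_values, is_categorical, max_bins):
    """Return the split candidates a tree learner would consider."""
    if is_categorical:
        # A categorical feature needs exactly one bin per distinct
        # category, regardless of how large max_bins is.
        return sorted(set(feature_values))
    # For a continuous feature, take up to max_bins - 1 quantile-style
    # thresholds, so a large max_bins directly inflates training work.
    values = sorted(feature_values)
    n = len(values)
    k = min(max_bins - 1, n - 1)
    return [values[(i + 1) * n // (k + 1)] for i in range(k)]

continuous = [float(x) for x in range(1000)]
categorical = [x % 3 for x in range(1000)]

# With max_bins=100, the continuous feature gets 99 candidate thresholds...
print(len(candidate_bins(continuous, False, 100)))  # 99
# ...while the categorical one still only needs its 3 categories.
print(len(candidate_bins(categorical, True, 100)))  # 3
```

A separate maxBins for each feature type, as proposed, would let users shrink the continuous-feature search without touching categorical handling.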

Re: Spark on Mesos 0.28 issue

2016-04-12 Thread Timothy Chen
Hi Yang, Can you share the master log/slave log? Tim > On Apr 12, 2016, at 2:05 PM, Yang Lei wrote: > > I have been able to run spark submission in docker container (HOST network) > through Marathon on mesos and target to Mesos cluster (zk address) for at > least Spark 1.6, 1.5.2 over Mesos

Re: Spark 1.6.1 packages on S3 corrupt?

2016-04-12 Thread Nicholas Chammas
Yes, this is a known issue. The core devs are already aware of it. [CC dev] FWIW, I believe the Spark 1.6.1 / Hadoop 2.6 package on S3 is not corrupt. It may be the only 1.6.1 package that is not corrupt, though. :/ Nick On Tue, Apr 12, 2016 at 9:00 PM Augustus Hong wrote: > Hi all, > > I'm t

Spark on Mesos 0.28 issue

2016-04-12 Thread Yang Lei
I have been able to run Spark submissions in a docker container (HOST network) through Marathon on Mesos, targeting a Mesos cluster (zk address), for at least Spark 1.6 and 1.5.2 over Mesos 0.26 and 0.27. I do need to define SPARK_PUBLIC_DNS and SPARK_LOCAL_IP so that the Spark driver can announce the
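A minimal sketch of the env-var setup described above, assuming a Marathon-launched driver container with HOST networking; the `$HOST` variable (which Marathon injects with the agent's hostname) and the fallback address are assumptions, not details from the thread.

```shell
#!/bin/sh
# Hypothetical entrypoint sketch for a Spark driver container on
# Marathon/Mesos. With HOST networking the container shares the agent's
# network stack, so the driver must bind to and advertise an address
# that executors elsewhere in the cluster can actually reach.

: "${HOST:=10.0.0.42}"          # Marathon injects $HOST; fallback is made up

export SPARK_LOCAL_IP="$HOST"   # address the driver binds to
export SPARK_PUBLIC_DNS="$HOST" # address the driver advertises (UI, URLs)

echo "driver will bind and advertise: $SPARK_LOCAL_IP"
```

Without these, the driver tends to announce a container-internal or loopback address that executors cannot connect back to.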

Re: SparkSQL - Limit pushdown on BroadcastHashJoin

2016-04-12 Thread Herman van Hövell tot Westerflier
I am not sure if you can push a limit through a join. This becomes problematic if not all keys are present on both sides; in such a case a limit can produce fewer rows than the set limit. This might be a rare case in which whole stage codegen is slower, due to the fact that we need to buffer the r
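The correctness concern above can be shown with a toy example. This is a plain-Python illustration (not Spark code, and the data is made up) of why a LIMIT cannot simply be pushed below a join: if not every key on one side has a match on the other, limiting an input first can yield fewer rows than the limit asks for.

```python
# Toy inner join over (key, value) pairs, to illustrate limit pushdown.

def inner_join(left, right):
    """Return left rows whose key also appears on the right."""
    right_keys = {k for k, _ in right}
    return [(k, lv) for k, lv in left if k in right_keys]

left = [(k, "L%d" % k) for k in range(10)]      # keys 0..9
right = [(k, "R%d" % k) for k in range(5, 10)]  # only keys 5..9 match

# Correct plan: join first, then limit.
print(len(inner_join(left, right)[:3]))   # 3 rows, as requested

# Naive "pushdown": limit the left side to 3 rows before joining.
pushed = inner_join(left[:3], right)
print(len(pushed))                        # 0 rows -- keys 0..2 never match
```

This is why a limit above a join can at best be turned into a safe over-fetch on the inputs, never a hard cap on either side.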

SparkSQL - Limit pushdown on BroadcastHashJoin

2016-04-12 Thread Rajesh Balamohan
Hi, I ran the following query in Spark (latest master codebase) and it took a long time to complete even though it was a broadcast hash join. It appears that the limit computation is done only after computing the complete join condition. Shouldn't the limit condition be pushed to BroadcastHashJoin (wh

Possible deadlock in registering applications in the recovery mode

2016-04-12 Thread Niranda Perera
Hi all, I have encountered a small issue in the standalone recovery mode. Let's say there was an application A running in the cluster. Due to some issue, the entire cluster, together with application A, goes down. Then later the cluster comes back online, and the master then goes into the 're