Hi,
I have a ticket open on how save() should delegate to the jdbc method; however,
when I went to implement this, it didn't seem clean. Please take a look
at my comment on https://issues.apache.org/jira/browse/SPARK-14525 and let
me know whether you agree with the second approach.
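For context, a minimal sketch of the two call paths in question (the url and
table values are made up):

    import java.util.Properties
    import org.apache.spark.sql.DataFrame

    // Illustration only: url and table are placeholder values.
    def writeViaBothPaths(df: DataFrame): Unit = {
      val url   = "jdbc:postgresql://db.example.com/test"
      val table = "events"

      // Path 1: the dedicated JDBC method.
      df.write.jdbc(url, table, new Properties())

      // Path 2: the generic save() path with the jdbc source. The ticket is
      // about whether/how this path should delegate to the method above.
      df.write
        .format("jdbc")
        .option("url", url)
        .option("dbtable", table)
        .save()
    }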
Thanks,
Just
I have been working towards getting some Spark Streaming jobs to run in
Mesos cluster mode (using Docker containers) and write data periodically to
a secure HDFS cluster. Unfortunately this does not seem to be well
supported in Spark currently
(https://issues.apache.org/jira/browse/SPARK-12909).
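For what it's worth, a minimal sketch of logging in explicitly through the
Hadoop UGI API (the principal and keytab path are placeholders, and this does
not cover delegation-token renewal for long-running streaming jobs, which is
what the JIRA tracks):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.UserGroupInformation

    // Log in to the Kerberized cluster before touching HDFS. The keytab has
    // to be shipped into the Docker container by some other means.
    val hadoopConf = new Configuration()
    hadoopConf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(hadoopConf)
    UserGroupInformation.loginUserFromKeytab(
      "spark/worker@EXAMPLE.COM",           // placeholder principal
      "/etc/security/keytabs/spark.keytab") // placeholder keytab path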
That sounds useful. Would you mind creating a JIRA for it? Thanks!
Joseph
On Mon, Apr 11, 2016 at 2:06 AM, Rahul Tanwani
wrote:
> Hi,
>
> Currently the RandomForest algo takes a single maxBins value to decide the
> number of splits to take. This sometimes drives training time very
> high
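For reference, maxBins is a single global knob in the current spark.ml API; a
minimal sketch (values made up):

    import org.apache.spark.ml.classification.RandomForestClassifier

    // maxBins applies uniformly to every feature; there is no per-feature
    // override, so one high-cardinality feature forces fine binning everywhere.
    val rf = new RandomForestClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setNumTrees(100)  // made-up value
      .setMaxBins(128)   // one global value for all features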
Hi Yang,
Can you share the master log/slave log?
Tim
> On Apr 12, 2016, at 2:05 PM, Yang Lei wrote:
>
> I have been able to run Spark submissions in a Docker container (HOST network)
> through Marathon on Mesos, targeting a Mesos cluster (zk address), for at
> least Spark 1.6 and 1.5.2 over Mesos
Yes, this is a known issue. The core devs are already aware of it. [CC dev]
FWIW, I believe the Spark 1.6.1 / Hadoop 2.6 package on S3 is not corrupt.
It may be the only 1.6.1 package that is not corrupt, though. :/
Nick
On Tue, Apr 12, 2016 at 9:00 PM Augustus Hong
wrote:
> Hi all,
>
> I'm t
I have been able to run Spark submissions in a Docker container (HOST network)
through Marathon on Mesos, targeting a Mesos cluster (zk address), for at least
Spark 1.6 and 1.5.2 over Mesos 0.26 and 0.27.
I do need to define SPARK_PUBLIC_DNS and SPARK_LOCAL_IP so that the Spark
driver can announce an address the rest of the cluster can reach.
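For reference, a minimal sketch of the distinction (hostnames below are
placeholders; both variables are documented spark-env settings and are
normally set on the container, e.g. in the Marathon app definition):

    // SPARK_LOCAL_IP   -> the address the driver binds to inside the container
    // SPARK_PUBLIC_DNS -> the externally reachable address it advertises
    val localIp   = sys.env.getOrElse("SPARK_LOCAL_IP", "0.0.0.0")
    val publicDns = sys.env.getOrElse("SPARK_PUBLIC_DNS", "driver.example.com")
    println(s"binding to $localIp, advertising $publicDns")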
I am not sure you can push a limit through a join. This becomes
problematic when not all keys are present on both sides; in that case a
pushed-down limit can produce fewer rows than the requested limit.
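A tiny counterexample (hypothetical data; assumes a spark-shell-style session
where `spark` and its implicits are available):

    import spark.implicits._

    val left  = Seq(1, 2, 3).toDF("k") // key 1 has no match on the right
    val right = Seq(2, 3, 4).toDF("k")

    // Limit over the join: fine, the inner join has exactly 2 rows (keys 2, 3).
    left.join(right, "k").limit(2).count() // 2

    // Limit pushed below the join: if the limit happens to keep {1, 2}, only
    // key 2 survives the join, so we return 1 row instead of the 2 requested.
    left.limit(2).join(right, "k").count() // can be < 2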
This might be a rare case in which whole-stage codegen is slower, because
we need to buffer the r
Hi,
I ran the following query in Spark (latest master codebase) and it took a
long time to complete even though it was a broadcast hash join.
It appears that the limit is computed only after the complete
join is evaluated. Shouldn't the limit be pushed down to
BroadcastHashJoin (wh
Hi all,
I have encountered a small issue in the standalone recovery mode.
Let's say there was an application A running in the cluster. Due to some
issue, the entire cluster, together with application A, goes down.
Later on, the cluster comes back online, and the master goes into the
're