Don't be too concerned about the Scala hoop. Before making the
commitment to Scala, I had coded up a modest analytic prototype in
Hadoop MapReduce. Once I made the commitment, it took 10 days to
(1) learn enough Scala and (2) rewrite the prototype in Spark in
Scala.
Sandy and others:
Is there a single source of Yarn/Hadoop properties that should be
set or reset for running Spark on Yarn?
We've sort of stumbled through one property after another, and
(unless there's an update I've not yet seen) CDH5 Spark-related
properties are [...]. We are not doing anything unusual.
Did you do any custom configuration? Any advice would be
appreciated.
-Suren
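[Editor's note: to make concrete the kind of properties Suren is asking about, here is a minimal sketch, not from the thread itself. The app name, memory value, and paths are placeholders; property names are from Spark 1.x-era documentation and should be checked against your version.]

    import org.apache.spark.{SparkConf, SparkContext}

    // Spark-side settings live in SparkConf (or spark-defaults.conf);
    // YARN-side container limits (yarn.nodemanager.resource.memory-mb,
    // yarn.scheduler.maximum-allocation-mb) stay in yarn-site.xml and
    // must still admit whatever executor size you request here.
    val conf = new SparkConf()
      .setMaster("yarn-client")                 // run against the YARN cluster
      .setAppName("PropertySketch")             // placeholder name
      .set("spark.executor.memory", "2g")       // per-executor heap (placeholder)
    val sc = new SparkContext(conf)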
On Tue, Jul 8, 2014 at 12:06 PM, Kevin Markey <kevin.mar...@oracle.com>
wrote:
When you say "large
data sets", how large?
Thanks
On 07/07/2014 01:39 PM, Daniel Siegmann
wrote:
From a development perspective, I vastly prefer Spark to
MapReduce. The MapReduce API is very constrained; Spark's [...]
[specif]y its location by exporting its location as SPARK_JAR.
Kevin Markey
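[Editor's note: to make the API contrast concrete, a minimal sketch, not from the thread. The HDFS paths are placeholders. The classic word count, which in MapReduce needs a Mapper class, a Reducer class, and driver boilerplate, collapses to a few chained transformations in Spark's Scala API:]

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
    // flatMap/map/reduceByKey compose freely; MapReduce forces every
    // step into its rigid map -> shuffle -> reduce template.
    val counts = sc.textFile("hdfs:///input")        // placeholder path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs:///output")          // placeholder path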
On 06/19/2014 11:22 AM, Koert Kuipers
wrote:
i am trying to understand how yarn-client
mode works. i am not using spark-submit, but instead
launching a spar
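[Editor's note: a minimal sketch of what Koert describes, launching in yarn-client mode without spark-submit; not from the thread. It assumes the Hadoop/YARN configuration directory is on the classpath and, for early 1.x releases, that SPARK_JAR points at the Spark assembly jar.]

    import org.apache.spark.{SparkConf, SparkContext}

    // Embedding the driver in your own application: the driver runs in
    // this JVM; YARN hosts only the executors plus an ApplicationMaster
    // that requests their containers.
    val conf = new SparkConf()
      .setMaster("yarn-client")    // resolved via HADOOP_CONF_DIR / YARN_CONF_DIR
      .setAppName("EmbeddedDriver") // placeholder name
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).sum())  // trivial smoke test
    sc.stop()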
Tom
On Wednesday, May 21, 2014 6:10 PM, Kevin [...]
[...] web page, updating my scripts and configuration appropriately,
and running except for these two anomalies.
Thanks
Kevin Markey
We are now testing precisely what you ask about in our environment.
But Sandy's questions are relevant. The bigger issue is not Spark
vs. Yarn but "client" vs. "standalone" and where the client is
located on the network relative to the cluster.
The "client" options
job. But what if -- as occurs in another application --
there's only one or two stages, but lots of data passing through
those 1 or 2 stages?
Kevin Markey
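[Editor's note: a sketch of the job shape Kevin describes; hypothetical job, placeholder paths. A single reduceByKey yields exactly two stages, and every keyed record crosses the one shuffle boundary between them, so a low stage count says nothing about shuffle volume:]

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("TwoStages"))
    // Stage 1: read + map, narrow dependencies, no shuffle yet.
    val keyed = sc.textFile("hdfs:///big-input")     // placeholder path
      .map(line => (line.take(8), line.length.toLong))
    // Stage boundary: reduceByKey shuffles ALL keyed records, however
    // large the input -- two stages, unbounded shuffle volume.
    val totals = keyed.reduceByKey(_ + _)            // Stage 2
    totals.saveAsTextFile("hdfs:///totals")          // placeholder path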
On 04/01/2014 09:55 AM, Mark Hamstra
wrote:
Some related discussion: https://githu