Hello,
My Spark application is written in Scala and submitted to a Spark cluster
in standalone mode. The Spark Jobs for my application are listed in the
Spark UI like this:
Job Id Description ...
6 saveAsTextFile at Foo.scala:202
5 saveAsTextFile at Foo.scala:201
4
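To illustrate what the UI is showing: each saveAsTextFile call is an RDD action, and every action launches its own Spark job, labelled with the source line of the call site. A minimal hypothetical sketch (the names and paths below are made up, not from Foo.scala):
val records = sc.textFile("hdfs:///input/records")          // made-up input path
records.filter(_.nonEmpty).saveAsTextFile("hdfs:///out/a")   // one action -> one job (e.g. Foo.scala:201)
records.map(_.toUpperCase).saveAsTextFile("hdfs:///out/b")   // second action -> second job (e.g. Foo.scala:202)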
Hello,
I am trying to use the default Spark cluster manager in a production
environment. I will be submitting jobs with spark-submit. I wonder if the
following is possible:
1. Get the Driver ID from spark-submit. We will use this ID to keep track
of the job and kill it if necessary.
2. Whether i
>
>
> Hi Rares,
>
> The number of partitions is controlled by the HDFS input format, and one file
> may have multiple partitions if it consists of multiple blocks. In your case,
> I think there is one file with 2 splits.
>
> Thanks.
>
> Zhan Zhang
>
Hello,
I am using the Spark shell in Scala on localhost. I am using sc.textFile
to read a directory. The directory looks like this (generated by another
Spark script):
part-0
part-1
_SUCCESS
part-0 has four short lines of text, while part-1 has two short
lines of text. Th
Hi,
I have a private cluster with private IPs, 192.168.*.*, and a gateway node
with both a private IP, 192.168.*.*, and a public internet IP.
I set up the Spark master on the gateway node and set SPARK_MASTER_IP to
the private IP. I start Spark workers on the private nodes. It works fine.
The pro
Hello,
I am using takeSample from the Scala Spark 1.2.1 shell:
scala> sc.textFile("README.md").takeSample(false, 3)
and I notice that two jobs are generated on the Spark Jobs page:
Job Id Description
1 takeSample at <console>:13
0 takeSample at <console>:13
Any ideas why the two jobs are needed?
Thanks!
Rares
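For what it's worth, a rough sketch of why two jobs show up: takeSample needs the total element count before it can choose a sampling fraction, so it runs a count as one job and then a sampled collect as a second job. The code below is a simplified illustration of that two-step pattern, not the actual Spark implementation:
val rdd = sc.textFile("README.md")
val total = rdd.count()                                 // first job: total number of elements
val fraction = math.min(1.0, (3.0 * 1.2) / total)       // oversample a little to be safe
val candidates = rdd.sample(false, fraction).collect()  // second job: gather the sample
val result = scala.util.Random.shuffle(candidates.toSeq).take(3)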