First, a Spark worker does not do the computation itself; in fact, the executor is responsible for computation. The executor runs tasks that the driver distributes to it. Normally each task reads only one section (split) of the data, unless the stage has only one partition, in which case a single task reads everything. If your operators do not include one that pulls data across partitions (i.e. a shuffle), each task just reads its own split directly.
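A minimal sketch of that one-task-per-partition behaviour (the host name and path below are placeholders, not from your setup):

import org.apache.spark.sql.SparkSession

object TaskPerPartition {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("task-per-partition").getOrCreate()
    val sc = spark.sparkContext

    // The driver only builds the lineage and schedules tasks;
    // executors run one task per partition of this RDD.
    val lines = sc.textFile("hdfs://namenode:8020/data/input.txt") // placeholder path

    // Each task sees only its own split of the file.
    val perPartition = lines
      .mapPartitionsWithIndex { (idx, iter) => Iterator((idx, iter.size)) }
      .collect()

    perPartition.foreach { case (idx, n) => println(s"partition $idx read $n lines") }

    spark.stop()
  }
}

Each printed entry corresponds to one task, i.e. one split of the input file.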
I am running a model where the workers should not have the data stored on them; they are only for execution purposes. The other cluster (it's just a single node) which I am receiving data from is just acting as a file server, for which I could have used any other way, like nfs or ftp. So I went with hdfs.
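Roughly, the read looks like this (host name and path below are placeholders, not my real ones); with nfs it would just have been a file:// path on a mount visible to every worker:

import org.apache.spark.sql.SparkSession

object RemoteFileServerRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("remote-read").getOrCreate()
    val sc = spark.sparkContext

    // Read straight from the single-node HDFS that only serves files
    // (host/port/path are placeholders).
    val viaHdfs = sc.textFile("hdfs://fileserver-host:8020/exports/input.txt")

    // The nfs alternative would be a local mount on every worker:
    // val viaNfs = sc.textFile("file:///mnt/fileserver/input.txt")

    println(s"line count: ${viaHdfs.count()}")
    spark.stop()
  }
}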
You said your hdfs cluster and Spark cluster are running as different clusters. This is not a good idea, because you should consider data locality. Your Spark nodes need the hdfs client configuration. A Spark job is composed of stages, and each stage has one or more partitions. The parallelism of the job is decided by the number of partitions in a stage.
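For example (host and path are placeholders; normally you would point HADOOP_CONF_DIR at the remote cluster's client configs rather than set values in code):

import org.apache.spark.sql.SparkSession

object RemoteHdfsConfig {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("remote-hdfs-config").getOrCreate()
    val sc = spark.sparkContext

    // Tell the HDFS client on the Spark side where the remote NameNode lives
    // (placeholder host/port; usually this comes from HADOOP_CONF_DIR instead).
    sc.hadoopConfiguration.set("fs.defaultFS", "hdfs://remote-namenode:8020")

    // minPartitions is only a lower bound; the actual split count still
    // follows the HDFS block layout of the file.
    val lines = sc.textFile("/data/input.txt", minPartitions = 8)

    // One task per partition, so this number is the stage's parallelism.
    println(s"partitions (= tasks in this stage): ${lines.getNumPartitions}")

    spark.stop()
  }
}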
Hi,
I am new to Spark. I am running an hdfs file system on a remote cluster, whereas my Spark workers are on another cluster. When my textFile RDD gets executed, do the Spark workers read from the file according to the hdfs partitions task by task, or do they read it once when the blockmanager sets after