Dear All,

I'm running Spark Streaming (1.0.0) on YARN (2.2.0) on a 10-node cluster.
I set up 10 custom receivers to listen to 10 data streams. I want one
receiver per node in order to maximize the network bandwidth. However, if I
set "--executor-cores 4", the 10 receivers run on only 3 of the nodes in
the cluster, which host 4, 4, and 2 receivers respectively; if I set
"--executor-cores 1", each node runs exactly one receiver, but Spark seems
unable to make any progress processing these streams.
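
For what it's worth, here is the core arithmetic as I understand it. This is
only a rough sketch: it assumes each receiver occupies one executor core for
as long as it runs, and CoreBudget and processingCores are hypothetical names
I made up for illustration, not part of Spark.

```scala
// Hypothetical helper, not part of Spark: estimates how many executor cores
// remain for processing batches once the receivers have taken theirs.
object CoreBudget {
  // Assumption: every receiver pins exactly one executor core.
  def processingCores(numExecutors: Int, executorCores: Int, numReceivers: Int): Int =
    math.max(0, numExecutors * executorCores - numReceivers)

  def main(args: Array[String]): Unit = {
    // --num-executors 10 --executor-cores 4, 10 receivers
    println(processingCores(10, 4, 10)) // prints 30
    // --num-executors 10 --executor-cores 1, 10 receivers
    println(processingCores(10, 1, 10)) // prints 0
  }
}
```

If that assumption holds, the "--executor-cores 1" setup would leave no core
free to process the received data, which would match the stall I see.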

I read the documentation on configuration and also googled but didn't find
a clue. Is there a way to configure how the receivers are distributed?

Thanks!

Here are some details:
================================
How I created 10 receivers:

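    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.dstream.DStream

    val conf = new SparkConf().setAppName(jobId)
    val sc = new StreamingContext(conf, Seconds(1))
    // Start with one receiver stream, then union in nine more (10 in total).
    var lines: DStream[String] =
      sc.receiverStream(new CustomReceiver(...))
    for (i <- 1 to 9) {
      lines = lines.union(
        sc.receiverStream(new CustomReceiver(...)))
    }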

How I submit a job to Yarn:

    spark-submit \
        --class $JOB_CLASS \
        --master yarn-client \
        --num-executors 10 \
        --driver-memory 1g \
        --executor-memory 2g \
        --executor-cores 4 \
        $JAR_NAME
