Hi,

I have a very, very simple streaming job. When I deploy it on the exact
same cluster, with the exact same parameters, I see a big (40%) performance
difference between the "client" and "cluster" deployment modes. This seems a
bit surprising. Is this expected?

The streaming job is:

    val msgStream = kafkaStream
      .map { case (k, v) => v }                  // keep only the message value (bytes)
      .map(DatatypeConverter.printBase64Binary)  // Base64-encode each payload

    // save: write each micro-batch as LZO-compressed text to S3
    msgStream.foreachRDD { rdd =>
      rdd.saveAsTextFile("s3n://some.bucket/path", classOf[LzoCodec])
    }

I tried this several times: the job deployed in "client" mode can only
write at about 60% of the throughput of the job deployed in "cluster" mode,
and this happens consistently. I'm logging at INFO level, but my application
code doesn't log anything, so the output is only Spark's own logs. The volume
of logs I see in "client" mode doesn't seem excessive.

The setup is:
spark-ec2 [...] \
  --copy-aws-credentials \
  --instance-type=m3.2xlarge \
  -s 2 launch test_cluster

All deployments were done from the master machine.
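
For reference, the two runs were submitted roughly as in the sketch below;
the master URL, main class, and jar name are placeholders (not my actual
values), and the only intended difference between the two invocations is
--deploy-mode:

# "client" mode: the driver runs inside this spark-submit process on the master machine
spark-submit \
  --master spark://<master>:7077 \
  --deploy-mode client \
  --class com.example.StreamingJob \
  streaming-job.jar

# "cluster" mode: the driver is launched on one of the worker nodes instead
spark-submit \
  --master spark://<master>:7077 \
  --deploy-mode cluster \
  --class com.example.StreamingJob \
  streaming-job.jar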
