I figured out that it had to do with Spark's Mesos run mode.
Setting spark.mesos.coarse to true made all the difference.

So the main performance bottleneck was actually the fine-grained mode, and
therefore Mesos overhead.
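
For reference, a minimal sketch of one way to pass that setting on the
command line via spark-submit's --conf option. The helper names and
my_job.py are hypothetical and only for illustration. Capping
spark.cores.max is an optional assumption here, not something the fix
above required; it is included because coarse-grained mode otherwise
holds on to the cores Mesos offers for the lifetime of the application.

```python
def coarse_grained_settings(max_cores=None):
    """Build Spark settings that switch Mesos to coarse-grained mode.

    max_cores is an optional, illustrative cap (spark.cores.max).
    """
    settings = {"spark.mesos.coarse": "true"}
    if max_cores is not None:
        settings["spark.cores.max"] = str(max_cores)
    return settings


def to_submit_args(settings):
    """Turn a settings dict into '--conf key=value' argument pairs."""
    args = []
    for key, value in sorted(settings.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # my_job.py is a placeholder for the actual application script.
    command = ["spark-submit"] + to_submit_args(coarse_grained_settings(8)) + ["my_job.py"]
    print(" ".join(command))
```

The same key could also go into spark-defaults.conf or be set on the
SparkConf object before the context is created.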

Thanks!
Sebastian

2015-11-03 20:07 GMT+01:00 Sebastian Kuepers
<[email protected]>:
> Hey,
>
> with collect(), an RDD's elements are sent back to the driver as a list.
>
> I have a 4-node cluster (based on Mesos) in a datacenter and I have my
> local dev machine.
>
> I work with a small 200MB dataset just for testing during development right
> now.
>
> The collect() tasks are running four times faster on my local machine than
> on the cluster, although the cluster actually uses 4x the number of cores etc.
>
> It's 7 seconds locally and 28 seconds on the cluster for the same collect()
> job.
>
> What's the reason for that? Is that just network latency sending back the
> data to the driver within the cluster? (well it's just this 200MB in total)
>
> Is that somehow a kind of 'management overhead' from Mesos?
>
> I'd appreciate any thoughts on possible causes for that!
Serialization and sending data over the network take time, way more than
simply processing the data on the same machine. But local processing doesn't
scale as well. Try with more data and plot the results.
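
A rough sketch of how such a comparison could be collected, assuming the
job is wrapped in a function that takes the dataset size. fake_collect is
a stand-in, not a Spark call; in practice it would be replaced by
something that builds an RDD of that size and calls collect() on it. The
resulting (size, seconds) pairs can then be plotted for the local machine
and the cluster side by side.

```python
import time


def time_job(job, sizes, repeats=3):
    """Run job(size) for each size and return (size, best_seconds) pairs."""
    results = []
    for size in sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            job(size)
            best = min(best, time.perf_counter() - start)
        results.append((size, best))
    return results


def fake_collect(n):
    # Placeholder workload; swap in the real collect() job here.
    return list(range(n))


if __name__ == "__main__":
    for size, seconds in time_job(fake_collect, [10_000, 100_000, 1_000_000]):
        print(f"{size:>10} elements: {seconds:.4f} s")
```

Taking the best of a few repeats keeps one-off startup costs from
dominating the small sizes.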

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



