Re: A simple example that demonstrates that a Spark distributed cluster is faster than Spark Local Standalone

2020-09-25 Thread Sean Owen
Maybe the better approach is to understand why your job isn't scaling. What does the UI show? Are the resources actually the same? For example, do you have more than 8 cores in the local setup? Is there enough parallelism? For example, it doesn't look like the small input is repartitioned to at least
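
To illustrate the parallelism question: a Spark stage runs one task per partition, and each task occupies `spark.task.cpus` cores (1 by default), so a small input with few partitions leaves most cores idle however large the cluster is. Below is a minimal sketch of that arithmetic in plain Python, with no Spark required. The function name `task_waves` and the sample numbers are hypothetical; the message above is cut off before it names a partition target, so none is assumed here.

```python
import math

def task_waves(num_partitions, total_cores, cpus_per_task=1):
    """Estimate how one stage's tasks map onto the available cores.

    Returns (concurrent_tasks, waves, idle_slots_in_first_wave).
    Hypothetical helper for illustration only, not a Spark API.
    """
    # Each task needs cpus_per_task cores, so this many tasks can run at once.
    slots = total_cores // cpus_per_task
    # A stage has one task per partition; it cannot use more slots than that.
    concurrent = min(num_partitions, slots)
    # Number of rounds of tasks needed to finish the stage.
    waves = math.ceil(num_partitions / slots)
    # Slots that have nothing to do while the first wave runs.
    idle = slots - concurrent
    return concurrent, waves, idle

# A 2-partition input on 16 cores: only 2 tasks run, 14 slots sit idle.
print(task_waves(2, 16))
# The same data split into 48 partitions keeps all 16 slots busy for 3 waves.
print(task_waves(48, 16))
```

In a real job the partition count can be read with `rdd.getNumPartitions()` and changed with `repartition(n)`; the Spark UI's stage view shows how many tasks actually ran in parallel.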

Re: A simple example that demonstrates that a Spark distributed cluster is faster than Spark Local Standalone

2020-09-25 Thread javaguy Java
Thanks - that's great. I'll check out both spark-bench and SparkPi. I do have more than 8 cores in the local setup: 24 cores in total (12 per machine). However, on AWS with the same cluster setup, that is not the case; I chose Medium size instances hoping that a much smaller instance since would s

Re: A simple example that demonstrates that a Spark distributed cluster is faster than Spark Local Standalone

2020-09-25 Thread Sean Owen
I mean that local[*] = all cores on the machine, whereas in your example you seem to be choosing 8 cores per executor in the distributed case. You'd have 12 cores in your local case - which is still fewer than 2x8, but just the kind of thing to consider when comparing these setups. Indeed, how wel
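
The core counts being compared here can be written out as a small calculation. This is a rough sketch in plain Python using the numbers from the thread (12 cores on the local machine, 2 executors with 8 cores each). The function names are hypothetical, and the "ideal speedup" assumes perfectly parallel work with no shuffle, network, or scheduling overhead, which a real cluster will not achieve.

```python
def local_star_cores(cores_on_machine):
    """local[*] runs one worker thread per logical core on the single machine."""
    return cores_on_machine

def cluster_cores(num_executors, cores_per_executor):
    """Total task slots across executors, assuming spark.task.cpus is 1."""
    return num_executors * cores_per_executor

def ideal_speedup(baseline_cores, candidate_cores):
    """Upper bound on speedup if the job scaled linearly with cores (an assumption)."""
    return candidate_cores / baseline_cores

local = local_star_cores(12)        # one 12-core machine
distributed = cluster_cores(2, 8)   # 2 executors x 8 cores
print(local, distributed)
print(round(ideal_speedup(local, distributed), 2))
```

With these numbers the distributed setup has only a third more cores than the local one, so even under ideal scaling the gap is modest and easily consumed by the overhead of distribution.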