Sorry, I wasn’t very clear (it looks like Pavan’s response was dropped from the list
for some reason as well).
I am assuming that:
1) the first map is CPU bound
2) the second map is heavily memory bound
To be specific, let’s say you are using 4 m3.2xlarge instances, which have 8 CPUs
and 30 GB of RAM each.
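To make the numbers concrete (illustrative figures only, ignoring executor/OS overhead): that cluster gives 4 × 8 = 32 cores and roughly 4 × 30 GB = 120 GB of RAM in total. If the memory-heavy map runs with 32 partitions, 32 tasks run at once and each has on the order of 120 GB / 32 ≈ 3.75 GB available; repartitioning down to, say, 8 partitions means at most 8 tasks run concurrently, roughly 15 GB each, while the CPU-bound map can still be run earlier with many more partitions to keep all 32 cores busy.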
Hi,
How would that help?! Why would you do that?
Jacek
On 17 Jul 2016 7:19 a.m., "Pedro Rodriguez" wrote:
> You could call map on an RDD which has “many” partitions, then call
> repartition/coalesce to drastically reduce the number of partitions so that
> your second map job has fewer things running.
You could call map on an RDD which has “many” partitions, then call
repartition/coalesce to drastically reduce the number of partitions so that
your second map job has fewer things running.
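Something along these lines (a sketch only; the function names, data, and partition counts are made up, and the master/deploy settings would come from spark-submit). Note that it is repartition, or coalesce with shuffle = true, that forces a shuffle and therefore lets the two maps run with different parallelism:

    import org.apache.spark.{SparkConf, SparkContext}

    object TwoPhaseMap {
      // Placeholders for the two maps in the thread: the first is CPU heavy,
      // the second allocates a lot of memory per record.
      def cpuHeavy(x: Int): Int = x * x
      def memoryHungry(x: Int): Array[Int] = Array.fill(1000)(x)

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("two-phase-map"))

        // Many partitions so the CPU-bound map keeps every core busy.
        val input = sc.parallelize(1 to 1000000, numSlices = 96)
        val phase1 = input.map(cpuHeavy)

        // repartition (or coalesce(..., shuffle = true)) inserts a shuffle, so
        // the memory-heavy map runs in a second stage with at most 16
        // concurrent tasks, each with more memory to itself.
        val phase2 = phase1.repartition(16).map(memoryHungry)

        phase2.count()
        sc.stop()
      }
    }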
—
Pedro Rodriguez
PhD Student in Large-Scale Machine Learning | CU Boulder
Systems Oriented Data Scientist
Hi,
My understanding is that these two map functions will end up as a job
with one stage (as if you wrote the two maps as a single map), so you
really need as many vcores and as much memory as possible for map1 and
map2. I initially thought about dynamic allocation of executors, which
may or may not help you.
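To illustrate the single-stage point (a minimal local sketch; the data and functions are made up): two back-to-back maps produce a lineage with no shuffle, so Spark pipelines them into one stage, whereas inserting a repartition adds a ShuffledRDD and therefore a stage boundary:

    import org.apache.spark.{SparkConf, SparkContext}

    object SingleStageDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("single-stage-demo").setMaster("local[*]"))

        val rdd = sc.parallelize(1 to 1000, numSlices = 8)

        // Two consecutive maps: no shuffle, so they are pipelined into one stage.
        val pipelined = rdd.map(_ * 2).map(_ + 1)
        println(pipelined.toDebugString)   // chain of MapPartitionsRDDs, no ShuffledRDD

        // Inserting a shuffle (repartition) is what creates a second stage.
        val twoStages = rdd.map(_ * 2).repartition(4).map(_ + 1)
        println(twoStages.toDebugString)   // shows a ShuffledRDD -> stage boundary

        twoStages.count()
        sc.stop()
      }
    }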