What benefits do we really get out of colocation?

kant kodali Fri, 02 Dec 2016 22:39:25 -0800

I wonder what benefits I really get if I colocate my Spark worker
process and Cassandra server process on each node.


I understand the concept of moving compute toward the data instead of
moving data toward the computation, but it sounds more like one is trying to
optimize for network latency.

The majority of my nodes (m4.xlarge) have 1 Gbps = 125 MB/s (megabytes per
second) of network throughput,

and the disk throughput for m4.xlarge is 93.75 MB/s (link below):

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html

So in this case, since the disk is already slower than the network, I don't
see how colocation can help, even if there is a one-to-one mapping from each
Spark worker node to a colocated Cassandra node, where, say, we are doing a
table scan of a billion rows?
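
To make the arithmetic behind this concrete, here is a rough back-of-the-envelope
sketch in Python. The two throughput figures are the ones quoted above; the row
size (100 bytes) is purely an assumed value for illustration, and the function
and variable names are hypothetical. It ignores caching, compression,
serialization cost, and contention, so it is only an estimate, not a benchmark.

```python
# Back-of-the-envelope scan-time estimate for one node.
# Throughput figures come from the message above; BYTES_PER_ROW is an
# assumption made up for illustration, not a measured value.

NETWORK_MB_PER_S = 125.0   # 1 Gbps expressed in megabytes per second
DISK_MB_PER_S = 93.75      # quoted m4.xlarge disk throughput

ROWS = 1_000_000_000       # "a table scan of a billion rows"
BYTES_PER_ROW = 100        # hypothetical average row size


def scan_seconds(rows, bytes_per_row, mb_per_s):
    """Seconds needed to move rows * bytes_per_row at the given MB/s."""
    total_mb = rows * bytes_per_row / 1e6
    return total_mb / mb_per_s


disk_s = scan_seconds(ROWS, BYTES_PER_ROW, DISK_MB_PER_S)
net_s = scan_seconds(ROWS, BYTES_PER_ROW, NETWORK_MB_PER_S)

# Whichever resource has the lower throughput bounds the scan.
bottleneck = "disk" if DISK_MB_PER_S < NETWORK_MB_PER_S else "network"

print(f"disk-bound scan:    {disk_s:.1f} s")
print(f"network-bound scan: {net_s:.1f} s")
print(f"slower resource:    {bottleneck}")
```

Under these assumptions the disk, not the network, is the slower resource, which
is the reason for the question.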

Thanks!
