Without colocation you get more latency on reads, so overall execution time is longer.
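
A rough back-of-envelope sketch of that point, using the throughput figures quoted below. The row size, page size, and extra round-trip time are assumptions for illustration only, not numbers from this thread, and the model assumes pages are fetched one after another:

```python
# Back-of-envelope scan-time estimate.
# Throughput figures come from the quoted message; ROW_BYTES, PAGE_ROWS and
# EXTRA_RTT_S are assumed values chosen only for illustration.
NETWORK_MB_S = 1000 / 8    # 1 Gbps expressed in MB/s
DISK_MB_S = 93.75          # disk throughput quoted for m4.xlarge
ROWS = 1_000_000_000       # "table scan of a billion rows"
ROW_BYTES = 100            # assumption
PAGE_ROWS = 5000           # assumption: rows returned per fetch
EXTRA_RTT_S = 0.0005       # assumption: extra network round trip per fetch


def scan_seconds(rows, row_bytes, mb_per_s):
    """Seconds to stream rows * row_bytes at mb_per_s (1 MB = 10**6 bytes)."""
    return rows * row_bytes / (mb_per_s * 1_000_000)


def bottleneck_mb_s(disk, network):
    """A remote read can go no faster than the slower of disk and network."""
    return min(disk, network)


# Colocated: limited by disk throughput only.
local = scan_seconds(ROWS, ROW_BYTES, DISK_MB_S)

# Remote: same throughput bottleneck, plus one extra round trip per fetch
# (assumes fetches are sequential, which is a simplification).
remote_throughput_only = scan_seconds(
    ROWS, ROW_BYTES, bottleneck_mb_s(DISK_MB_S, NETWORK_MB_S)
)
remote = remote_throughput_only + (ROWS / PAGE_ROWS) * EXTRA_RTT_S

print(f"colocated: {local:.0f} s, remote: {remote:.0f} s")
```

On throughput alone the two cases come out the same, because the disk is the slower link; under these assumptions the difference comes entirely from the per-fetch latency term.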

On Dec 3, 2016 7:39 AM, "kant kodali" <kanth...@gmail.com> wrote:

>
> I wonder what benefits I really get if I colocate my Spark worker
> process and Cassandra server process on each node.
>
> I understand the concept of moving compute towards the data instead of
> moving data towards computation, but it sounds more like one is trying to
> optimize for network latency.
>
> The majority of my nodes (m4.xlarge) have 1 Gbps = 125 MB/s (megabytes per
> second) of network throughput,
>
> and the disk throughput for m4.xlarge is 93.75 MB/s (link below):
>
> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html
>
> So in this case I don't see how colocation can help, even if there is a
> one-to-one mapping from each Spark worker node to a colocated Cassandra
> node, where, say, we are doing a table scan of a billion rows.
>
> Thanks!
>
>
