Yes, throughput for a single partition key cannot be improved by
horizontal scaling.  You can increase RF to theoretically improve
throughput on that key, but in practice smart clients may hold you back:
they're probably token aware, and will try to serve reads for that key
from its primary replica, so all reads for that key would still be
directed at a single node.
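
For illustration, here's a rough sketch of what I mean by token
awareness, using the DataStax Python driver.  The addresses, keyspace,
and table names are made up, and check whether your driver version
supports shuffle_replicas before relying on it:

    from cassandra.cluster import Cluster
    from cassandra.policies import TokenAwarePolicy, DCAwareRoundRobinPolicy

    cluster = Cluster(
        contact_points=['10.0.0.1', '10.0.0.2'],   # hypothetical nodes
        load_balancing_policy=TokenAwarePolicy(
            DCAwareRoundRobinPolicy(local_dc='dc1'),
            # shuffle_replicas=True asks the driver to spread requests
            # across all replicas of a key instead of always preferring
            # the same one.
            shuffle_replicas=True,
        ),
    )
    session = cluster.connect('my_keyspace')
    rows = session.execute(
        "SELECT * FROM my_table WHERE pk = %s", ['hot-key'])

Even with shuffling, every one of those requests still lands on the same
RF nodes, so a single hot key can only go as fast as its replicas.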

If you're reading at CL=QUORUM, there's a chance that increasing RF will
actually reduce performance rather than improve it, because you've
increased the total amount of work required to serve each read (as well
as each write): a quorum at RF=4 or RF=5 touches three replicas instead
of the two needed at RF=3.  If you're reading at CL=ONE, increasing RF
increases the chances of falling afoul of eventual consistency and
reading stale data.
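
To make the quorum arithmetic concrete (just a sketch; the keyspace and
table names here are hypothetical):

    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    def quorum_size(rf):
        # Cassandra's QUORUM is a majority of replicas: floor(RF / 2) + 1
        return rf // 2 + 1

    for rf in (3, 4, 5):
        print("RF=%d: a QUORUM read touches %d replicas"
              % (rf, quorum_size(rf)))
    # RF=3 -> 2, RF=4 -> 3, RF=5 -> 3: going from RF=3 to RF=4 makes
    # every quorum read (and write) do 50% more replica work.

    # Consistency level is chosen per statement (assumes an existing
    # session):
    stmt = SimpleStatement(
        "SELECT * FROM my_keyspace.my_table WHERE pk = %s",
        consistency_level=ConsistencyLevel.QUORUM,
    )
    # rows = session.execute(stmt, ['some-key'])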

However, a single hot partition key like that isn't really a real-world
scenario.  Or if it is, Cassandra is probably the wrong tool for that
kind of workload.

On Thu, Mar 23, 2017 at 11:43 PM Alain Rastoul <alf.mmm....@gmail.com>
wrote:

On 24/03/2017 01:00, Eric Stevens wrote:
> Assuming an even distribution of data in your cluster, and an even
> distribution across those keys by your readers, you would not need to
> increase RF with cluster size to increase read performance.  If you have
> 3 nodes with RF=3, and do 3 million reads, with good distribution, each
> node has served 1 million read requests.  If you increase to 6 nodes and
> keep RF=3, then each node now owns half as much data and serves only
> 500,000 reads.  Or, more meaningfully, in the same time it takes to do 3
> million reads under the 3-node cluster you ought to be able to do 6
> million reads under the 6-node cluster, since each node is still
> responsible for only 1 million reads in total.
>
Hi Eric,

I think I got your point.
When the reads really are evenly distributed, it may (or should?) not
make any difference.
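
If I restate your example as a quick sketch (my understanding, assuming
an even distribution and a single replica serving each read):

    def reads_per_node(total_reads, num_nodes):
        # One replica serves each read, so with even distribution each
        # node handles total_reads / num_nodes, regardless of RF.
        return total_reads // num_nodes

    print(reads_per_node(3_000_000, 3))  # 1000000
    print(reads_per_node(3_000_000, 6))  # 500000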

But when the reads are not well distributed (and only in that case),
my understanding was that a higher RF could help spread the load:
with RF=4 instead of 3, and several clients accessing the same key
ranges, a coordinator could pick one node out of 4 replicas to handle
the request instead of one out of 3, thus having more "workers" to
handle a request?

Am I wrong here?

Thank you for the clarification


--
best,
Alain
