RE: Cassandra datacenters replication advanced usage

Fabrice Douchant Tue, 02 Jun 2015 04:43:33 -0700

Hello Marcus and thank you for your fast reply.

Yes, we thought about that, and indeed it would work. However, we have real 
constraints on writes and reads for the producer and consumer datacenters 
respectively, so we would like to keep all/most access "local".

We don't need synchronization between datacenters to be fast; we just need to 
know when it's done :-/

Fabrice

From: Marcus Olsson [mailto:marcus.ols...@ericsson.com]
Sent: Tuesday, June 2, 2015 13:29
To: user@cassandra.apache.org
Subject: Re: Cassandra datacenters replication advanced usage

Hi Fabrice,

Have you considered using "each_quorum" instead of "all"?

Each_quorum requires replies from a quorum of replica nodes in each datacenter.

This could be used in either of two ways:
Producer using each_quorum and consumer using local_quorum (better read 
latencies at the cost of write latencies)

or

Producer using local_quorum and consumer using each_quorum (better write 
latencies at the cost of read latencies)
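The trade-off between these two options can be checked with a little
arithmetic. The sketch below is plain Python, not driver code. It assumes a
replication factor of 3 in each datacenter and uses the hypothetical names
"dc1" (producer side) and "dc2" (consumer side); none of these come from the
thread. It counts the replica acknowledgements each consistency level needs
and checks whether a read is guaranteed to touch a replica that acknowledged
the write.

```python
# Assumption: replication factor 3 in each datacenter (not stated in the thread).
RF_BY_DC = {"dc1": 3, "dc2": 3}


def quorum(rf):
    """Number of replicas that form a quorum for replication factor rf."""
    return rf // 2 + 1


def required_acks(level, rf_by_dc, local_dc):
    """Replica acknowledgements needed in each datacenter for one request."""
    if level == "ALL":
        return dict(rf_by_dc)
    if level == "EACH_QUORUM":
        return {dc: quorum(rf) for dc, rf in rf_by_dc.items()}
    if level == "LOCAL_QUORUM":
        return {dc: quorum(rf) if dc == local_dc else 0
                for dc, rf in rf_by_dc.items()}
    raise ValueError("unsupported consistency level: " + level)


def read_sees_write(write_acks, read_acks, rf_by_dc):
    """True if, in some datacenter, the read and write replica sets must overlap."""
    return any(write_acks[dc] + read_acks[dc] > rf
               for dc, rf in rf_by_dc.items())


# Producer writes from dc1, consumer reads from dc2.
each_write = required_acks("EACH_QUORUM", RF_BY_DC, "dc1")
local_write = required_acks("LOCAL_QUORUM", RF_BY_DC, "dc1")
each_read = required_acks("EACH_QUORUM", RF_BY_DC, "dc2")
local_read = required_acks("LOCAL_QUORUM", RF_BY_DC, "dc2")

print(read_sees_write(each_write, local_read, RF_BY_DC))   # True
print(read_sees_write(local_write, each_read, RF_BY_DC))   # True
print(read_sees_write(local_write, local_read, RF_BY_DC))  # False
```

With these assumed numbers, both combinations above guarantee an overlap,
while local_quorum on both sides does not, since the write and the read then
never wait on the same datacenter.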

BR
Marcus Olsson
On 06/02/2015 01:00 PM, Fabrice Douchant wrote:
Hi everyone.

For a project, we use a Cassandra cluster in order to have fast reads/writes on 
a large amount of generated (column-oriented) data.

Until now, we only had 1 datacenter for prototyping.

We now plan to split our cluster into 2 datacenters to meet performance 
requirements (data transfer between the two datacenters is quite slow):

datacenter #1: located near our data producer services; it intensively writes 
all data to Cassandra periodically (each write has a "run_id" column in its 
primary key)
datacenter #2: located near our data consumer services; it intensively reads 
all data produced by datacenter #1 for a given "run_id".

However, we would like our consumer services to access data only in the 
datacenter near them (datacenter #2), and only once all data for a given 
"run_id" (generated by the producer services) has been completely replicated 
from datacenter #1.

My question is: how can we ensure that all data has been replicated to 
datacenter #2 before telling consumer services (near datacenter #2) to start 
using it?

Our best solutions so far (but still not good enough :-P):

producer services (datacenter #1) write at consistency "all". But this leads 
to poor tolerance to partitions and failures AND really bad write performance.
producer services (datacenter #1) write at consistency "local_quorum", and a 
final "run finished" value is written at consistency "all". But it seems 
Cassandra does not guarantee replication ordering.
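The ordering concern in the second option can be pictured with a toy model.
This is plain Python, not Cassandra code, and the row names and the arrival
order are made up for illustration. It only shows that if mutations reach
datacenter #2 independently of each other, a consumer can see the
"run finished" marker while some rows of the run are still missing.

```python
def missing_rows(expected_rows, arrived):
    """Rows of the run that have not yet reached the consumer datacenter."""
    return [row for row in expected_rows if row not in arrived]


run_rows = ["row-1", "row-2", "row-3"]  # hypothetical rows of one run_id
marker = "run-finished"                 # hypothetical marker row

# One possible arrival order in datacenter #2: the marker overtakes row-3.
arrival_order = ["row-1", "row-2", marker, "row-3"]

# State of datacenter #2 just after the marker has arrived.
arrived = set(arrival_order[: arrival_order.index(marker) + 1])

marker_visible = marker in arrived
still_missing = missing_rows(run_rows, arrived)
print(marker_visible, still_missing)  # True ['row-3']
```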
Do you have any suggestions?

Thanks a lot,

Fabrice
