I'm testing various scenarios in a multi data center configuration. The setup 
is 10 Cassandra 1.1.5 nodes configured as two data centers with 5 nodes in each 
DC (RF DC1:3,DC2:3, write consistency LOCAL_QUORUM). I have a synthetic random 
data generator that I can run, and each run adds roughly 1GiB of data to each 
node. After one run the ring looks like this:

DC          Rack        Status State   Load            Effective-Ownership
                                                                          
DC1         RAC1        Up     Normal  1010.71 MB      60.00%             
DC2         RAC1        Up     Normal  1009.08 MB      60.00%             
DC1         RAC1        Up     Normal  1.01 GB         60.00%             
DC2         RAC1        Up     Normal  1 GB            60.00%             
DC1         RAC1        Up     Normal  1.01 GB         60.00%             
DC2         RAC1        Up     Normal  1014.45 MB      60.00%             
DC1         RAC1        Up     Normal  1.01 GB         60.00%             
DC2         RAC1        Up     Normal  1.01 GB         60.00%             
DC1         RAC1        Up     Normal  1.01 GB         60.00%             
DC2         RAC1        Up     Normal  1.01 GB         60.00%             
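
For reference, the keyspace uses NetworkTopologyStrategy and was created along 
these lines via cassandra-cli (the keyspace name is a placeholder, not the 
actual schema):

create keyspace TestKS
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {DC1 : 3, DC2 : 3};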

Now, if I kill all the nodes in DC2 and run the data generator again, I would 
expect roughly 2GiB to be added to each node in DC1 (~1GiB of local replicas 
plus ~1GiB of hints destined for the other data center). Instead I get this:

DC          Rack        Status State   Load            Effective-Ownership
                                                                          
DC1         RAC1        Up     Normal  17.56 GB        60.00%             
DC2         RAC1        Down   Normal  1009.08 MB      60.00%             
DC1         RAC1        Up     Normal  17.47 GB        60.00%             
DC2         RAC1        Down   Normal  1 GB            60.00%             
DC1         RAC1        Up     Normal  17.22 GB        60.00%             
DC2         RAC1        Down   Normal  1014.45 MB      60.00%             
DC1         RAC1        Up     Normal  16.94 GB        60.00%             
DC2         RAC1        Down   Normal  1.01 GB         60.00%             
DC1         RAC1        Up     Normal  17.26 GB        60.00%             
DC2         RAC1        Down   Normal  1.01 GB         60.00%             

Checking the sstables on a node reveals this:

-bash-3.2$ du -hs HintsColumnFamily/
16G     HintsColumnFamily/
-bash-3.2$
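
The same number can be cross-checked with nodetool instead of du; something 
along these lines should work, adjusting the grep window as needed:

-bash-3.2$ nodetool -h localhost cfstats | grep -A 2 'Column Family: HintsColumnFamily'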

So what I would have expected to be roughly 1GiB of hints is in reality 15x-16x 
larger. This inflation also has a huge impact on write performance.

If I bring DC2 back up, the load eventually drops and evens out to roughly 
2GiB across the entire cluster.

I'm wondering whether this inflation is intended, or whether it is a bug or 
something I'm doing wrong. Assuming the inflation is correct, what is the best 
way to deal with temporary connectivity issues to a second data center? Write 
performance is paramount in my use case; a 2x-3x overhead is doable, but not 
15x-16x.
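
For context, these are the hinted handoff knobs in cassandra.yaml that I'm 
aware of (the values below are only illustrative, not what I'm actually 
running with):

hinted_handoff_enabled: true
# stop collecting hints for a node once it has been down this long (ms)
max_hint_window_in_ms: 3600000
# pause between hint deliveries when hints are replayed (ms)
hinted_handoff_throttle_delay_in_ms: 50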

Thanks,
/dml

