Re: Heavy writes ok for single node, but failed for cluster

Sylvain Lebresne Wed, 27 Apr 2011 02:51:14 -0700

On Wed, Apr 27, 2011 at 10:32 AM, Sheng Chen <chensheng2...@gmail.com> wrote:
> I succeeded to insert 1 billion records into a single node cassandra,
>>> bin/stress -d cas01 -o insert -n 1000000000 -c 5 -S 34 -C5 -t 20
> Inserts finished in about 14 hours at a speed of 20k/sec.
> But when I added another node, tests always failed with UnavailableException
> in an hour.
>>> bin/stress -d cas01,cas02 -o insert -n 1000000000 -c 5 -S 34 -C5 -t 20
> Writes speed is also 20k/sec because of the bottleneck in the client, so the
> pressure on each server node should be 50% of the single node test.
> Why couldn't they handle?
> By default, rf=1, consistency=ONE
> Some information that may be helpful,
> 1. no warn/error in log file, the cluster is still alive after those
> exception
> 2. the last logs on both nodes happen to be a compaction complete info
> 3. gossip log shows one node is dead and then up again in 3 seconds


That's your problem. With rf=1 each key lives on exactly one node, so once
cas01 has marked cas02 down, any update for cas02 that reaches cas01 fails
with an UnavailableException.

Now, cas02 shouldn't have been marked down in the first place, and I suspect
this is due to
https://issues.apache.org/jira/browse/CASSANDRA-2554
(you didn't say which version you're using, but I suppose this is a 0.7.*).

If you apply this patch or use the current svn 0.7 branch, that should
hopefully not happen again.

Note that with rf >= 2 the node would still have been wrongly marked down
for 3 seconds, but that would have been transparent to the stress test: at
consistency ONE, another live replica would have accepted the writes.
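
To make the rf=1 versus rf >= 2 difference concrete, here is a minimal
sketch of the kind of check a coordinator performs before accepting a write.
This is not Cassandra's actual code; the function name check_write, the
REQUIRED table and the exception class are hypothetical, for illustration
only.

```python
class UnavailableException(Exception):
    """Raised when too few replicas are believed alive (illustrative only)."""


# Number of live replicas needed per consistency level, as a function of rf.
REQUIRED = {
    "ONE": lambda rf: 1,
    "QUORUM": lambda rf: rf // 2 + 1,
    "ALL": lambda rf: rf,
}


def check_write(replicas, live_nodes, consistency="ONE"):
    """Return the live replicas for a key, or raise if there are too few.

    replicas   -- nodes owning the key (its length is the replication factor)
    live_nodes -- nodes the coordinator currently believes are up
    """
    needed = REQUIRED[consistency](len(replicas))
    live = [node for node in replicas if node in live_nodes]
    if len(live) < needed:
        raise UnavailableException(
            "need %d live replica(s), have %d" % (needed, len(live))
        )
    return live


if __name__ == "__main__":
    # cas01 is the coordinator and has (wrongly) marked cas02 down.
    live_nodes = {"cas01"}

    # rf=1: the key belongs to cas02 only, so the write is rejected.
    try:
        check_write(["cas02"], live_nodes)
    except UnavailableException as exc:
        print("rf=1:", exc)

    # rf=2: cas01 also holds the key, so consistency ONE is still satisfied.
    print("rf=2:", check_write(["cas02", "cas01"], live_nodes))
```

The point of the sketch is that the check depends only on what the
coordinator believes, so a 3 second false "down" is enough to fail writes
when there is no second replica to fall back on.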

> 4. I set hinted_handoff_enabled: false, but still see lots of handoff logs

What do those log lines say?

--
Sylvain
