Hi:
Lately my Cassandra cluster has been behaving strangely and is much slower
than before. I spent a lot of time trying to figure out why, and today I
found the answer.

Some bad guy kept writing a lot of data day and night, which produced a very
big row mutation of about 140 MB.
During this period I restarted some Cassandra nodes, and when the nodes came
back up, they received some hinted handoff messages.
HintedHandOffManager.sendMessage() sends the row mutations to these nodes,
but the row mutation is too big to finish transferring within 8 seconds
(the timeout defined by DatabaseDescriptor.getRpcTimeout()), so sendMessage()
returns false when it gets a TimeoutException.
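
To put rough numbers on this, here is a back-of-the-envelope sketch of the
throughput needed to push the mutation through before the timeout. It assumes
the mutation is 140 MB and that the whole of it has to arrive within the
8-second RPC timeout. The class and method names are hypothetical and are not
Cassandra code.

```java
// Hypothetical helper, not part of Cassandra: rough transfer arithmetic only.
final class HintTransferEstimate {

    /** Minimum sustained throughput (MB/s) needed to send sizeMb within timeoutSeconds. */
    static double requiredMbPerSecond(double sizeMb, double timeoutSeconds) {
        if (timeoutSeconds <= 0) {
            throw new IllegalArgumentException("timeoutSeconds must be positive");
        }
        return sizeMb / timeoutSeconds;
    }

    /** Seconds needed to send sizeMb over a link that sustains linkMbPerSecond. */
    static double transferSeconds(double sizeMb, double linkMbPerSecond) {
        if (linkMbPerSecond <= 0) {
            throw new IllegalArgumentException("linkMbPerSecond must be positive");
        }
        return sizeMb / linkMbPerSecond;
    }

    public static void main(String[] args) {
        // 140 MB within 8 s needs 140 / 8 = 17.5 MB/s sustained.
        System.out.println(requiredMbPerSecond(140, 8));
    }
}
```

So unless the link between the two nodes can sustain about 17.5 MB/s for the
whole transfer, the send cannot complete before the timeout.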

Every hour HintedHandOffManager checks the hinted handoff ColumnFamily and
sends the big row mutations to the live nodes.
It fails again because of the TimeoutException, so the task never finishes
and the big row mutation is sent again and again.
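
The loop described above can be sketched as a tiny simulation. This is not the
HintedHandOffManager code: the class, the method names and the link speed are
hypothetical, and a send is simply modelled as failing whenever the transfer
would take longer than the timeout.

```java
// Hypothetical model of the hourly hint delivery loop, not Cassandra code.
final class HintReplayLoopSketch {

    /** A send "succeeds" only if the whole payload fits inside the timeout window. */
    static boolean sendSucceeds(double sizeMb, double linkMbPerSecond, double timeoutSeconds) {
        return sizeMb / linkMbPerSecond <= timeoutSeconds;
    }

    /**
     * Simulates up to maxRounds hourly delivery rounds.
     * Returns the 1-based round in which the hint is delivered,
     * or -1 if it is still pending after maxRounds.
     */
    static int roundsUntilDelivered(double sizeMb, double linkMbPerSecond,
                                    double timeoutSeconds, int maxRounds) {
        for (int round = 1; round <= maxRounds; round++) {
            if (sendSucceeds(sizeMb, linkMbPerSecond, timeoutSeconds)) {
                return round;
            }
            // Failed send: the hint stays stored and is resent in full next round.
        }
        return -1;
    }
}
```

Nothing changes between rounds in this model, so a hint that fails once fails
every time. With 140 MB, an assumed 10 MB/s link and an 8 second timeout, 24
hourly rounds resend the mutation 24 times without ever delivering it.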

In a multi-datacenter deployment, a big row mutation cannot be transferred
within a few seconds, so any big row mutation is a potential risk.
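
As a rough guide for the multi-datacenter case, the largest mutation that can
be delivered is bounded by the link throughput multiplied by the timeout. The
sketch below uses the 8-second timeout mentioned earlier. The cross-datacenter
throughput is purely an illustrative assumption, and the names are
hypothetical.

```java
// Hypothetical size check, not Cassandra code.
final class CrossDcMutationLimit {

    /** Largest payload (MB) that fits in timeoutSeconds at linkMbPerSecond. */
    static double maxDeliverableMb(double linkMbPerSecond, double timeoutSeconds) {
        return linkMbPerSecond * timeoutSeconds;
    }

    /** True if a mutation of mutationMb cannot be sent before the timeout expires. */
    static boolean isAtRisk(double mutationMb, double linkMbPerSecond, double timeoutSeconds) {
        return mutationMb > maxDeliverableMb(linkMbPerSecond, timeoutSeconds);
    }
}
```

For example, on an assumed 5 MB/s link with an 8 second timeout, anything
larger than 40 MB is at risk, which includes the 140 MB mutation.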



Luke
