AFAIK the MemtablePostFlusher is the TP that runs the work done after a 
Memtable is written out as an sstable; if it has a queue, there is the 
potential for writes to block while it waits for Memtables to be flushed. 
Take a look at your Memtable settings per CF: could it be that all the 
Memtables are flushing at once? There is info in the logs about when this 
happens.
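To make the "watch the thread pool queues" step concrete, here is a small 
sketch that picks the pools with a Pending backlog out of `nodetool tpstats` 
output. The sample text and the threshold are made up for illustration; feed 
it the real output from your nodes.

```python
# Hypothetical sketch: flag thread pools with pending tasks in the output
# of `nodetool tpstats`. SAMPLE below is illustrative, not from a real node.
SAMPLE = """\
Pool Name                    Active   Pending      Completed
ReadStage                         4        30        1763434
MutationStage                     0         0        2134063
MemtablePostFlusher               1        70          12045
FlushWriter                       1         2          12044
"""

def pools_with_backlog(tpstats_text, threshold=1):
    """Return (pool, pending) pairs whose Pending count meets the threshold."""
    backlog = []
    for line in tpstats_text.splitlines()[1:]:  # skip the header row
        parts = line.split()
        # Expect: name, active, pending, completed; Pending is second-to-last.
        if len(parts) >= 3 and parts[-2].isdigit():
            name, pending = parts[0], int(parts[-2])
            if pending >= threshold:
                backlog.append((name, pending))
    return backlog

print(pools_with_backlog(SAMPLE))
```

Sampling that a few times a minute during the slow periods should show 
whether the backlog sits in ReadStage (IO bound reads) or in the flush path.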

One approach is to set the flush timeout high, so Memtables are more likely 
to flush because they hit the operations or throughput thresholds instead. 
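For example, with the 0.7 cassandra-cli that would look something like the 
fragment below; the exact attribute names differ between versions (0.6 sets 
them in storage-conf.xml), so treat the names and values here as placeholders 
and check `help update column family;` in your cli first.

```
[default@MyKeyspace] update column family MyCF with
    memtable_flush_after = 1440 and
    memtable_throughput = 128 and
    memtable_operations = 0.3;
```

That sets the time-based flush to a day, leaving size (MB) and operation 
count (millions) as the usual triggers.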

Aaron

On 19/02/2011, at 10:09 AM, Andy Skalet <aeska...@bitjug.com> wrote:

> On Thu, Feb 17, 2011 at 12:22 PM, Aaron Morton <aa...@thelastpickle.com> 
> wrote:
>> Messages been dropped means the machine node is overloaded. Look at the 
>> thread pool stats to see which thread pools have queues. It may be IO 
>> related, so also check the read and write latency on the CF and use iostat.
>> 
>> i would try those first, then jump into GC land.
> 
> Thanks, Aaron.  I am looking at the thread pool queues; not enough
> data on that yet but so far I've seen queues in the ReadStage from
> 4-30 (once 100) and MemtablePostFlusher as much as 70, though not 
> consistently.
> 
> The read latencies on the CFs on this cluster are sitting around
> 20-40ms, and the write latencies are all around 0.01ms.  That seems
> good to me, but I don't have a baseline.
> 
> I do see high (90-100%) utilization from time to time on the disk that
> holds the data, based on reads.  This doesn't surprise me too much
> because IO on these machines is fairly limited in performance.
> 
> Does this sound like the node is overloaded?
> 
> Andy
