AFAIK the MemtablePostFlusher is the thread pool that writes the sstables; if it has a queue, there is the potential for writes to block while they wait for Memtables to be flushed. Take a look at your Memtable settings per CF: could it be that all the Memtables are flushing at once? There is info in the logs about when this happens.
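To spot which pools are queuing, something like this quick sketch could scan `nodetool tpstats` output for nonzero Pending counts. The sample output, column layout, and threshold below are made up for illustration, not captured from a real node:

```python
# Hypothetical sketch: flag thread pools with pending tasks in
# `nodetool tpstats` output. The sample text is invented for illustration.
SAMPLE_TPSTATS = """\
Pool Name                    Active   Pending      Completed
ReadStage                         4        30        1234567
MutationStage                     0         0        7654321
MemtablePostFlusher               1        70          12345
FlushWriter                       2         3           9876
"""

def pools_with_backlog(tpstats_text, threshold=1):
    """Return (pool, pending) pairs whose Pending column meets the threshold."""
    backlog = []
    for line in tpstats_text.splitlines()[1:]:  # skip the header row
        parts = line.split()
        # expect at least: name, active, pending, completed
        if len(parts) >= 4 and parts[-2].isdigit():
            pool, pending = " ".join(parts[:-3]), int(parts[-2])
            if pending >= threshold:
                backlog.append((pool, pending))
    return backlog

print(pools_with_backlog(SAMPLE_TPSTATS))
# → [('ReadStage', 30), ('MemtablePostFlusher', 70), ('FlushWriter', 3)]
```

Run against real tpstats output on each node, a consistently nonzero Pending on MemtablePostFlusher is the symptom described above.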
One approach is to set the flush timeout high, so the Memtables are more likely to flush due to ops or throughput instead.

Aaron

On 19/02/2011, at 10:09 AM, Andy Skalet <aeska...@bitjug.com> wrote:
> On Thu, Feb 17, 2011 at 12:22 PM, Aaron Morton <aa...@thelastpickle.com> wrote:
>> Messages being dropped means the node is overloaded. Look at the
>> thread pool stats to see which thread pools have queues. It may be IO
>> related, so also check the read and write latency on the CF and use iostat.
>>
>> I would try those first, then jump into GC land.
>
> Thanks, Aaron. I am looking at the thread pool queues; not enough
> data on that yet, but so far I've seen queues in the ReadStage from
> 4-30 (once 100) and MemtablePostFlusher as much as 70, though not
> consistently.
>
> The read latencies on the CFs on this cluster are sitting around
> 20-40ms, and the write latencies are all around .01ms. That seems
> good to me, but I don't have a baseline.
>
> I do see high (90-100%) utilization from time to time on the disk that
> holds the data, based on reads. This doesn't surprise me too much
> because IO on these machines is fairly limited in performance.
>
> Does this sound like the node is overloaded?
>
> Andy
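Since the numbers quoted above point at disk saturation, one way to watch for it over time is to scan `iostat -x` output for devices pinned near 100% utilization. This is a rough sketch; the sample output and 90% threshold are invented for illustration:

```python
# Rough sketch: flag devices whose %util column in `iostat -x` output
# is at or above a threshold. Sample output is invented for illustration.
SAMPLE_IOSTAT = """\
Device:  rrqm/s wrqm/s   r/s   w/s  rkB/s  wkB/s avgrq-sz avgqu-sz await svctm  %util
sda        0.10   2.30  80.0   5.0 5120.0  320.0    64.0      8.5  45.2   5.1   95.3
sdb        0.00   0.10   1.0   0.5   64.0   16.0    53.3      0.0   1.2   0.8    0.4
"""

def saturated_devices(iostat_text, util_threshold=90.0):
    """Return (device, %util) pairs at or above the utilization threshold."""
    hot = []
    for line in iostat_text.splitlines()[1:]:  # skip the header row
        parts = line.split()
        if parts:
            device, util = parts[0], float(parts[-1])  # %util is last column
            if util >= util_threshold:
                hot.append((device, util))
    return hot

print(saturated_devices(SAMPLE_IOSTAT))
# → [('sda', 95.3)]
```

Sustained 90-100% utilization with 20-40ms read latencies, as reported, is consistent with the disk being the bottleneck rather than GC.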