On Thu, Feb 17, 2011 at 12:22 PM, Aaron Morton <aa...@thelastpickle.com> wrote:
> Messages being dropped means the node is overloaded. Look at the
> thread pool stats to see which thread pools have queues. It may be IO
> related, so also check the read and write latency on the CF and use iostat.
>
> I would try those first, then jump into GC land.

Thanks, Aaron. I am looking at the thread pool queues. I don't have enough data on that yet, but so far I've seen queues in ReadStage of 4-30 (once 100) and in MemtablePostFlusher of as much as 70, though not consistently.

The read latencies on the CFs on this cluster are sitting around 20-40ms, and the write latencies are all around 0.01ms. That seems good to me, but I don't have a baseline.

I do see high (90-100%) utilization from time to time on the disk that holds the data, driven by reads. This doesn't surprise me too much because IO on these machines is fairly limited in performance.

Does this sound like the node is overloaded?

Andy
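
To collect the queue data more consistently than by eyeballing it, something like the following could be run against saved thread pool stats. This is only a minimal sketch: it assumes the text has whitespace-separated columns in the order pool name, active, pending, completed (as `nodetool tpstats` prints them), the function name `pools_with_backlog` is made up for illustration, and the sample numbers are hypothetical apart from the two pending counts mentioned above.

```python
# Sketch only: assumes columns are "Pool Name  Active  Pending  Completed".
# The sample text below is hypothetical, not captured from a real node.
SAMPLE = """\
Pool Name                    Active   Pending      Completed
ReadStage                         8        30        1234567
MutationStage                     0         0        7654321
MemtablePostFlusher               1        70            890
"""


def pools_with_backlog(tpstats_text, threshold=1):
    """Return {pool_name: pending} for pools with pending >= threshold."""
    backlog = {}
    for line in tpstats_text.splitlines():
        parts = line.split()
        # Need at least name, active, pending, completed.
        if len(parts) < 4:
            continue
        name, active, pending = parts[0], parts[1], parts[2]
        # The header row and any non-numeric rows are skipped here.
        if not (active.isdigit() and pending.isdigit()):
            continue
        if int(pending) >= threshold:
            backlog[name] = int(pending)
    return backlog


if __name__ == "__main__":
    # Print every pool that currently has queued tasks.
    for pool, pending in pools_with_backlog(SAMPLE).items():
        print(pool, pending)
```

Running this on a timer and logging the result would show whether the ReadStage and MemtablePostFlusher queues are sustained or just brief spikes.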