Thanks guys! Jeff Jirsa helped me take a look, and I found a ~10-second young GC
pause in the GC log:
3071128K->282000K(3495296K), 0.1144648 secs] 25943529K->23186623K(66409856K), 9.8971781 secs] [Times: user=2.33 sys=0.00, real=9.89 secs]
I'm trying to get a histogram or heap dump.
Thanks!
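For anyone digging through similar logs, here is a rough way to pull out the
long stop-the-world pauses. This is only a sketch: it assumes
PrintGCDetails-style output where each pause ends with "real=N.NN secs]", as in
the line above, and the gc.log path and the 1-second threshold are placeholders.

import re
import sys

# Wall-clock time of a pause, e.g. "... real=9.89 secs]"
PAUSE_RE = re.compile(r"real=(\d+\.\d+) secs\]")

def long_pauses(path, threshold_secs=1.0):
    # Yield (line_number, pause_seconds, line) for pauses at or above the threshold.
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            m = PAUSE_RE.search(line)
            if m and float(m.group(1)) >= threshold_secs:
                yield lineno, float(m.group(1)), line.rstrip()

if __name__ == "__main__":
    log_path = sys.argv[1] if len(sys.argv) > 1 else "gc.log"  # placeholder path
    for lineno, secs, text in long_pauses(log_path):
        print("line %d: %.2fs pause: %s" % (lineno, secs, text))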
The lion's share of your drops are from cross-node timeouts, which require
clock synchronization, so check that first. If your clocks are in sync, then
not only are you eagerly dropping messages based on time, but despite that
eager dropping you are still facing overload.
That local, non-GC pause ...
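For anyone following along, the cross-node timeout check works roughly as
sketched below: the coordinator stamps each message with its own clock, and the
replica drops the message once the apparent age, measured against the replica's
clock, exceeds the timeout, so any skew between the two clocks distorts that
age. This is a simplified illustration, not Cassandra's actual code; the 2000 ms
timeout and the skew numbers are made up.

# Simplified illustration of cross-node timeout dropping; NOT Cassandra's
# real implementation. With cross_node_timeout enabled, the replica ages a
# message using the coordinator's send timestamp, so any clock skew between
# the two nodes is added to (or subtracted from) the apparent message age.

WRITE_REQUEST_TIMEOUT_MS = 2000  # illustrative; matches the common default

def should_drop(sent_at_ms_coordinator_clock, now_ms_replica_clock,
                timeout_ms=WRITE_REQUEST_TIMEOUT_MS):
    """Return True if the replica would treat the message as expired."""
    apparent_age_ms = now_ms_replica_clock - sent_at_ms_coordinator_clock
    return apparent_age_ms > timeout_ms

# A healthy message that only took 500 ms to arrive, but the replica's clock
# runs 3 seconds ahead of the coordinator's: the message looks 3.5 s old.
sent = 1_000_000            # coordinator clock when the mutation was sent (ms)
replica_now = sent + 500    # 500 ms of real elapsed time on a synced clock
skew_ms = 3_000             # replica clock 3 s ahead of the coordinator

print(should_drop(sent, replica_now + skew_ms))  # True: dropped due to skew
print(should_drop(sent, replica_now))            # False: kept when clocks agree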
Dikang,
Did you take a look at the heap health on those nodes? A quick heap
histogram or dump would help you figure out whether it's related to a data
issue (wide rows or a bad data model) where a few nodes come under heap
pressure and start dropping messages.
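One quick way to eyeball that from a class histogram captured with
jmap -histo <pid> is sketched below; it just sorts the standard four-column
output (rank, instance count, bytes, class name) by bytes. The histo.txt file
name and the top-20 cutoff are placeholders.

# Summarize a class histogram captured with: jmap -histo <pid> > histo.txt
# Sketch only: assumes the usual four-column layout and skips the header and
# "Total" lines, which don't start with a numeric rank.

def top_heap_consumers(path="histo.txt", n=20):
    rows = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            # Data rows look like: "  1:   123456   7890123  [B"
            if len(parts) >= 4 and parts[0].rstrip(":").isdigit():
                instances, nbytes, cls = int(parts[1]), int(parts[2]), parts[3]
                rows.append((nbytes, instances, cls))
    rows.sort(reverse=True)  # largest byte counts first
    for nbytes, instances, cls in rows[:n]:
        print("%10.1f MB %12d %s" % (nbytes / 1024.0 / 1024.0, instances, cls))

if __name__ == "__main__":
    top_heap_consumers()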
Thanks,
Roopa
Hi Dikang,
Do you have any GC logging or metrics you can correlate with the dropped
messages? A 13-second pause sounds like a bad GC pause.
Thanks,
Blake
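If it helps, a rough way to line the dropped-message warnings up against GC
pauses is sketched below. It makes several assumptions that may need adjusting
for your setup: the system.log and gc.log paths, the "messages were dropped"
wording of the warning, PrintGCDateStamps being enabled so gc.log lines start
with a timestamp, and the timestamp formats in the regexes (it reuses the same
"real=" pattern as the earlier sketch).

import re
from datetime import datetime

# Timestamp inside a dropped-message warning in system.log, e.g.
# "WARN  [ScheduledTasks:1] 2017-01-23 10:15:05,123 ... messages were dropped ..."
DROP_TS_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
# GC pause line in gc.log when -XX:+PrintGCDateStamps is enabled, e.g.
# "2017-01-23T10:15:05.120+0000: ... real=9.89 secs]"
GC_TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}).*real=(\d+\.\d+) secs\]")

def drop_times(path="system.log"):
    with open(path) as f:
        for line in f:
            if "messages were dropped" in line:
                m = DROP_TS_RE.search(line)
                if m:
                    yield datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")

def long_gc_pauses(path="gc.log", threshold_secs=1.0):
    with open(path) as f:
        for line in f:
            m = GC_TS_RE.match(line)
            if m and float(m.group(2)) >= threshold_secs:
                yield datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S"), float(m.group(2))

def correlate(window_secs=30):
    pauses = list(long_gc_pauses())
    for drop in drop_times():
        near = [(t, s) for t, s in pauses
                if abs((t - drop).total_seconds()) <= window_secs]
        if near:
            print("drops at %s coincide with GC pauses: %s" % (drop, near))
        else:
            print("drops at %s: no long GC pause within %ds" % (drop, window_secs))

if __name__ == "__main__":
    correlate()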
On January 22, 2017 at 10:37:22 PM, Dikang Gu (dikan...@gmail.com) wrote:
Btw, the C* version is 2.2.5, with several backported patches.
On Sun, Jan 22, 2017 at 10:36 PM, Dikang Gu wrote:
> Hello there,
>
> We have a cluster of roughly 100 nodes, and I'm seeing dropped messages on
> random nodes in the cluster, which cause error spikes and P99 latency
> spikes as well.