Hello Aaron,
it's probably the over-optimistic number of concurrent compactors that
was tripping the system.
I don't entirely understand the correlation here; maybe the compactors
were overloading the neighboring nodes and causing time-outs. I tuned
the concurrency down and aft…
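For reference, this is roughly the kind of change I mean. I'm not
certain which of these knobs the 0.8.x cassandra.yaml actually exposes
(they are there in later releases), so take the names and values below
as illustrative only:

    # cassandra.yaml -- illustrative values, not a recommendation
    concurrent_compactors: 2                # was set much higher before
    compaction_throughput_mb_per_sec: 16    # throttle compaction I/O per node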
At some point the gossip system on the node this log is from decided that
130.199.185.195 was DOWN. This was based on how often that node had been
gossiping to the cluster.
The active repair session was informed, and to avoid failing the job
unnecessarily it tested that the errant node's phi value wa…
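If it helps to see the mechanism, here is a minimal sketch of the phi
accrual idea in Java. It uses the simplified exponential inter-arrival
model, which is roughly what Cassandra's failure detector does; it is
not the actual org.apache.cassandra.gms.FailureDetector code, and the
class name and window size are made up for illustration:

    import java.util.ArrayDeque;
    import java.util.Deque;

    class PhiAccrualSketch {
        private final Deque<Long> intervalsMs = new ArrayDeque<Long>();
        private long lastHeartbeatMs = -1;
        private static final int WINDOW = 1000;                      // sliding window of observed intervals
        private static final double PHI_FACTOR = 1.0 / Math.log(10); // convert to a base-10 scale

        // Record a gossip heartbeat from the peer.
        void report(long nowMs) {
            if (lastHeartbeatMs >= 0) {
                intervalsMs.addLast(nowMs - lastHeartbeatMs);
                if (intervalsMs.size() > WINDOW)
                    intervalsMs.removeFirst();
            }
            lastHeartbeatMs = nowMs;
        }

        // Phi grows with the silence since the last heartbeat, relative to
        // the mean interval seen so far. The peer is convicted ("is now
        // dead") once phi exceeds phi_convict_threshold (8 by default).
        double phi(long nowMs) {
            if (intervalsMs.isEmpty())
                return 0.0;
            double sum = 0;
            for (long i : intervalsMs)
                sum += i;
            double meanMs = sum / intervalsMs.size();
            return PHI_FACTOR * (nowMs - lastHeartbeatMs) / meanMs;
        }
    }

The upshot is that a node which is too overloaded to gossip on time will
cross the threshold and be reported "dead" here, even though the process
is still running.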
Server log below. Mind you, all the nodes are still up -- even though
they are reported as "dead" in this log.
What's going on here?
Thanks!
INFO [GossipTasks:1] 2012-04-18 22:18:26,487 Gossiper.java (line 719)
InetAddress /130.199.185.193 is now dead.
INFO [ScheduledTasks:1] 2012-04-18 22:18:26,
Look at the server-side logs for errors.
Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 13/04/2012, at 11:47 AM, Maxim Potekhin wrote:
> Hello,
>
> I'm doing compactions under 0.8.8.
>
> Recently, I started seeing a stack trace like the one below, and I
> can't figure out what causes this to appear.
Hello,
I'm doing compactions under 0.8.8.
Recently, I started seeing a stack trace like the one below, and I can't
figure out what causes this to appear.
The cluster has been in operation for more than half a year w/o errors
like this one.
Any help will be appreciated,
Thanks
Maxim
WARNING: F