I'm still having tons of problems with repairs and compactions, where the
nodes are declared dead in their log files although they were online at
all times. This leads to problem behavior, i.e. once again I see that
repair fails and the cluster becomes unusable since there is no space
left to compact.
Basically I tweaked the phi, put in more verbose GC reporting, and decided
to do a compaction before I proceed. I'm getting this on the node where
the compaction is being run, and the system log for the other two nodes
follows. It's obvious that the cluster is sick, but I can't determine
why...
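For reference, "more verbose GC reporting" here generally means turning on
the JVM's GC logging. A minimal sketch of that, assuming the stock
conf/cassandra-env.sh layout of that era (the file location and log path
are assumptions, adjust for your install):

    # conf/cassandra-env.sh -- add (or uncomment) GC logging options
    JVM_OPTS="$JVM_OPTS -verbose:gc"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

The node has to be restarted for new JVM options to take effect.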
You can set the min compaction threshold to 2 and the max compaction
threshold to 3. If you have enough disk space for a few minor compactions,
this should free up some disk space.
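If it helps, those thresholds can also be changed at runtime with nodetool;
a sketch, where <keyspace> and <cf> are placeholders for your own keyspace
and column family names:

    # Lower the minor compaction thresholds on this node only
    nodetool -h localhost setcompactionthreshold <keyspace> <cf> 2 3

    # Check the current values
    nodetool -h localhost getcompactionthreshold <keyspace> <cf>

As far as I know the nodetool setting is per-node and not persisted across
restarts; the column family's min/max compaction threshold attributes are
the persistent knobs.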
On Sun, Dec 4, 2011 at 7:17 PM, Peter Schuller wrote:
> As a side effect of the failed repair (so it seems) the disk usage on the
> affected node prevents compaction from working. It still works on
> the remaining nodes (we have 3 total).
> Is there a way to scrub the extraneous data?
This is one of the reasons why killing an in-process repair is a bad
idea...
As a side effect of the failed repair (so it seems) the disk usage on the
affected node prevents compaction from working. It still works on
the remaining nodes (we have 3 total).
Is there a way to scrub the extraneous data?
Thanks
Maxim
On 12/4/2011 4:29 PM, Peter Schuller wrote:
> I will try to increase phi_convict -- I will just need to restart the
> cluster after the edit, right?
You will need to restart the nodes for which you want the phi convict
threshold to be different. You might want to do it on e.g. half of the
cluster for A/B testing.
> I do recall that I see nodes temporarily marked as down, only to pop up
> later.
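A sketch of what that A/B rollout could look like; the host names, config
path, and service commands below are assumptions, not anything specific to
this cluster:

    # Raise phi_convict_threshold in cassandra.yaml on half the nodes,
    # then restart only those nodes, one at a time.
    for h in node1 node2 node3; do
        ssh "$h" 'sudo service cassandra restart'
        sleep 120   # let the node rejoin gossip before touching the next one
    done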
Please disregard the GC part of the question -- I found it.
On 12/4/2011 4:12 PM, Maxim Potekhin wrote:
Thanks Peter!
I will try to increase phi_convict -- I will just need to restart the
cluster after the edit, right?
I do recall that I see nodes temporarily marked as down, only to pop up
later.
Thanks Peter!
I will try to increase phi_convict -- I will just need to restart the
cluster after the edit, right?
I do recall that I see nodes temporarily marked as down, only to pop up
later.
In the current situation, there is no load on the cluster at all, outside
the maintenance like...
> I capped heap and the error is still there. So I keep seeing "node dead"
> messages even when I know the nodes were OK. Where and how do I tweak
> timeouts?
You can increase phi_convict_threshold in the configuration. However,
I would rather want to find out why they are being marked as down to
begin with.
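Concretely, the setting lives in cassandra.yaml and takes effect after a
restart of that node; a sketch, with the config path as an assumption:

    # Find the setting (it usually ships commented out, defaulting to 8)
    grep -n phi_convict_threshold /etc/cassandra/conf/cassandra.yaml

    # Then edit that line to something along the lines of
    #     phi_convict_threshold: 12
    # and restart the node.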
I capped heap and the error is still there. So I keep seeing "node dead"
messages even when I know the nodes were OK. Where and how do I tweak
timeouts?
...9d-cfc9-4cbc-9f1d-1467341388b8, endpoint /130.199.185.193 died
INFO [GossipStage:1] 2011-12-04 00:26:16,362 Gossiper.java (line 683) InetAddr...
Thank you Peter. Before I look into details as you suggest, may I ask
what you mean by "automatically restarted"? The way the box and Cassandra
are set up in my case is such that the death of either is final.
Also, how do I look for full GC? I just realized that in the latest
install, I might have...
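One way to look for long/full GC without extra setup is Cassandra's own
system log, where GCInspector reports collections that take a long time; a
sketch, with the log paths as assumptions:

    # Stop-the-world CMS collections reported by GCInspector
    grep "GC for ConcurrentMarkSweep" /var/log/cassandra/system.log | tail -20

    # If JVM GC logging is enabled in cassandra-env.sh, the GC log itself
    # can be checked for full collections as well
    grep "Full GC" /var/log/cassandra/gc.log | tail -20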
Filed https://issues.apache.org/jira/browse/CASSANDRA-3569 to fix it
so that streams don't die due to conviction.
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
> ...quite understand how Cassandra declared a node dead (in the below).
> Was it a timeout? How do I fix that?
I was about to respond to say that repair doesn't fail just due to
failure detection, but this appears to have been broken by
CASSANDRA-2433 :(
Unless there is a subtle bug, the exception you...
Please help -- I've been having pretty consistent failures that look
like this one. Don't know how to proceed.
The text below comes from the system log. The cluster was all up before
and after the attempted repair, so I don't quite understand how Cassandra
declared a node dead (in the below). Was it a timeout? How do I fix that?