Another question: is there a management tool to run nodetool cleanup one node at a time (i.e., wait until cleanup of one node finishes, then start cleanup on the next node in the cluster)?
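(Absent a dedicated tool, a minimal shell loop can serialize the cleanups, since 'nodetool cleanup' blocks until it completes. This is only a sketch: it assumes password-less SSH to every node and a hypothetical nodes.txt file listing one hostname per line.)

    #!/usr/bin/env bash
    # Run 'nodetool cleanup' on one node at a time; stop at the first failure.
    # -n keeps ssh from consuming the loop's stdin (the nodes.txt list).
    while read -r host; do
        echo "Starting cleanup on ${host}"
        ssh -n "${host}" nodetool cleanup || { echo "Cleanup failed on ${host}" >&2; exit 1; }
    done < nodes.txt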
---- On Sat, 22 Sep 2018 16:02:17 +0330 onmstester onmstester <onmstes...@zoho.com> wrote ----

I have a cunning plan (Baldrick-wise) to solve this problem:
1. stop the client application
2. run nodetool flush on all nodes to persist the memtables to disk
3. stop cassandra on all of the nodes
4. rename the original Cassandra data directory to data-old
5. start cassandra on all the nodes to create a fresh cluster, including the old dead nodes again
6. create the application-related keyspaces in cqlsh, and this time set RF=2 on the system keyspaces (to never encounter this problem again!)
7. move the sstables from the data-old dir back to the current data dirs, then restart cassandra or reload the sstables

Should this work and solve my problem?
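(For concreteness, steps 2-5 and the final reload might look roughly like this on each node; the data path, the service name and the per-table reload are assumptions based on a default package install, to adapt to yours:)

    nodetool flush                        # step 2: persist memtables to sstables
    sudo systemctl stop cassandra         # step 3: stop the node
    sudo mv /var/lib/cassandra/data /var/lib/cassandra/data-old   # step 4
    sudo systemctl start cassandra        # step 5: node comes up with a fresh data dir
    # step 7: after recreating the keyspaces, copy the sstables back per table,
    # then either restart cassandra or load them in place:
    nodetool refresh <keyspace> <table>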
---- On Mon, 10 Sep 2018 17:12:48 +0430 onmstester onmstester <onmstes...@zoho.com> wrote ----

Thanks Alain. First, here is more detail about my cluster:
* 10 racks, with 3 nodes on each rack
* nodetool status shows 27 nodes UN and 3 nodes DN, all on a single rack
* version 3.11.2
> Option 1: (Change schema and) use the replace method (preferred method)
> * Did you try to let the replace go, without any prior repairs, ignoring the fact that 'system_traces' might be inconsistent? You probably don't care about this table, so if Cassandra allows it with some of the nodes down, going this way is relatively safe, probably. I really do not see what you could lose that matters in this table.
> * Another option, if the first schema change was accepted, is to make a second one to drop this table. You can always rebuild it in case you need it, I assume.

I would really love to let the replace go, but it stops with this error:

    java.lang.IllegalStateException: unable to find sufficient sources for streaming range in keyspace system_traces

I could delete system_traces, which is empty anyway, but there are system_auth and system_distributed keyspaces too, and they are not empty. Could I delete them safely as well? If I could just somehow skip streaming the system keyspaces during the node-replace phase, option 1 would be great.

P.S.: It's clear to me that I should use at least RF=3 in production, but I could not manage to acquire enough resources yet (I hope this will be fixed in the near future). Again, thank you for your time.

Sent using Zoho Mail
---- On Mon, 10 Sep 2018 16:20:10 +0430 Alain RODRIGUEZ <arodr...@gmail.com> wrote ----

Hello,

I am sorry it took us (the community) more than a day to answer this rather critical situation. That being said, my recommendation at this point would be for you to make sure about the impact of whatever you try. Working on a broken cluster as an emergency can lead to a second mistake, possibly more destructive than the first one. It has happened to me, and to people around me, on many clusters. As a general piece of advice, move forward even more carefully in these situations.

> Suddenly I lost all disks of cassandra-data on one of my racks
With RF=2, I guess operations use LOCAL_ONE consistency, so with your configuration you should have all the data in the safe rack(s); you probably have not lost anything yet, and the service is only using the nodes that are up, which hold the right data.

> tried to replace the nodes with same IP using this:
> https://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html
As a side note, I would recommend using 'replace_address_first_boot' instead of 'replace_address'. It does basically the same thing, but is ignored after the first bootstrap. A detail, but hey, it's there and somewhat safer; I would use this one.
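(For example, something like this on the replacement node only; the IP is a placeholder, and the line should be removed once the node has bootstrapped:)

    # in cassandra-env.sh on the replacement node
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=<dead_node_ip>"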
> java.lang.IllegalStateException: unable to find sufficient sources for streaming range in keyspace system_traces

By default, the non-user keyspaces use 'SimpleStrategy' and a small RF. Ideally, this should be changed in a production cluster, and you are seeing an example of why.
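(For illustration, the change looks something like this; the data-center name and the RF are placeholders to adapt to your topology, and the altered keyspaces should be repaired afterwards, once the nodes are up:)

    cqlsh -e "ALTER KEYSPACE system_traces WITH replication =
        {'class': 'NetworkTopologyStrategy', '<dc_name>': 3};"
    # repeat for system_auth and system_distributed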
> Now when I altered the system_traces keyspace strategy to NetworkTopologyStrategy and RF=2, running nodetool repair failed: Endpoint not alive /IP of dead node that I'm trying to replace.

By changing the replication strategy, you made the dead rack the owner of part of the token ranges, so repairs just can't work: one of the nodes involved will always be down while the whole rack is down. Repair won't work, but you probably do not need it! 'system_traces' is a temporary / debug keyspace. It's probably empty, or holds irrelevant data.
Here are some thoughts:
* It would be awesome at this point for us (and for you, if you have not done it yet) to see the status of the cluster:
** 'nodetool status'
** 'nodetool describecluster' --> this one will tell whether the (live) nodes agree on the schema. I have seen schema changes with nodes down inducing some issues.
** Cassandra version
** Number of racks (I assume #racks >= 2 in this email)
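(That is, from any live node:)

    nodetool status            # UN/DN state and ownership per node
    nodetool describecluster   # schema versions; all live nodes should agree
    nodetool version           # the Cassandra version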
Option 1: (Change schema and) use the replace method (preferred method)
* Did you try to let the replace go, without any prior repairs, ignoring the fact that 'system_traces' might be inconsistent? You probably don't care about this table, so if Cassandra allows it with some of the nodes down, going this way is relatively safe, probably. I really do not see what you could lose that matters in this table.
* Another option, if the first schema change was accepted, is to make a second one to drop this table. You can always rebuild it in case you need it, I assume.

Option 2: Remove all the dead nodes (try to avoid this option 2; if option 1 works, it is better). Please do not take and apply this as-is: it is a thought on how you could get rid of the issue, yet it is rather brutal and risky, I did not consider it deeply, and I have no clue about your architecture and context. Consider it carefully on your side.
* You can 'nodetool removenode' each of the dead nodes. This will have nodes streaming around, and the rack isolation guarantee will no longer be valid. It's hard to reason about what would happen to the data and in terms of streaming.
* Alternatively, if you don't have enough space, you can even 'force' the 'nodetool removenode' (see the documentation). Forcing it will prevent streaming and remove the node (token ranges are handed over, but not the data). If that does not work, you can use the 'nodetool assassinate' command as well.
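(For reference, the shape of those commands; the Host ID comes from 'nodetool status', and both values are placeholders:)

    nodetool removenode <host_id>       # streams replicas around; can be slow
    nodetool removenode force           # force-completes a stuck removal, without streaming
    nodetool assassinate <ip_address>   # last resort: drops the node from gossip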
When adding nodes back to the broken DC, the first nodes will probably take 100% of the ownership, which is often too much. You can consider adding back all the nodes with 'auto_bootstrap: false', then repairing them once they have their final token ownership, the same way we do when building a new data center. This option is not really clean and has some caveats you need to consider before starting, as there are token range movements and nodes available that do not have the data. Yet this should work. I imagine it would work nicely with RF=3 and QUORUM; with RF=2 (if you have 2+ racks) I guess it should work as well, but you will have to pick one of availability or consistency while repairing the data. Be aware that read requests hitting these nodes will not find data!
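(A rough sketch of that sequence for each re-added node; whether and where you set the flag depends on your deployment, so treat this as an outline only:)

    # cassandra.yaml on each node being added back (revert afterwards):
    auto_bootstrap: false
    # once all nodes are back and token ownership is final, repair them:
    nodetool repair --full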
Plus, you are using RF=2. Thus, using a consistency level of 2+ (TWO, QUORUM, ALL) for at least one of reads or writes is needed to preserve consistency while re-adding the nodes in this case. Otherwise, reads will not detect the mismatch with certainty and might show inconsistent data until the nodes are repaired. I must say that I really prefer odd values for the RF, starting with RF=3. Using RF=2, you will have to pick: consistency or availability. With a consistency of ONE everywhere, the service is available, with no single point of failure; using anything bigger than this, for writes or reads, brings consistency but creates single points of failure (actually, any node becomes a point of failure). RF=3 with QUORUM for both writes and reads takes the best of the two worlds, somehow. The tradeoff with RF=3 and quorum reads is the latency increase and the resource usage.
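(The quorum arithmetic behind this, as a worked example:)

    quorum = floor(RF / 2) + 1
    RF=2 -> quorum = 2 (same as ALL): any replica down blocks quorum operations
    RF=3 -> quorum = 2: one replica may be down, and since 2 + 2 > 3,
            quorum reads always overlap quorum writes on at least one replica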
Maybe there is a better approach, I am not too sure, but I think I would try option 1 first in any case. It's less destructive and less risky, with no token range movements and no empty nodes available. I am not sure about the limitations you might face, though, and that's why I suggest a second option for you to consider if the first is not actionable.

Let us know how it goes,

C*heers,
-----------------------
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Mon, 10 Sep 2018 at 09:09, onmstester onmstester <onmstes...@zoho.com> wrote:

Any idea?

Sent using Zoho Mail
---- On Sun, 09 Sep 2018 11:23:17 +0430 onmstester onmstester <onmstes...@zoho.com> wrote ----

Hi,

Cluster spec:
* 30 nodes
* RF = 2
* NetworkTopologyStrategy
* GossipingPropertyFileSnitch + rack aware

Suddenly I lost all disks of cassandra-data on one of my racks. After replacing the disks, I tried to replace the nodes with the same IPs using this:
https://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html

Starting the to-be-replaced node fails with:

    java.lang.IllegalStateException: unable to find sufficient sources for streaming range in keyspace system_traces

The problem is that I had not changed the default replication config for the system keyspaces. Now I have altered the system_traces keyspace strategy to NetworkTopologyStrategy and RF=2, but running nodetool repair failed: Endpoint not alive /IP of dead node that I'm trying to replace.

What should I do now? Can I just remove the previous nodes, change the dead nodes' IPs and re-join them to the cluster?

Sent using Zoho Mail