AFAIK there's nothing special about being the first disk in the list. It's
fairly likely that you were just "lucky" that an important part of the
system keyspace was on that disk. It's always a good idea to keep some
spare hardware at hand, because you never know when it will be
needed.
On 13/12/2021 18:01, Joe Obernberger wrote:
Thank you Bowen. I had the policy set to "best_effort", but, as Jeff
pointed out, since it was the first disk in the list that failed, maybe
that is a special case?
I don't have a spare drive at the moment, so I'll just delete all the
cassandra data on that node and have it rejoin as a new node.
-Joe
On 12/11/2021 3:44 PM, Bowen Song wrote:
Hi Joe,
In case of a single disk failure, you should not remove the data
directory from the cassandra.yaml file. Instead, you should replace
the failed disk with a new empty disk. See
https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/operations/opsRecoverUsingJBOD.html
for the steps.
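As a rough sketch only, the restored setting would list the original
directory again once the new empty disk is in place. The first entry below
is a placeholder, because the original path of the failed disk is not given
in this thread, and it assumes the new disk is mounted at the same path as
the old one:

```yaml
data_file_directories:
    # Placeholder (assumption): the original directory of the failed disk,
    # now backed by the new empty disk mounted at the same path.
    - /path/to/replaced/disk/cassandra/data
    # Remaining directories, unchanged from the list in the original report.
    - /data/2/cassandra/data
    - /data/3/cassandra/data
    - /data/4/cassandra/data
    - /data/5/cassandra/data
    - /data/6/cassandra/data
    - /data/8/cassandra/data
    - /data/9/cassandra/data
```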
Since your node failed to start, I guess it's not too late to restore
the settings in the cassandra.yaml file and then follow the above
steps. However, replacing the entire node is always an option if
everything else has failed, as long as you have RF>1 and other nodes
in the cluster are all healthy. If you need to do this, follow the
steps here:
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsReplaceNode.html
As for your last question,
> When a drive fails with cassandra, is it common for the node to
> come down?
this actually depends on the disk_failure_policy setting in your
cassandra.yaml file. Reading the comments there will help you understand
the available choices.
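For reference, the setting looks roughly like this. The comments are a
paraphrase of the ones in the stock 3.x cassandra.yaml, not a verbatim copy,
and the value shown is what I believe the shipped default to be, so please
check your own file:

```yaml
# Policy for data disk failures (paraphrased from the stock file):
#   die           - shut down gossip and client transports and kill the JVM
#                   on any filesystem or single-sstable error, so the node
#                   can be replaced.
#   stop_paranoid - shut down gossip and client transports even for
#                   single-sstable errors; kill the JVM on startup errors.
#   stop          - shut down gossip and client transports, leaving the node
#                   effectively dead but still inspectable via JMX; kill the
#                   JVM on startup errors.
#   best_effort   - stop using the failed disk and answer requests from the
#                   remaining sstables; stale data may be returned at CL.ONE.
#   ignore        - ignore fatal errors and let requests fail.
disk_failure_policy: stop
```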
Cheers,
Bowen
On 06/12/2021 14:11, Joe Obernberger wrote:
Hi All - one node in an 11-node cluster experienced a drive failure
on the first drive in the list. I removed that drive from the list
so that it now reads:
data_file_directories:
- /data/2/cassandra/data
- /data/3/cassandra/data
- /data/4/cassandra/data
- /data/5/cassandra/data
- /data/6/cassandra/data
- /data/8/cassandra/data
- /data/9/cassandra/data
But when I try to start the server, I get:
Exception (java.lang.RuntimeException) encountered during startup: A node with address /172.16.100.251:7000 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
java.lang.RuntimeException: A node with address /172.16.100.251:7000 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
ERROR [main] 2021-12-05 15:49:48,446 CassandraDaemon.java:909 - Exception encountered during startup
java.lang.RuntimeException: A node with address /172.16.100.251:7000 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
INFO [StorageServiceShutdownHook] 2021-12-05 15:49:48,468 HintsService.java:220 - Paused hints dispatch
WARN [StorageServiceShutdownHook] 2021-12-05 15:49:48,470 Gossiper.java:1993 - No local state, state is in silent shutdown, or node hasn't joined, not announcing shutdown
Do I need to remove and re-add the node? When a drive fails with
cassandra, is it common for the node to come down?
Thank you!
-Joe Obernberger