AFAIK there's nothing special about being the first disk in the list. It's fairly likely that you were just "lucky" that an important part of the system keyspace was on that disk. It's always a good idea to keep some spare hardware at hand, because you never know when it will be needed.

On 13/12/2021 18:01, Joe Obernberger wrote:

Thank you Bowen.  I had the policy set to "best_effort", but as Jeff pointed out, since it was the first disk in the list that failed, maybe that is a special case?

I don't have a spare drive at the moment, so I'll just delete all the cassandra data on that node and have it rejoin as a new node.

-Joe

On 12/11/2021 3:44 PM, Bowen Song wrote:

Hi Joe,

In case of a single disk failure, you should not remove the data directory from the cassandra.yaml file. Instead, you should replace the failed disk with a new empty disk. See https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/operations/opsRecoverUsingJBOD.html for the steps.

Since your node failed to start, I guess it's not too late to restore the settings in the cassandra.yaml file and then follow the above steps. However, replacing the entire node is always an option if everything else has failed, as long as you have RF>1 and other nodes in the cluster are all healthy. If you need to do this, follow the steps here: https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsReplaceNode.html
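
As a rough sketch of what "restore the settings" would look like, the failed disk's directory goes back at the top of data_file_directories, pointing at the new, empty disk. The first path below is only a placeholder (an assumption, since the original entry isn't shown in this thread); the other entries are copied from your message.

```yaml
# cassandra.yaml (sketch): the replaced disk's directory is listed again.
# The first path is a hypothetical placeholder; use the node's original entry.
data_file_directories:
    - /path/to/replaced/disk/cassandra/data   # placeholder: new, empty disk mounted here
    - /data/2/cassandra/data
    - /data/3/cassandra/data
    - /data/4/cassandra/data
    - /data/5/cassandra/data
    - /data/6/cassandra/data
    - /data/8/cassandra/data
    - /data/9/cassandra/data
```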

As for your last question,

> /When a drive fails with cassandra, is it common for the node to come down? /

this actually depends on the disk_failure_policy in your cassandra.yaml file; reading the comments there will help you understand the available choices.
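
For reference, here is a short sketch of that setting with the choices paraphrased from the comments in the stock cassandra.yaml as I remember them; please treat the wording as approximate and check the file shipped with your version. The value shown is, as far as I recall, the shipped default, not a recommendation.

```yaml
# cassandra.yaml: policy for data disk failures (paraphrased, check your own file)
#   die            shut down gossip and client transports and kill the JVM on any
#                  file system or single-sstable error, so the node can be replaced
#   stop_paranoid  shut down gossip and client transports even for single-sstable
#                  errors; kill the JVM for errors during startup
#   stop           shut down gossip and client transports, leaving the node
#                  effectively dead but still inspectable via JMX; kill the JVM
#                  for errors during startup
#   best_effort    stop using the failed disk and answer requests from the
#                  remaining sstables (obsolete data may be returned at CL.ONE)
#   ignore         ignore fatal errors and let requests fail
disk_failure_policy: stop
```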

Cheers,
Bowen

On 06/12/2021 14:11, Joe Obernberger wrote:
Hi All - one node in an 11-node cluster experienced a drive failure on the first drive in the list.  I removed that drive from the list so that it now reads:

data_file_directories:
    - /data/2/cassandra/data
    - /data/3/cassandra/data
    - /data/4/cassandra/data
    - /data/5/cassandra/data
    - /data/6/cassandra/data
    - /data/8/cassandra/data
    - /data/9/cassandra/data

But when I try to start the server, I get:

Exception (java.lang.RuntimeException) encountered during startup: A node with address /172.16.100.251:7000 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
java.lang.RuntimeException: A node with address /172.16.100.251:7000 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
ERROR [main] 2021-12-05 15:49:48,446 CassandraDaemon.java:909 - Exception encountered during startup
java.lang.RuntimeException: A node with address /172.16.100.251:7000 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
INFO  [StorageServiceShutdownHook] 2021-12-05 15:49:48,468 HintsService.java:220 - Paused hints dispatch
WARN  [StorageServiceShutdownHook] 2021-12-05 15:49:48,470 Gossiper.java:1993 - No local state, state is in silent shutdown, or node hasn't joined, not announcing shutdown

Do I need to remove and re-add the node?  When a drive fails with cassandra, is it common for the node to come down?

Thank you!

-Joe Obernberger

