FWIW, `nodetool decommission` is strongly preferred. `nodetool removenode` is designed to be run when a host is already offline. Only decommission is guaranteed to maintain consistency / correctness, and removenode probably streams a lot more data around than decommission.
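For reference, a minimal sketch of the two paths (the host ID is a placeholder; take it from nodetool status):

    # Preferred: run on the node you want to remove, while it is still up.
    # It streams its own data to the remaining replicas before leaving the ring.
    nodetool decommission

    # Only if the node is already down: run from any live node, using the down
    # node's host ID. The surviving replicas then re-stream the missing data.
    nodetool removenode <host-id>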
On Mon, Apr 3, 2023 at 6:47 AM Bowen Song via user <user@cassandra.apache.org> wrote:

> Using nodetool removenode is strongly preferred in most circumstances; only resort to assassinate if you do not care about data consistency, or you know there won't be any consistency issue (e.g. no new writes and did not run nodetool cleanup).
>
> Since the size of data on the new node is small, nodetool removenode should finish fairly quickly and bring your cluster back.
>
> Next time, when you are doing something like this again, please test it out in a non-production environment and make sure everything works as expected before moving on to production.
>
>
> On 03/04/2023 06:28, David Tinker wrote:
>
> Should I use assassinate or removenode, given that there is some data on the node? Or will that be found on the other nodes? Sorry for all the questions, but I really don't want to mess up.
>
> On Mon, Apr 3, 2023 at 7:21 AM Carlos Diaz <crdiaz...@gmail.com> wrote:
>
>> That's what nodetool assassinate will do.
>>
>> On Sun, Apr 2, 2023 at 10:19 PM David Tinker <david.tin...@gmail.com> wrote:
>>
>>> Is it possible for me to remove the node from the cluster, i.e. to undo this mess and get the cluster operating again?
>>>
>>> On Mon, Apr 3, 2023 at 7:13 AM Carlos Diaz <crdiaz...@gmail.com> wrote:
>>>
>>>> You can leave it in the seed list of the other nodes, just make sure it's not included in this node's own seed list. However, if you do decide to fix the issue with the racks, first assassinate this node (nodetool assassinate <ip>) and update the rack name before you restart.
>>>>
>>>> On Sun, Apr 2, 2023 at 10:06 PM David Tinker <david.tin...@gmail.com> wrote:
>>>>
>>>>> It is also in the seeds list for the other nodes. Should I remove it from those, restart them one at a time, then restart it?
>>>>>
>>>>> /etc/cassandra # grep -i bootstrap *
>>>>> doesn't show anything, so I don't think I have auto_bootstrap set to false.
>>>>>
>>>>> Thanks very much for the help.
>>>>>
>>>>> On Mon, Apr 3, 2023 at 7:01 AM Carlos Diaz <crdiaz...@gmail.com> wrote:
>>>>>
>>>>>> Just remove it from the seed list in the cassandra.yaml file and restart the node. Make sure that auto_bootstrap is set to true first, though.
>>>>>>
>>>>>> On Sun, Apr 2, 2023 at 9:59 PM David Tinker <david.tin...@gmail.com> wrote:
>>>>>>
>>>>>>> So, likely because I made it a seed node when I added it to the cluster, it didn't do the bootstrap process. How can I recover from this?
>>>>>>>
>>>>>>> On Mon, Apr 3, 2023 at 6:41 AM David Tinker <david.tin...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Yes, the replication factor is 3.
>>>>>>>>
>>>>>>>> I ran nodetool repair -pr on all the nodes (one at a time) and am still having issues getting data back from queries.
>>>>>>>>
>>>>>>>> I did make the new node a seed node.
>>>>>>>>
>>>>>>>> Re "rack4": I assumed that was just an indication of the physical location of the server for redundancy. This one is separate from the others, so I used rack4.
>>>>>>>>
>>>>>>>> On Mon, Apr 3, 2023 at 6:30 AM Carlos Diaz <crdiaz...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I'm assuming that your replication factor is 3. If that's the case, did you intentionally put this node in rack 4? Typically, you want to add nodes in multiples of your replication factor in order to keep the "racks" balanced.
>>>>>>>>> In other words, this node should have been added to rack 1, 2 or 3.
>>>>>>>>>
>>>>>>>>> Having said that, you should be able to easily fix your problem by running a nodetool repair -pr on the new node.
>>>>>>>>>
>>>>>>>>> On Sun, Apr 2, 2023 at 8:16 PM David Tinker <david.tin...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi All
>>>>>>>>>>
>>>>>>>>>> I recently added a node to my 3 node Cassandra 4.0.5 cluster and now many reads are not returning rows! What do I need to do to fix this? There weren't any errors in the logs or other problems that I could see. I expected the cluster to balance itself but this hasn't happened (yet?). The nodes are similar so I have num_tokens=256 for each. I am using the Murmur3Partitioner.
>>>>>>>>>>
>>>>>>>>>> # nodetool status
>>>>>>>>>> Datacenter: dc1
>>>>>>>>>> ===============
>>>>>>>>>> Status=Up/Down
>>>>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>>>>> --  Address          Load       Tokens  Owns (effective)  Host ID                               Rack
>>>>>>>>>> UN  xxx.xxx.xxx.105  2.65 TiB   256     72.9%             afd02287-3f88-4c6f-8b27-06f7a8192402  rack3
>>>>>>>>>> UN  xxx.xxx.xxx.253  2.6 TiB    256     73.9%             e1af72be-e5df-4c6b-a124-c7bc48c6602a  rack2
>>>>>>>>>> UN  xxx.xxx.xxx.24   93.82 KiB  256     80.0%             c4e8b4a0-f014-45e6-afb4-648aad4f8500  rack4
>>>>>>>>>> UN  xxx.xxx.xxx.107  2.65 TiB   256     73.2%             ab72f017-be96-41d2-9bef-a551dec2c7b5  rack1
>>>>>>>>>>
>>>>>>>>>> # nodetool netstats
>>>>>>>>>> Mode: NORMAL
>>>>>>>>>> Not sending any streams.
>>>>>>>>>> Read Repair Statistics:
>>>>>>>>>> Attempted: 0
>>>>>>>>>> Mismatch (Blocking): 0
>>>>>>>>>> Mismatch (Background): 0
>>>>>>>>>> Pool Name        Active  Pending  Completed  Dropped
>>>>>>>>>> Large messages   n/a     0        71754      0
>>>>>>>>>> Small messages   n/a     0        8398184    14
>>>>>>>>>> Gossip messages  n/a     0        1303634    0
>>>>>>>>>>
>>>>>>>>>> # nodetool ring
>>>>>>>>>> Datacenter: dc1
>>>>>>>>>> ==========
>>>>>>>>>> Address          Rack   Status  State   Load       Owns    Token
>>>>>>>>>>                                                             9189523899826545641
>>>>>>>>>> xxx.xxx.xxx.24   rack4  Up      Normal  93.82 KiB  79.95%  -9194674091837769168
>>>>>>>>>> xxx.xxx.xxx.107  rack1  Up      Normal  2.65 TiB   73.25%  -9168781258594813088
>>>>>>>>>> xxx.xxx.xxx.253  rack2  Up      Normal  2.6 TiB    73.92%  -9163037340977721917
>>>>>>>>>> xxx.xxx.xxx.105  rack3  Up      Normal  2.65 TiB   72.88%  -9148860739730046229
>>>>>>>>>> xxx.xxx.xxx.107  rack1  Up      Normal  2.65 TiB   73.25%  -9125240034139323535
>>>>>>>>>> xxx.xxx.xxx.253  rack2  Up      Normal  2.6 TiB    73.92%  -9112518853051755414
>>>>>>>>>> xxx.xxx.xxx.105  rack3  Up      Normal  2.65 TiB   72.88%  -9100516173422432134
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>> This is causing a serious production issue. Please help if you can.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> David
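For completeness, a rough sketch of the recovery path suggested in this thread, using the new node's host ID from the nodetool status output above (exact config file names depend on the snitch in use and are assumptions here):

    # 1. Remove the misconfigured node; it holds almost no data (93.82 KiB), so this
    #    should finish quickly. Run from any of the three original nodes:
    nodetool removenode c4e8b4a0-f014-45e6-afb4-648aad4f8500

    # 2. On the removed node, before rejoining:
    #    - remove its own IP from the seeds list in cassandra.yaml so it actually bootstraps
    #    - change the rack from rack4 to rack1/2/3 (e.g. in cassandra-rackdc.properties
    #      if the cluster uses GossipingPropertyFileSnitch, which is an assumption)
    #    - clear its data, commitlog and saved_caches directories so it joins as a fresh node
    # 3. Restart Cassandra on the node, let it finish joining (UJ -> UN), then verify:
    nodetool status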