Hi folks, hopefully a quick one: we are running a 12-node cluster (2.1.15) in AWS with Ec2Snitch. It's all in one region but spread across 3 availability zones, and it was nicely balanced with 4 nodes in each.
But after a couple of failures and subsequent re-provisioning into the wrong AZs, we now have a cluster with:

- 5 nodes in AZ A
- 5 nodes in AZ B
- 2 nodes in AZ C

Not sure why, but when we add a third node in AZ C it fails to stream after getting all the way to completion, with no apparent error in the logs. I've looked at a couple of bugs referring to scrubbing and possible OOMs caused by metadata being written at the end of streaming (sorry, I don't have the ticket handy). I'm worried I won't be able to do much with these two nodes, since their disk usage is high and they are under a lot of load given how few of them there are for this rack.

Rather than troubleshoot this further, what I was thinking of doing was:

- drop the replication factor on our keyspace to two (hopefully reducing load on the two remaining nodes)
- run repairs/cleanup across the cluster
- then shoot the two nodes in the 'c' rack
- run repairs/cleanup across the cluster

(There's a rough sketch of the commands I had in mind at the end of this mail.)

Would this work with minimal/no disruption? Should I update their "rack" beforehand or afterwards? What else am I not thinking about? My main goal at the moment is to get the cluster back into a clean, consistent state that allows nodes to bootstrap properly.

Thanks for your help in advance.
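For reference, here's a rough sketch of the commands I had in mind, assuming the keyspace is called "my_keyspace" and that Ec2Snitch reports the data center as "us-east" (both placeholders for our real names):

    -- drop the replication factor to two (run once, from cqlsh)
    ALTER KEYSPACE my_keyspace
      WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 2};

    # repair and clean up, one node at a time across the cluster
    nodetool repair -pr my_keyspace
    nodetool cleanup my_keyspace

    # retire the two AZ C nodes (run on each of those nodes in turn)
    nodetool decommission

    # then repair/cleanup across the remaining nodes again
    nodetool repair -pr my_keyspace
    nodetool cleanup my_keyspace

I've assumed decommission for the "shoot the nodes" step so their data streams off before they go away, but that's exactly the part I'm unsure about given the streaming problems we've already seen in this AZ.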