As far as I know, nodetool stopdaemon just does a "kill -9". Or has that changed?
2017-10-12 23:49 GMT-03:00 Anshu Vajpayee <anshu.vajpa...@gmail.com>:

> Why are you killing it when we have nodetool stopdaemon?
>
> On Fri, Oct 13, 2017 at 1:49 AM, Javier Canillas <
> javier.canil...@gmail.com> wrote:
>
>> That's what I thought.
>>
>> Thanks!
>>
>> 2017-10-12 14:26 GMT-03:00 Hannu Kröger <hkro...@gmail.com>:
>>
>>> Hi,
>>>
>>> Drain should be enough. It stops accepting writes and after that
>>> cassandra can be safely shut down.
>>>
>>> Hannu
>>>
>>> On 12 October 2017 at 20:24:41, Javier Canillas (
>>> javier.canil...@gmail.com) wrote:
>>>
>>> Hello everyone,
>>>
>>> I have been working with Cassandra for some time, but every time I need
>>> to shut down a node (for any reason, like upgrading the version or moving
>>> the instance to another host) I see several errors on the client
>>> applications (yes, I'm using the official java driver).
>>>
>>> By the way, I'm starting C* as a stand-alone process
>>> <https://docs.datastax.com/en/cassandra/3.0/cassandra/initialize/referenceStartCprocess.html?hl=start>,
>>> and the C* version is 3.11.0.
>>>
>>> The way I have implemented the shutdown process is something like the
>>> following:
>>>
>>> # Drain all information from commitlog into sstables
>>> bin/nodetool drain
>>>
>>> cassandra_pid=`ps -ef|grep "java.*apache-cassandra"|grep -v "grep"|awk '{print $2}'`
>>> if [ ! -z "$cassandra_pid" ] && [ "$cassandra_pid" -ne "1" ]; then
>>>     echo "Asking Cassandra to shutdown (nodetool drain doesn't stop cassandra)"
>>>     kill $cassandra_pid
>>>
>>>     echo -n "+ Checking it is down. "
>>>     counter=10
>>>     # Keep waiting while the process is still alive and the counter has not run out
>>>     while [ "$counter" -ne 0 ] && kill -0 $cassandra_pid > /dev/null 2>&1
>>>     do
>>>         echo -n ". "
>>>         ((counter--))
>>>         sleep 1s
>>>     done
>>>     echo ""
>>>     if ! kill -0 $cassandra_pid > /dev/null 2>&1; then
>>>         echo "+ It's down."
>>>     else
>>>         echo "- Killing Cassandra."
>>>         kill -9 $cassandra_pid
>>>     fi
>>> else
>>>     echo "Care there was a problem finding Cassandra PID"
>>> fi
>>>
>>> Should I add the following lines at the beginning?
>>>
>>> echo "shutting down cassandra gracefully with: nodetool disable gossip"
>>> $CASSANDRA_HOME/$CASSANDRA_APP/bin/nodetool disablegossip
>>> echo "shutting down cassandra gracefully with: nodetool disable binary protocol"
>>> $CASSANDRA_HOME/$CASSANDRA_APP/bin/nodetool disablebinary
>>> echo "shutting down cassandra gracefully with: nodetool thrift"
>>> $CASSANDRA_HOME/$CASSANDRA_APP/bin/nodetool disablethrift
>>>
>>> The shutdown log is the following:
>>>
>>> WARN [RMI TCP Connection(10)-127.0.0.1] 2017-10-12 14:20:52,343 StorageService.java:321 - Stopping gossip by operator request
>>> INFO [RMI TCP Connection(10)-127.0.0.1] 2017-10-12 14:20:52,344 Gossiper.java:1532 - Announcing shutdown
>>> INFO [RMI TCP Connection(10)-127.0.0.1] 2017-10-12 14:20:52,355 StorageService.java:2268 - Node /10.254.169.36 state jump to shutdown
>>> INFO [RMI TCP Connection(12)-127.0.0.1] 2017-10-12 14:20:56,141 Server.java:176 - Stop listening for CQL clients
>>> INFO [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:20:59,472 StorageService.java:1442 - DRAINING: starting drain process
>>> INFO [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:20:59,474 HintsService.java:220 - Paused hints dispatch
>>> INFO [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:20:59,477 Gossiper.java:1532 - Announcing shutdown
>>> INFO [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:20:59,480 StorageService.java:2268 - Node /127.0.0.1 state jump to shutdown
>>> INFO [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:21:01,483 MessagingService.java:984 - Waiting for messaging service to quiesce
>>> INFO [ACCEPT-/192.168.6.174] 2017-10-12 14:21:01,485 MessagingService.java:1338 - MessagingService has terminated the accept() thread
>>> INFO [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:21:02,095 HintsService.java:220 - Paused hints dispatch
>>> INFO [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:21:02,111 StorageService.java:1442 - DRAINED
>>>
>>> Disabling gossip seemed a good idea, but judging by the logs, the node
>>> may use gossip to tell the other nodes gracefully that it is going down,
>>> so I don't know if it's a good or a bad idea.
>>>
>>> Disabling Thrift and the binary protocol should only prevent new
>>> connections, while the ones already established and running should be
>>> allowed to finish.
>>>
>>> Any thoughts or comments?
>>>
>>> Thanks
>>>
>>> Javier.
>>>
>>
>
> --
> Regards,
> Anshu
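
For reference, here is a minimal sketch of the drain / TERM / wait / KILL sequence discussed in this thread, with the wait step split out into a helper. The names wait_for_exit and stop_cassandra are hypothetical, NODETOOL is an assumed variable, and pgrep -f is swapped in for the ps | grep | awk pipeline; none of this ships with Cassandra. Note that kill -0 has to run as a command of its own, since [ ... ] cannot execute it.

```shell
#!/usr/bin/env bash

# Hypothetical helper: wait up to $2 seconds (default 10) for PID $1 to exit.
# Returns 0 once the process is gone, 1 if it is still alive at the deadline.
wait_for_exit() {
    local pid=$1 remaining=${2:-10}
    # kill -0 sends no signal; it only reports whether the PID can be signalled.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Sketch of the sequence from the thread: drain, SIGTERM, wait, SIGKILL as a last resort.
# NODETOOL and the pgrep pattern are assumptions; adjust them to the local install.
stop_cassandra() {
    local nodetool=${NODETOOL:-bin/nodetool} pid
    "$nodetool" drain
    pid=$(pgrep -f 'java.*apache-cassandra' | head -n 1)
    if [ -z "$pid" ] || [ "$pid" -eq 1 ]; then
        echo "Could not find the Cassandra PID" >&2
        return 1
    fi
    kill "$pid"
    if wait_for_exit "$pid" 10; then
        echo "+ It's down."
    else
        echo "- Still running after 10s, sending SIGKILL."
        kill -9 "$pid"
    fi
}
```

To try it, source the file and call stop_cassandra from the Cassandra install directory. The 10-second limit only mirrors the counter in the quoted script; pick whatever timeout suits the node.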