Just curious: aside from the "sleep", is all of this not part of the shutdown command? Is this an "opportunity" to improve C*? Having worked with RDBMSes, Hadoop and HBase, I would expect stopping communication, flushing memcache (HBase), and relinquishing ownership of data (HBase) to all be part of the shutdown process.
From: Alain RODRIGUEZ <arodr...@gmail.com>
Date: Wednesday, January 10, 2018 at 6:19 AM
To: "user cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Question upon gracefully restarting c* node(s)

I agree with the comments above. Cassandra is robust, and we are just talking about optimising the process. Nothing is mandatory. Going to an extreme, I would say you can pull the node's power cable, plug it back in, and call it a restart. It should not do any harm if your cluster is properly tuned. Yet optimisations are welcome, as they improve entropy and startup time. Plus we are civilized operators, not barbarians, aren't we ;-)? It's just more 'clean' and efficient.

(Also, historically, it was mandatory to drain when using counters to prevent over-counting, as counters are not idempotent. Not sure about this nowadays.)

Last time I asked this very question I ended up building this command, which I have been using ever since:

`date && nodetool disablebinary && nodetool disablegossip && sleep 10 && nodetool flush && nodetool drain && sleep 10 && sudo service cassandra restart`

It does the following:

- Print the date for the record.
- Stop all client transports. I never heard of a benefit of shutting down the gossip protocol, and so never did so; it might be better, but I can't really say. This way we stop listening for clients.
- After a short while, no clients are using the node. Calling drain then flushes memtables and recycles the commitlog, as Kurt detailed above. Here I add a 'flush' because I haven't been that lucky with drain in the past: sometimes it did not work at all, sometimes it did not clean the commitlogs. I believe flushing first makes this restart command more robust.
- Finally, restart the service.

I think there is more than one good way to do this. Also, doing it wrong is often not such a big deal.
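For what it's worth, the one-liner above can also be written as a small script with one step per line, which makes it easier to see where a restart stopped. This is only a sketch under assumptions: `run`, `graceful_restart`, `DRY_RUN` and `PAUSE` are names made up here for illustration and are not part of nodetool or Cassandra. The commands themselves are the ones from the one-liner.

```shell
# Sketch of the restart one-liner, one step per line.
# DRY_RUN and PAUSE are hypothetical knobs added for illustration.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Each step runs only if the previous one succeeded (same as the && chain).
graceful_restart() {
    date &&                               # timestamp for the record
    run nodetool disablebinary &&         # stop native-protocol clients
    run nodetool disablegossip &&         # tell the cluster the node is going away
    run sleep "${PAUSE:-10}" &&           # let in-flight requests finish
    run nodetool flush &&                 # flush memtables explicitly before drain
    run nodetool drain &&                 # drain, which should recycle the commitlog
    run sleep "${PAUSE:-10}" &&
    run sudo service cassandra restart
}
```

Calling `DRY_RUN=1 graceful_restart` prints the date and then the seven commands without executing them; calling `graceful_restart` on its own runs them for real on the local node.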
C*heers,
-----------------------
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-01-08 3:33 GMT+00:00 Jeff Jirsa <jji...@gmail.com>:

The sequence does have some objective benefits, especially stopping transports and then gossip: it tells everything you're going offline before you do, so requests won't get dropped or have to speculate to other replicas.

--
Jeff Jirsa

On Jan 7, 2018, at 7:22 PM, kurt greaves <k...@instaclustr.com> wrote:

None are essential. Cassandra will shut down gracefully in any scenario as long as it's not killed with a SIGKILL. However, drain does have a few benefits over a normal shutdown. It stops a few extra services (batchlog, compactions) and, importantly, it also forces recycling of dirty commitlog segments, meaning there will be fewer commitlog files to replay on startup, which reduces startup time. A comment in the code for drain also indicates that it will wait for in-progress streaming to complete, but I haven't managed to find (1) where this occurs, or (2) whether it actually differs from a normal shutdown.

Note that this is all w.r.t. 2.1. In 3.0.10 and 3.10, drain and shutdown do more or less the exact same thing; however, drain will log some extra messages.

On 2 January 2018 at 07:07, Jing Meng <self.rel...@gmail.com> wrote:

Hi all.

Recently we made a change to our production env c* cluster (2.1.18): placing the commit log on the same SSD where data is stored, which requires restarting all nodes.
Before restarting a cassandra node, we ran the following nodetool utils:

    $ nodetool disablethrift && sleep 5
    $ nodetool disablebinary && sleep 5
    $ nodetool disablegossip && sleep 5
    $ nodetool drain && sleep 5

It was "graceful" as expected (no significant errors found), but the process is still a mystery to us: are the commands used above "sufficient", and why? The official doc (docs.datastax.com) did not help with this operational detail, though "nodetool drain" is apparently essential.
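One way to get a bit more confidence that the disable steps took effect before draining is to ask the node for its status. The sketch below is not from the thread: `transports_down` is a hypothetical helper, and it assumes that `nodetool statusthrift`, `nodetool statusbinary` and `nodetool statusgossip` each print either "running" or "not running" (worth checking on your own version; Thrift only applies to older releases such as 2.1).

```shell
# Hypothetical helper: succeed only if Thrift, the native transport and
# gossip all report "not running" on the local node.
# Assumption: each status subcommand prints "running" or "not running".
transports_down() {
    for sub in statusthrift statusbinary statusgossip; do
        # If nodetool itself fails (e.g. node unreachable), give up with code 2.
        state="$(nodetool "$sub")" || return 2
        if [ "$state" != "not running" ]; then
            echo "$sub: $state"
            return 1
        fi
    done
    echo "all transports down"
}
```

It could be used between the disable steps and the drain, for example `transports_down && nodetool drain`, so that drain only runs once the node has stopped serving clients and gossiping.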