I agree with the comments above. Cassandra is robust, and we are just talking about optimising the process. Nothing is mandatory. Going to an extreme, I would say you can pull the node's power cable, plug it back in and call it a restart. It should do no harm if your cluster is properly tuned. Yet optimisations are welcome as they improve entropy and starting time. Plus we are civilized operators, not barbarians, aren't we ;-)? It's just more 'clean' and efficient. Also, historically, it was mandatory to drain when using counters to prevent over-counting, as counters are not idempotent (not sure about this nowadays).

Last time I asked this very question I ended up building this command, which I have been using ever since:

`date && nodetool disablebinary && nodetool disablegossip && sleep 10 && nodetool flush && nodetool drain && sleep 10 && sudo service cassandra restart`

It does the following:

- Print the date for the record.
- Stop all client transports. I never heard of a benefit of shutting down the gossip protocol, and so never did so; it might be better but I can't really say. This way we stop listening for clients.
- After a short while no clients are using the node. Calling drain flushes memtables and recycles the commitlog, as Kurt detailed above. Here I add a 'flush' because I haven't been that lucky in the past with drain: sometimes it did not work at all, sometimes it did not clean the commitlogs. I believe flushing first makes this restart command more robust.
- Finally, restart the service.

I think there is more than one good way to do this. Also, doing it wrong is often not such a big deal.

C*heers,
-----------------------
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-01-08 3:33 GMT+00:00 Jeff Jirsa <jji...@gmail.com>:

> The sequence does have some objective benefits - especially stopping
> transports and then gossip, it tells everything you’re going offline before
> you do, so requests won’t get dropped or have to speculate to other
> replicas.
>
> --
> Jeff Jirsa
>
> On Jan 7, 2018, at 7:22 PM, kurt greaves <k...@instaclustr.com> wrote:
>
> None are essential. Cassandra will shut down gracefully in any scenario as
> long as it's not killed with a SIGKILL. However, drain does have a few
> benefits over just a normal shutdown. It will stop a few extra services
> (batchlog, compactions) and importantly it will also force recycling of
> dirty commitlog segments, meaning there will be fewer commitlog files to
> replay on startup, which reduces startup time.
>
> A comment in the code for drain also indicates that it will wait for
> in-progress streaming to complete, but I haven't managed to find 1. where
> this occurs, or 2. if it actually differs from a normal shutdown. Note that
> this is all w.r.t. 2.1. In 3.0.10 and 3.10 drain and shutdown more or less
> do the exact same thing, however drain will log some extra messages.
>
> On 2 January 2018 at 07:07, Jing Meng <self.rel...@gmail.com> wrote:
>
>> Hi all.
>>
>> Recently we made a change to our production env c* cluster (2.1.18) -
>> placing the commit log on the same SSD where data is stored, which requires
>> restarting all nodes.
>>
>> Before restarting a cassandra node, we ran the following nodetool utils:
>> $ nodetool disablethrift && sleep 5
>> $ nodetool disablebinary && sleep 5
>> $ nodetool disablegossip && sleep 5
>> $ nodetool drain && sleep 5
>>
>> It was "graceful" as expected (no significant errors found), but the
>> process is still a mystery to us: are those commands used above "sufficient",
>> and/or why? The official doc (docs.datastax.com) did not help with this
>> operation detail, though "nodetool drain" is apparently essential.
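
For anyone who would rather keep the sequence in a script than in a one-liner, here is a minimal sketch of the same steps as a shell function. It is not something posted in this thread: the helper `run` and the variables `DRY_RUN` and `PAUSE` are hypothetical names introduced for illustration, and the last step assumes the node is managed through `service cassandra`, as in the one-liner. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the restart sequence from this thread.
# run, DRY_RUN and PAUSE are hypothetical names, not part of Cassandra or nodetool.

# Print the command when DRY_RUN is 1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Each step runs only if the previous one succeeded, like the && chain.
graceful_restart() {
    run date &&                          # timestamp for the record
    run nodetool disablebinary &&        # stop the native (CQL) client transport
    run nodetool disablegossip &&        # announce to the cluster that the node is leaving
    run sleep "${PAUSE:-10}" &&          # let in-flight requests finish
    run nodetool flush &&                # flush memtables before draining
    run nodetool drain &&                # stop accepting writes, recycle commitlog
    run sleep "${PAUSE:-10}" &&
    run sudo service cassandra restart   # assumes a service-managed install
}
```

Calling `graceful_restart` as-is prints the eight commands in order; `DRY_RUN=0 graceful_restart` would actually run them. On 2.1 clusters that still serve Thrift clients, a `run nodetool disablethrift` step could be added next to `disablebinary`, as in the original question.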