[ https://issues.apache.org/jira/browse/CASSANDRA-20673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Semb Wever updated CASSANDRA-20673:
-------------------------------------------
    Reviewers: David Capwell, Michael Semb Wever, Michael Semb Wever  (was: David Capwell, Michael Semb Wever)
       Status: Review In Progress  (was: Patch Available)

> CCM not killing remaining nodes if lower index nodes fail to start
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-20673
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20673
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Tool/CCM
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>            Priority: Normal
>         Attachments: ci_summary_ccm_aweisberg_20673_242.html, results_details_ccm_aweisberg_20673_242.tar.xz
>
>
> It starts them all in a loop without reading their PIDs from the PID files, and then loops checking whether each one is running. If a node kills itself, CCM skips straight to stopping all the nodes without having read their PID files.
>
> The simplest fix to start with is to first loop over all the nodes waiting for them to start up, and only then check whether each has logged the message saying it is listening for CQL connections. Right now it both waits for startup (which populates the PID field in Node) and waits for the log message in the same pass, and the log-message wait can fail, so the PID field of the remaining nodes never gets populated.
>
> The entire state management here is very fragile. A better way to do this would be to tag the command lines and then just `pkill -9 -f`, so you don't have to search for the PID at all. There are a lot of conditions, fields set to `None`, and steps performed only in certain states, and none of that is necessary when the goal is just to make sure something is stopped.
>
> Really, you could say that the log directory serves as a unique identifier for the node.
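A minimal sketch of the command-line tagging idea, in Python (the language CCM is written in), assuming the node's log directory is used as the tag. `TAG_PROPERTY` (`ccm.node.tag`), `tagged_command`, `kill_command`, and `force_stop` are hypothetical names, not part of CCM, and where the tag actually has to go depends on how the node is launched. It also assumes only one process per log directory is ever started.

```python
import subprocess
from typing import List

# Hypothetical JVM system property used to tag a node's command line.
# CCM does not define it; it only illustrates the tagging idea.
TAG_PROPERTY = "ccm.node.tag"

# Characters that are special in a POSIX extended regex, which pkill uses.
_ERE_SPECIAL = set(".[\\()*+?{|^$")


def ere_escape(text: str) -> str:
    """Backslash-escape ERE metacharacters so pkill matches text literally."""
    return "".join("\\" + ch if ch in _ERE_SPECIAL else ch for ch in text)


def tagged_command(base_cmd: List[str], log_dir: str) -> List[str]:
    """Return the launch command with the node's log directory appended as a tag."""
    return base_cmd + [f"-D{TAG_PROPERTY}={log_dir}"]


def kill_command(log_dir: str) -> List[str]:
    """Build a pkill argv that SIGKILLs any process whose full command line has the tag.

    The pattern omits the leading '-D' so pkill cannot mistake it for an option,
    and the trailing '( |$)' stops '.../node1' from also matching '.../node10'.
    """
    pattern = f"{ere_escape(TAG_PROPERTY)}={ere_escape(log_dir)}( |$)"
    return ["pkill", "-9", "-f", pattern]


def force_stop(log_dir: str) -> bool:
    """Kill the node without reading a PID file; True if any process matched."""
    # pkill exits with 0 when at least one process matched and 1 when none did.
    result = subprocess.run(kill_command(log_dir), check=False)
    return result.returncode == 0
```

Because `pkill -f` matches against the whole command line rather than a PID, stopping a node this way needs no state about whether its PID file was ever read.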
Technically it is not a unique identifier for the process, but it > would serve as that in most cases if you never started duplicated processes > so maybe close enough. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org