[ 
https://issues.apache.org/jira/browse/CASSANDRA-20673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-20673:
-------------------------------------------
    Reviewers: David Capwell, Michael Semb Wever, Michael Semb Wever  (was: David Capwell, Michael Semb Wever)
       Status: Review In Progress  (was: Patch Available)

> CCM not killing remaining nodes if lower index nodes fail to start
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-20673
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20673
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Tool/CCM
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>            Priority: Normal
>         Attachments: ci_summary_ccm_aweisberg_20673_242.html, 
> results_details_ccm_aweisberg_20673_242.tar.xz
>
>
> It starts them all in a loop without checking what their PID is from the PID 
> file and then loops checking to see if each one is running. If a node kills 
> itself then CCM skips straight to stopping all the nodes without having read 
> their PID files.
> The simplest fix to get started is to first loop over all the nodes waiting 
> for them to start up, and only then check whether they have logged that they 
> are listening for CQL connections. Right now it does both for each node in 
> turn: it waits for the startup (which populates the PID field in Node) and 
> then waits for the log message, which can fail and leave the PID field of the 
> remaining nodes unpopulated.
> The entire state management here is very fragile, and a better way to do 
> this would be to tag the command lines and then just `pkill -9 -f` them so 
> you don't have to search for the PID at all. There are a lot of conditions, 
> fields set to `None`, and actions taken only in certain states, and none of 
> that is necessary when the goal is just making sure something is stopped.
> Really you could say that the log directory serves as a unique identifier for 
> the node. Technically it is not a unique identifier for the process, but it 
> would serve as one in most cases if you never start duplicate processes, so 
> it may be close enough.
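
A minimal Python sketch of the "simplest fix" from the description (CCM itself is written in Python). Everything here is hypothetical: `Node`, `start_cluster`, and the injected `read_pid`, `cql_ready`, `alive`, and `kill` callables are illustrative stand-ins, not CCM's actual classes or functions. The point is the ordering: every node's PID is recorded before any node's log is checked for the CQL-listening message, so a lower-index node dying during the log check still leaves the PIDs of all the other nodes available for cleanup.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a CCM node (not CCM's real Node class)."""
    name: str
    launch: Callable[[], None]              # spawn the node process
    read_pid: Callable[[], Optional[int]]   # e.g. parse the PID file; None if absent
    cql_ready: Callable[[], bool]           # e.g. log shows it is listening for CQL
    pid: Optional[int] = None


def start_cluster(nodes: List[Node],
                  kill: Callable[[int], None],
                  alive: Callable[[int], bool],
                  timeout: float = 60.0,
                  poll: float = 0.5) -> None:
    deadline = time.monotonic() + timeout
    try:
        # Phase 0: launch every node.
        for node in nodes:
            node.launch()
        # Phase 1: record every PID before looking at any log output.
        for node in nodes:
            while node.pid is None:
                node.pid = node.read_pid()
                if node.pid is None:
                    if time.monotonic() > deadline:
                        raise TimeoutError(f"{node.name}: no PID file")
                    time.sleep(poll)
        # Phase 2: wait for each node to log that it accepts CQL connections.
        for node in nodes:
            while not node.cql_ready():
                if not alive(node.pid):
                    raise RuntimeError(
                        f"{node.name} (pid {node.pid}) died during startup")
                if time.monotonic() > deadline:
                    raise TimeoutError(
                        f"{node.name} never started listening for CQL")
                time.sleep(poll)
    except Exception:
        # Any failure: stop every node whose PID is known, then re-raise.
        for node in nodes:
            if node.pid is not None:
                kill(node.pid)
        raise
```

A node that was launched but whose PID was never read (for example, if its PID file never appears) still cannot be stopped this way; that remaining gap is what the description's `pkill -9 -f` suggestion targets.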

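For the `pkill -9 -f` alternative, a hedged Python sketch, assuming (as the description suggests) that the node's log directory path appears somewhere on the node process's command line; whether that holds depends on how the node is launched. `pkill -f` matches its pattern as an extended regular expression against the full command line, `-9` sends SIGKILL, and pkill exits with 0 when at least one process matched and 1 when none did. The helper names `ere_escape`, `pkill_argv`, and `kill_node_processes` are hypothetical.

```python
import subprocess
from typing import List

# Characters that are special in a POSIX extended regular expression,
# which is how `pkill -f` interprets its pattern.
_ERE_SPECIAL = set(".[\\()*+?{|^$")


def ere_escape(text: str) -> str:
    """Backslash-escape text so it matches literally as a POSIX ERE."""
    return "".join("\\" + ch if ch in _ERE_SPECIAL else ch for ch in text)


def pkill_argv(log_dir: str) -> List[str]:
    """Build a pkill command that SIGKILLs every process mentioning log_dir."""
    return ["pkill", "-9", "-f", ere_escape(log_dir)]


def kill_node_processes(log_dir: str) -> bool:
    """Return True if processes were signalled, False if none matched."""
    # Run pkill directly rather than via a shell: an `sh -c "pkill ..."`
    # wrapper carries the pattern in its own command line and could match itself.
    result = subprocess.run(pkill_argv(log_dir))
    if result.returncode == 0:
        return True    # at least one matching process was found
    if result.returncode == 1:
        return False   # nothing matched: the node is already stopped
    raise RuntimeError(f"pkill failed with exit status {result.returncode}")
```

Calling `kill_node_processes("/tmp/ccm/test/node1/logs")` (a hypothetical path) would kill every process whose command line contains that path, whether or not CCM ever read its PID. The caveat from the description applies: it also matches unrelated processes that mention the path, such as a `tail -f` on a file inside that directory. Using the full `.../node1/logs` path rather than a bare `.../node1` also avoids `node1` matching `node10`.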


--
This message was sent by Atlassian Jira
(v8.20.10#820010)

