[ 
https://issues.apache.org/jira/browse/CASSANDRA-20673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-20673:
---------------------------------------
    Description: 
CCM starts all the nodes in a loop without reading their PIDs from the PID
files, and then loops checking to see if each one is running. If a node kills
itself then CCM skips straight to stopping all the nodes without having read
their PID files.

The simplest fix to get started is to first loop over all the nodes, waiting
for them to start up, before checking whether they have the log message saying
they are listening for CQL connections. Right now it both waits for the startup
(which populates the PID field in Node) and then waits for the log message,
which can fail, causing the PID field not to get populated.
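
A minimal sketch of that two-pass ordering, under stated assumptions:
`SketchNode`, `read_pid_file`, `wait_for_cql_log_line` and `start_all` are
hypothetical names made up for illustration and are not CCM's actual classes
or methods. The only point is that every PID is recorded before any check that
can fail.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a CCM node; not CCM's real Node class."""
    name: str
    pid_on_disk: int            # what the PID file would contain
    reaches_cql: bool = True    # False simulates a node that kills itself
    pid: Optional[int] = None
    stopped: bool = False

    def read_pid_file(self) -> None:
        # Pass 1: populate the PID field from the PID file.
        self.pid = self.pid_on_disk

    def wait_for_cql_log_line(self) -> None:
        # Pass 2: stand-in for watching the log for the CQL listening message.
        if not self.reaches_cql:
            raise RuntimeError(f"{self.name} never started listening for CQL")

    def stop(self) -> None:
        # In this sketch, stopping only works when the PID is known.
        if self.pid is not None:
            self.stopped = True


def start_all(nodes: List[SketchNode]) -> None:
    # First pass: record every PID before running any check that can fail.
    for node in nodes:
        node.read_pid_file()
    # Second pass: a failure here can be cleaned up, since all PIDs are known.
    try:
        for node in nodes:
            node.wait_for_cql_log_line()
    except RuntimeError:
        for node in nodes:
            node.stop()
        raise
```

With this ordering, a failure on the first node still leaves the later nodes
with a populated PID field, so the cleanup loop can stop all of them.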

The entire state management here is very fragile, and really a better way to
do this would be to tag the command lines and then just `pkill -9 -f` so you
don't have to search for the PID at all. There are a lot of conditions, setting
things to `None`, and only doing things if in a certain state, and when it
comes to just making sure something is stopped, none of that is necessary.
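
As a rough illustration of the tagging idea, the sketch below builds the
`pkill -9 -f` command for a tag. The `-Dccm.tag=` system property, the tag
format, and the helper names are assumptions invented for this example; CCM
does not necessarily pass any such argument today.

```python
import re
import subprocess
from typing import List

# Assumption: tags are restricted to characters with no regex meaning,
# because pkill treats its pattern as an extended regular expression.
_TAG_RE = re.compile(r"[A-Za-z0-9_-]+")


def jvm_tag_arg(tag: str) -> str:
    """Hypothetical extra JVM argument that marks a node's command line."""
    if not _TAG_RE.fullmatch(tag):
        raise ValueError(f"unsupported tag: {tag!r}")
    return f"-Dccm.tag={tag}"


def build_kill_command(tag: str) -> List[str]:
    """Build the pkill invocation that matches the tagged command line."""
    if not _TAG_RE.fullmatch(tag):
        raise ValueError(f"unsupported tag: {tag!r}")
    # The pattern deliberately does not start with '-', so pkill cannot
    # mistake it for an option. '( |$)' stops tag 'n1' from matching 'n10'.
    return ["pkill", "-9", "-f", f"ccm[.]tag={tag}( |$)"]


def kill_tagged(tag: str) -> bool:
    """Run pkill without a shell; True if at least one process matched."""
    return subprocess.run(build_kill_command(tag)).returncode == 0
```

No PID bookkeeping is involved: stopping a node only needs the tag it was
started with. The command is run as an argument list rather than through a
shell, so no shell process carries the pattern on its own command line.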

Really you could say that the log directory serves as a unique identifier for
the node. Technically it is not a unique identifier for the process, but it
would serve as one in most cases as long as you never start duplicate
processes, so it is maybe close enough.
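
To make the caveat concrete, here is a small sketch that picks processes by the
log directory on their command line. It works on a plain mapping of PID to
command-line string, so how those strings are collected is left open. The
function name and the sample paths are made up for illustration, and the
sketch assumes paths without spaces.

```python
from typing import Dict, List


def pids_for_log_dir(cmdlines: Dict[int, str], log_dir: str) -> List[int]:
    """Return the PIDs whose command line names exactly this log directory."""
    needle = log_dir.rstrip("/")
    matches = []
    for pid, cmdline in cmdlines.items():
        for token in cmdline.split():
            # Accept both a bare path and a 'key=/path' style argument.
            value = token.split("=", 1)[-1].rstrip("/")
            if value == needle:
                matches.append(pid)
                break
    return sorted(matches)


# Sample data: PID 12 is a duplicate process started for the same node.
sample = {
    10: "java -Dcassandra.logdir=/tmp/c/node1/logs Main",
    11: "java -Dcassandra.logdir=/tmp/c/node10/logs Main",
    12: "java -Dcassandra.logdir=/tmp/c/node1/logs Main",
}
node1_pids = pids_for_log_dir(sample, "/tmp/c/node1/logs")
```

A duplicated process shows up as a second PID for the same directory, which is
the case where the directory identifies the node but not a single process. For
the goal of making sure a node is stopped, killing every match is still fine.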

  was:
It starts them all in a loop without checking what their PID is from the PID 
file and then loops checking to see if each one is running. If a node kills 
itself then CCM skips straight to stopping all the nodes without having read 
their PID files.

The fix has two parts. The first is to loop over all the nodes, waiting for
them to start up, before checking whether they have the log message saying they
are listening for CQL connections.

The entire state management here is very fragile, and really a better way to
do this would be to tag the command lines and then just `pkill -9 -f` so you
don't have to search for the PID at all. There are a lot of conditions, setting
things to `None`, and only doing things if in a certain state, and when it
comes to just making sure something is stopped, none of that is necessary.


> CCM not killing remaining nodes if lower index nodes fail to start
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-20673
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20673
>             Project: Apache Cassandra
>          Issue Type: Bug
>            Reporter: Ariel Weisberg
>            Priority: Normal



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
