Hi Kenneth,

Thanks for your interest in helping. I had to make a decision quickly because it was a production cluster. Long story short, I let the cluster finish the decommission process before touching it. When the decommissioned node left the cluster, I did a rolling restart and the nodes started behaving normally again, without errors. Auto-compaction also resumed, and every node had accumulated a lot of files to compact. I then performed a rolling upgrade from 3.11.1 to 3.11.4, which went very smoothly.
In retrospect, to answer your questions:

> Was the cluster running ok before decommissioning the node?

Yes.

> Why were you decommissioning the node?

It was a management decision; we just wanted to shrink the cluster.

> Were you upgrading from 3.11.1 to 3.11.4?

No, that was not the initial intention. I decided to upgrade after I realized that the rest of the nodes had hit this bug: "Prevent compaction strategies from looping indefinitely", CASSANDRA-14079 <https://issues.apache.org/jira/browse/CASSANDRA-14079>

Thanks again!

On Thu, Feb 28, 2019 at 10:45 AM Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:

> Hi John,
>
> Was the cluster running ok before decommissioning the node?
> Why were you decommissioning the node?
> Were you upgrading from 3.11.1 to 3.11.4?
>
> *From:* Ioannis Zafiropoulos [mailto:john...@gmail.com]
> *Sent:* Wednesday, February 27, 2019 7:33 AM
> *To:* user@cassandra.apache.org
> *Subject:* Upgrade 3.11.1 to 3.11.4
>
> Hi all,
>
> During a decommission on a production cluster (9 nodes), we are having some compaction issues on the remaining nodes, and I have some questions about that:
>
> One remaining node, which has stopped compacting due to a bug <https://issues.apache.org/jira/browse/CASSANDRA-14079> in 3.11.1, *has received all* the streaming files from the decommissioned node (decommissioning is still in progress for the rest of the cluster). Could I upgrade this node to 3.11.4 and restart it?
>
> Some other nodes, which *are still receiving* files, appear from nodetool tpstats to be doing little to no auto-compaction. Should I wait for streaming to complete, or should I upgrade these nodes as well and restart them? What would happen if I bounce such a node? Would the whole decommissioning process fail?
>
> Would you recommend eventually doing a rolling upgrade to 3.11.4, or choosing another version?
>
> Thanks in advance for your help,
>
> John Zaf