Cassandra nodes do not go down "for no reason". They are not stateless. I would like to thank you for this marvelous example of a wonderful antipattern. Absolutely fantastic.
Thank you! I am not being a satirical smartass. Clients sometimes challenge me in my presentations about SRE best practices for C*, Hadoop, and ELK on the grounds that "no one would ever do this in production". Now I have objective proof!

Daemeon C.M. Reiydelle

On Feb 23, 2016 7:53 AM, <sean_r_dur...@homedepot.com> wrote:

> Yes, I can see the potential problem in theory. However, we never do your
> #2. Generally, we don’t have unused spare hardware. We just fix the host
> that is down and run repairs. (Side note: while I have seen nodes fight it
> out over who owns a particular token in earlier versions, it seems that
> 1.2+ doesn’t allow that to happen as easily. The second node will just not
> come up.)
>
> For most of our use cases, I would agree with your Coli Conjecture.
>
> Sean Durity
>
> *From:* Robert Coli [mailto:rc...@eventbrite.com]
> *Sent:* Tuesday, February 09, 2016 4:41 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Restart Cassandra automatically
>
> On Tue, Feb 9, 2016 at 6:20 AM, <sean_r_dur...@homedepot.com> wrote:
>
> Call me naïve, but we do use an in-house built program for keeping nodes
> started (based on a flag-check). The program is something that was written
> for all kinds of daemon processes here, not Cassandra specifically. The
> basic idea is that it runs a status check. If that fails, and the flag is
> set, start Cassandra. In my opinion, it has helped more than hurt us –
> especially with the very fragile 1.1 releases that were prone to heap
> problems.
>
> Ok, you're naïve..
> ;P
>
> But seriously, think of this scenario:
>
> 1) Node A, responsible for range A-M, goes down due to hardware failure of
> a disk in a RAID
>
> 2) Node B is put into service and is made responsible for A-M
>
> 3) Months pass
>
> 4) Node A comes back up, announces that it is responsible for A-M, and the
> cluster agrees
>
> Consistency is now permanently broken for any involved rows. Why doesn't
> it (usually) matter?
>
> It's not so much that you are naïve but that you are providing still more
> support for the Coli Conjecture: "If you are using a distributed database
> you probably do not care about consistency, even if you think you do." You
> have repeatedly chosen Availability over Consistency and it has never had a
> negative impact on your actual application.
>
> =Rob
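
For anyone trying to picture the flag-check restart Sean describes, here is a minimal Python sketch of the decision it makes. It is not the in-house program from this thread; `NodeState`, `should_autostart`, and `MAX_UNATTENDED_DOWNTIME_S` are hypothetical names. The downtime guard is an added assumption that nobody in the thread says they run: it refuses an unattended start once the node has been down longer than a limit (10 days here, the default `gc_grace_seconds`), which is one way to keep a long-dead node from rejoining on its own as in step 4 of Rob's scenario.

```python
from dataclasses import dataclass

# Assumption: 10 days, matching Cassandra's default gc_grace_seconds (864000).
# Pick a limit that fits your own repair schedule.
MAX_UNATTENDED_DOWNTIME_S = 10 * 24 * 60 * 60


@dataclass
class NodeState:
    """Hypothetical snapshot of what the watchdog knows about one node."""
    status_ok: bool         # result of the status check
    restart_flag_set: bool  # operator-controlled "keep it running" flag
    seconds_down: float     # how long the process has been down


def should_autostart(node: NodeState,
                     max_down_s: float = MAX_UNATTENDED_DOWNTIME_S) -> bool:
    """Return True only if an unattended start looks safe."""
    if node.status_ok:
        return False  # already running, nothing to do
    if not node.restart_flag_set:
        return False  # operator has disabled automatic starts
    # Added guard (assumption): a node that has been down too long needs
    # a human to decide whether it may rejoin the ring.
    return node.seconds_down <= max_down_s
```

With this guard, a node that crashed five minutes ago is restarted, while a node that has been down for months is left for an operator to inspect, rebuild, or replace.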