RE: Restart Cassandra automatically

daemeon reiydelle Tue, 23 Feb 2016 08:21:29 -0800

Cassandra nodes do not go down "for no reason". They are not stateless. I
would like to thank you for this marvelous example of a wonderful
antipattern. Absolutely fantastic.


Thank you! I am not being a satirical smartass. I am sometimes challenged
by clients in my presentations about SRE best practices around C*, Hadoop,
and ELK on the grounds that "no one would ever do this in production". Now I
have objective proof!

Daemeon

sent from my mobile
Daemeon C.M. Reiydelle
USA 415.501.0198
London +44.0.20.8144.9872
On Feb 23, 2016 7:53 AM, <sean_r_dur...@homedepot.com> wrote:

> Yes, I can see the potential problem in theory. However, we never do your
> #2. Generally, we don’t have unused spare hardware. We just fix the host
> that is down and run repairs. (Side note: while I have seen nodes fight it
> out over who owns a particular token in earlier versions, it seems that
> 1.2+ doesn’t allow that to happen as easily. The second node will just not
> come up.)
>
>
>
> For most of our use cases, I would agree with your Coli Conjecture.
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Robert Coli [mailto:rc...@eventbrite.com]
> *Sent:* Tuesday, February 09, 2016 4:41 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Restart Cassandra automatically
>
>
>
> On Tue, Feb 9, 2016 at 6:20 AM, <sean_r_dur...@homedepot.com> wrote:
>
> Call me naïve, but we do use an in-house built program for keeping nodes
> started (based on a flag-check). The program is something that was written
> for all kinds of daemon processes here, not Cassandra specifically. The
> basic idea is that it runs a status check. If that fails, and the flag is
> set, start Cassandra. In my opinion, it has helped more than hurt us –
> especially with the very fragile 1.1 releases that were prone to heap
> problems.
>
>
>
> Ok, you're naïve.. ;P
>
>
>
> But seriously, think of this scenario :
>
>
>
> 1) Node A, responsible for range A-M, goes down due to hardware failure of
> a disk in a RAID
>
> 2) Node B is put into service and is made responsible for A-M
>
> 3) Months pass
>
> 4) Node A comes back up, announces that it is responsible for A-M, and the
> cluster agrees
>
>
>
> Consistency is now permanently broken for any involved rows. Why doesn't
> it (usually) matter?
>
>
>
> It's not so much that you are naïve but that you are providing still more
> support for the Coli Conjecture : "If you are using a distributed database
> you probably do not care about consistency, even if you think you do." You
> have repeatedly chosen Availability over Consistency and it has never had a
> negative impact on your actual application.
>
>
>
> =Rob
>
>
>

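For illustration, the flag-check restart idea discussed in this thread can
be sketched as a small decision function. This is a minimal sketch, not
Sean's in-house program: the function name, its parameters, and the
downtime cutoff are all assumptions. The cutoff is one possible guard
against the scenario Rob describes, where a node returns long after its
range was handed to a replacement; ten days is used only because it equals
Cassandra's default gc_grace_seconds.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical cutoff: past this much downtime, a human should decide
# whether the node may rejoin. Ten days matches Cassandra's default
# gc_grace_seconds (864000 s), but the right value is an assumption here.
MAX_UNATTENDED_DOWNTIME = timedelta(days=10)


def should_restart(status_ok: bool, flag_set: bool,
                   down_since: Optional[datetime], now: datetime,
                   max_downtime: timedelta = MAX_UNATTENDED_DOWNTIME) -> bool:
    """Decide whether a watchdog should start the daemon on this pass."""
    if status_ok:
        return False  # status check passed, nothing to do
    if not flag_set:
        return False  # operator has disabled automatic restarts
    if down_since is None:
        return True   # failure first seen on this pass
    # Down too long: the node's range may have been reassigned meanwhile,
    # so refuse to restart automatically.
    return now - down_since <= max_downtime
```

A watchdog built on this would run its status check on a timer, record when
the first failure was seen, and leave any node that fails the last test for
an operator to handle.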