Re: JBOD disk failure - just say no

Jonathan Haddad Wed, 22 Aug 2018 10:43:51 -0700

We recently helped a team deal with some JBOD issues. They can be quite
painful, and the experience depends a bit on the C* version in use.  We
wrote a blog post about it (published today):


http://thelastpickle.com/blog/2018/08/22/the-fine-print-when-using-multiple-data-directories.html

Hope this helps.

Jon

On Mon, Aug 20, 2018 at 5:49 PM James Briggs <james.bri...@yahoo.com.invalid>
wrote:

> Cassandra JBOD has a bunch of issues, so I don't recommend it for
> production:
>
> 1) disks fill up with load (data) unevenly, meaning you can run out of
> space on one disk while others are only half full
> 2) one bad disk can take out the whole node
> 3) instead of a small failure probability on an LVM/RAID volume, with JBOD
> you end up near 100% chance of failure after 3 years or so.
> 4) generally you will not have enough warning of a looming failure with
> JBOD compared to LVM/RAID. (Some companies take a week or two to replace
> a failed disk.)
>
> JBOD is easy to set up, but hard to manage.
>
> Thanks, James.
>
>
>
> ------------------------------
> *From:* kurt greaves <k...@instaclustr.com>
> *To:* User <user@cassandra.apache.org>
> *Sent:* Friday, August 17, 2018 5:42 AM
> *Subject:* Re: JBOD disk failure
>
> As far as I'm aware, yes. I recall hearing someone mention tying system
> tables to a particular disk but at the moment that doesn't exist.
>
> On Fri., 17 Aug. 2018, 01:04 Eric Evans, <john.eric.ev...@gmail.com>
> wrote:
>
> On Wed, Aug 15, 2018 at 3:23 AM kurt greaves <k...@instaclustr.com> wrote:
> > Yep. It might require a full node replace depending on what data is lost
> > from the system tables. In some cases you might be able to recover from
> > partially lost system info, but it's not a sure thing.
>
> Ugh, does it really just boil down to what part of `system` happens to
> be on the disk in question?  In my mind, that makes the only sane
> operational procedure for a failed disk to be: "replace the entire
> node".  IOW, I don't think we can realistically claim you can survive
> a failed JBOD device if it relies on happenstance.
>
> > On Wed., 15 Aug. 2018, 17:55 Christian Lorenz,
> > <christian.lor...@webtrekk.com> wrote:
> >>
> >> Thank you for the answers. We are using the current version, 3.11.3, so
> >> this one includes CASSANDRA-6696.
> >>
> >> So if I get this right, losing system tables will need a full node
> >> rebuild. Otherwise repair will get the node consistent again.
> >
> > [ ... ]
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
>

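Point 3 in the quoted list can be made concrete with a little arithmetic.
The sketch below assumes disk failures are independent and uses a
hypothetical annualized failure rate (AFR); neither assumption comes from
the thread, and the function name is illustrative only. It computes the
chance that at least one of N disks fails within a given number of years,
which is the relevant event if a single bad disk can take out the node.

```python
def p_any_disk_failure(afr: float, disks: int, years: float) -> float:
    """Probability that at least one of `disks` fails within `years`.

    Assumes independent failures and a constant annualized failure
    rate `afr` (a hypothetical input, e.g. 0.05 for 5% per year).
    """
    if not 0.0 <= afr <= 1.0:
        raise ValueError("afr must be between 0 and 1")
    if disks < 0 or years < 0:
        raise ValueError("disks and years must be non-negative")
    # Chance that one given disk survives the whole period.
    p_one_disk_survives = (1.0 - afr) ** years
    # Chance that every disk survives (independence assumption).
    p_all_survive = p_one_disk_survives ** disks
    return 1.0 - p_all_survive


if __name__ == "__main__":
    # Hypothetical 5% AFR over 3 years, for a few node sizes.
    for disks in (1, 4, 12, 24):
        print(disks, round(p_any_disk_failure(0.05, disks, 3), 3))
```

The result climbs quickly as disks are added, which is the intuition
behind the point; the exact figure depends entirely on the AFR you plug in.
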
-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
