It depends on the version.

For versions without the fix from CASSANDRA-6696, the only safe option on a 
single disk failure is to stop and replace the whole instance. This matters 
because in older versions of Cassandra you could have data in one sstable and 
a tombstone shadowing it on another disk, with the tombstone already well past 
gc_grace_seconds. If the disk holding the tombstone is the one that fails, 
repair will propagate the (deleted, now resurrected) data to the other 
replicas, which is probably not what you want.
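
As a rough sketch of that replacement, assuming a package install managed with 
service and the replace-address mechanism; the service commands, config file 
locations, and <dead_node_ip> are placeholders to adjust for your environment:

    # 1. Stop the failed instance entirely; don't try to bring it back up on
    #    the surviving disks, since that is what risks resurrecting deleted data.
    sudo service cassandra stop

    # 2. On the replacement node (or the same host with all data directories
    #    wiped), set the replace flag in cassandra-env.sh / jvm.options before
    #    the first start:
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=<dead_node_ip>"

    # 3. Start it; the node streams the dead node's token ranges from the
    #    surviving replicas rather than reusing anything from the failed disks.
    sudo service cassandra start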

With 6696, you should be safe to replace the disk and run repair: 6696 keeps 
all data for a given token range on the same disk, so the resurrection problem 
is solved.


-- 
Jeff Jirsa


> On Aug 14, 2018, at 6:10 AM, Christian Lorenz <christian.lor...@webtrekk.com> 
> wrote:
> 
> Hi,
>  
> given a cluster with RF=3 and CL=LOCAL_ONE where the application is deleting data, 
> what happens if the nodes are set up with JBOD and one disk fails? Do I get 
> consistent results while the broken drive is replaced and a nodetool repair 
> is running on the node with the replaced drive?
>  
> Kind regards,
> Christian
