> On 7 December 2016 at 11:29, Kees Meijs <k...@nefos.nl> wrote:
> 
> 
> Hi Wido,
> 
> Valid point. At this moment, we're using a cache pool with size = 2 and
> would like to "upgrade" to size = 3.
> 
> Again, you're absolutely right... ;-)
> 
> Anyway, any things to consider or could we just:
> 
>  1. Run "ceph osd pool set cache size 3".
>  2. Wait for rebalancing to complete.
>  3. Run "ceph osd pool set cache min_size 2".
> 

Indeed! It is as simple as that.
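
For anyone who wants to script the three steps quoted above, here is a minimal sketch. It is not an official procedure. The function name and the CEPH and POLL_SECONDS variables are made up for this example, and polling "ceph health" for HEALTH_OK is only an approximation of "rebalancing is complete": any unrelated warning will keep the loop waiting.

```shell
#!/bin/sh
# Sketch: raise a pool to size = 3, wait for recovery, then set min_size = 2.
# CEPH and POLL_SECONDS are hypothetical knobs added for this example.
CEPH="${CEPH:-ceph}"
POLL_SECONDS="${POLL_SECONDS:-30}"

upgrade_pool_replication() {
    pool="$1"

    # Step 1: ask for three replicas; this triggers backfill of the new copies.
    "$CEPH" osd pool set "$pool" size 3 || return 1

    # Step 2: wait for rebalancing. Assumption: HEALTH_OK is used as a
    # stand-in for "all PGs are active+clean".
    until "$CEPH" health | grep -q '^HEALTH_OK'; do
        sleep "$POLL_SECONDS"
    done

    # Step 3: require at least two replicas before I/O is served.
    "$CEPH" osd pool set "$pool" min_size 2 || return 1

    # Print the resulting settings so they can be checked by eye.
    "$CEPH" osd pool get "$pool" size
    "$CEPH" osd pool get "$pool" min_size
}
```

With the pool name from this thread, that would be called as "upgrade_pool_replication cache".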

Your cache pool can also contain very valuable data you do not want to lose.

Wido

> Thanks!
> 
> Regards,
> Kees
> 
> On 07-12-16 09:08, Wido den Hollander wrote:
> > As a Ceph consultant I get numerous calls throughout the year to help 
> > people with getting their broken Ceph clusters back online.
> >
> > The causes of downtime vary vastly, but one of the biggest causes is that 
> > people use replication 2x. size = 2, min_size = 1.
> >
> > In 2016 the number of cases I handled where data was lost due to these 
> > settings grew exponentially.
> >
> > Usually a disk fails, recovery kicks in, and while recovery is happening a 
> > second disk fails, causing PGs to become incomplete.
> >
> > There have been too many times where I had to use xfs_repair on broken disks 
> > and use ceph-objectstore-tool to export/import PGs.
> >
> > I really don't like these cases, mainly because they can be prevented 
> > easily by using size = 3 and min_size = 2 for all pools.
> >
> > With size = 2 you go into the danger zone as soon as a single disk/daemon 
> > fails. With size = 3 you still have two copies left after a single 
> > failure, thus keeping your data safe(r).
> >
> > If you are running CephFS, at least consider running the 'metadata' pool 
> > with size = 3 to keep the MDS happy.
> >
> > Please, let this be a big warning to everybody who is running with size = 
> > 2. The downtime and problems caused by missing objects/replicas are usually 
> > big and it takes days to recover from those. But very often data is lost 
> > and/or corrupted which causes even more problems.
> >
> > I can't stress this enough. Running with size = 2 in production is a 
> > SERIOUS hazard and should not be done imho.
> >
> > To anyone out there running with size = 2, please reconsider this!
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com