Hi,

Let me add a little math to your warning: with an LSE (latent sector error) rate of 1 in 10^15 on modern 8 TB disks, there is a 5.8% chance of hitting an LSE during recovery of an 8 TB disk. So every 18th recovery will probably fail. Similarly to RAID6 (two parity disks), size=3 mitigates the problem.

By the way, why is it a common opinion that using RAID (RAID6) with Ceph (size=2) is a bad idea? It is cheaper than size=3, all hardware disk errors are handled by the RAID (instead of the OS/Ceph), it decreases the OSD count, adds some battery-backed cache, and increases the performance of a single OSD.
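
For anyone who wants to redo the LSE estimate with their own disk sizes, here is a rough sketch. It rests on assumptions that are mine, not from the drive specs: the rate is taken as 1 error per 10^15 bits read, "8 TB" is counted as 8 * 10^12 bytes, errors are treated as independent, and a recovery is assumed to read the whole disk once. The function name is just illustrative. Under these assumptions the result is about 6%, in the same range as the 5.8% figure; the exact value depends on how the capacity and the rate are counted.

```python
import math


def lse_probability(bytes_read: float, lse_rate_per_bit: float = 1e-15) -> float:
    """Probability of at least one LSE while reading `bytes_read` bytes.

    Assumes independent errors at a fixed per-bit rate (a simplification).
    """
    bits_read = bytes_read * 8
    # 1 - (1 - rate)^bits, written with log1p/expm1 to stay accurate
    # when the rate is tiny.
    return -math.expm1(bits_read * math.log1p(-lse_rate_per_bit))


disk_bytes = 8e12  # assumption: 8 TB counted as 8 * 10^12 bytes
p = lse_probability(disk_bytes)

print(f"P(LSE during one full read) = {p:.1%}")
print(f"roughly one recovery in {1 / p:.0f}")
```

Changing `disk_bytes` or `lse_rate_per_bit` shows how quickly the risk grows with larger disks or a worse error rate.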
> On 7 Dec 2016, at 11:08, Wido den Hollander <w...@42on.com> wrote:
>
> Hi,
>
> As a Ceph consultant I get numerous calls throughout the year to help people
> with getting their broken Ceph clusters back online.
>
> The causes of downtime vary vastly, but one of the biggest causes is that
> people use replication 2x. size = 2, min_size = 1.
>
> In 2016 the number of cases I have where data was lost due to these settings
> grew exponentially.
>
> Usually a disk failed, recovery kicks in and while recovery is happening a
> second disk fails. Causing PGs to become incomplete.
>
> There have been too many times where I had to use xfs_repair on broken disks
> and use ceph-objectstore-tool to export/import PGs.
>
> I really don't like these cases, mainly because they can be prevented easily
> by using size = 3 and min_size = 2 for all pools.
>
> With size = 2 you go into the danger zone as soon as a single disk/daemon
> fails. With size = 3 you always have two additional copies left thus keeping
> your data safe(r).
>
> If you are running CephFS, at least consider running the 'metadata' pool with
> size = 3 to keep the MDS happy.
>
> Please, let this be a big warning to everybody who is running with size = 2.
> The downtime and problems caused by missing objects/replicas are usually big
> and it takes days to recover from those. But very often data is lost and/or
> corrupted which causes even more problems.
>
> I can't stress this enough. Running with size = 2 in production is a SERIOUS
> hazard and should not be done imho.
>
> To anyone out there running with size = 2, please reconsider this!
>
> Thanks,
>
> Wido
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Dmitry Glushenok
Jet Infosystems