> but because there were only two copies it had no way to tell which one was
> correct, and when I forced it to choose it often chose wrong.
Yeah. This is a BIG problem with running only two copies. Good luck if
your PGs ever get inconsistent :)

--Lincoln

> On Oct 26, 2015, at 10:41 AM, Quentin Hartman
> <qhart...@direwolfdigital.com> wrote:
>
> TL;DR - Running two copies in my cluster cost me a weekend, and many
> more hours of productive time during normal working hours. Networking
> problems can be just as destructive as disk problems. I only run 2
> copies on throwaway data.
>
> So, I have personal experience with data loss when running only two
> copies. I had a networking problem in my Ceph cluster, and it took me a
> long time to track down because it was an intermittent fault that
> caused the node with the faulty connection not only to get marked out
> by its peers, but also to incorrectly mark out other nodes. It was a
> mess that I made worse by trying to force recovery before I really knew
> what the problem was, since it was so elusive.
>
> In the end, the cluster tried to recover the PGs that had become
> degraded, but because there were only two copies it had no way to tell
> which one was correct, and when I forced it to choose it often chose
> wrong. All of the data was VM images, so I ended up with small bits of
> random corruption across almost all my VMs. It took me about 40 hours
> of work over a weekend to get things recovered (onto spare desktop
> machines, since I still hadn't found the problem and didn't trust the
> cluster) and rebuilt so that people could work on Monday, and I was
> cleaning up little bits of leftover mess for weeks. Once I finally
> found and repaired the problem, it was another several days' worth of
> work to get the cluster rebuilt and the VMs migrated back onto it.
> Never again will I run only two copies on things I actually care about,
> regardless of the quality of the underlying disk hardware. In my case,
> the disks were fine all along.
>
> QH
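For reference, the recovery workflow Quentin describes looks roughly like
the sketch below. The pool name and PG ID are made-up placeholders and the
exact output varies by release; on the releases of that era "repair"
generally just trusted the primary copy, which is exactly why size=2 can
pick the wrong version while size=3 still has a majority to vote with.

    # List PGs that (deep-)scrub has flagged as inconsistent
    ceph health detail | grep inconsistent

    # Ask the primary OSD to re-scrub and repair one PG ("4.1f" is a
    # placeholder PG ID). With only two copies there is no majority to
    # decide which object is the good one.
    ceph pg repair 4.1f

    # Raising the replica count on an existing pool (triggers backfill):
    ceph osd pool set rbd size 3
    ceph osd pool set rbd min_size 2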
> On Sat, Oct 24, 2015 at 8:35 AM, Christian Balzer <ch...@gol.com> wrote:
> >
> > Hello,
> >
> > There have been COUNTLESS discussions about Ceph reliability, fault
> > tolerance and so forth in this very ML. Google is very much evil, but
> > in this case it is your friend.
> >
> > In those threads you will find several reliability calculators, some
> > more flawed than others, but ultimately you do not use a replica count
> > of 2 for the same reason people don't use RAID5 for anything valuable.
> >
> > A replication of 2 MAY be fine with very reliable, fast and not too
> > large SSDs, but that's about it. Spinning rust is never safe with just
> > one extra copy.
> >
> > Christian
> >
> > On Sat, 24 Oct 2015 09:41:35 +0200 Stefan Eriksson wrote:
> > >
> > > > On 23.10.2015 at 20:53, Gregory Farnum wrote:
> > > > > On Fri, Oct 23, 2015 at 8:17 AM, Stefan Eriksson
> > > > > <ste...@eriksson.cn> wrote:
> > > > >
> > > > > Nothing changed to make two copies less secure. 3 copies is just
> > > > > so much more secure and is the number that all the companies
> > > > > providing support recommend, so we changed the default.
> > > > > (If you're using it for data you care about, you should really
> > > > > use 3 copies!)
> > > > > -Greg
> > > >
> > > > I assume that number really depends on the (number of) OSDs you
> > > > have in your crush rule for that pool. A replication of 2 might be
> > > > OK for a pool spread over 10 OSDs, but not for one spread over
> > > > 100 OSDs...
> > > >
> > > > Corin
> > >
> > > I'm also interested in this: what changes when you add 100+ OSDs (to
> > > warrant 3 replicas instead of 2), and what is the reasoning behind
> > > "the companies providing support recommend 3"? Theoretically it
> > > seems secure to have two replicas.
> > >
> > > If you have 100+ OSDs, I can see that maintenance will take much
> > > longer, and if you use "set noout" then PGs will be active with only
> > > a single copy while the other replica is under maintenance. But if
> > > you "crush reweight to 0" before the maintenance this would not be
> > > an issue. Is this the main reason?
> > >
> > > From what I can gather, even if you add new OSDs to the cluster and
> > > rebalancing kicks in, it still maintains its two replicas.
> > >
> > > thanks.
> >
> > --
> > Christian Balzer        Network/Systems Engineer
> > ch...@gol.com           Global OnLine Japan/Fusion Communications
> > http://www.gol.com/
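On Stefan's question above about "set noout" versus "crush reweight to 0"
before maintenance, the two approaches look roughly like this (osd.12 and
the restored weight are made-up placeholders; the service commands differ
between releases):

    # Option 1: short maintenance. Keep the OSD from being marked out and
    # re-replicated while it is down; the surviving copies keep serving
    # I/O, so with size=2 you run on a single copy for the duration.
    ceph osd set noout
    systemctl stop ceph-osd@12     # older releases: service ceph stop osd.12
    # ... do the maintenance, then bring it back ...
    systemctl start ceph-osd@12
    ceph osd unset noout

    # Option 2: longer maintenance. Drain the OSD first so full redundancy
    # is kept the whole time, at the cost of moving the data twice.
    ceph osd crush reweight osd.12 0
    # wait for "ceph -s" to show all PGs active+clean, stop the OSD and do
    # the work, then restore its original CRUSH weight, e.g.:
    ceph osd crush reweight osd.12 1.82

Whether that extra data movement is worth it is the same size=2 versus
size=3 trade-off discussed above: with three copies, a "noout" maintenance
window still leaves two active copies of every PG.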