> but because there were only two copies it had no way to tell which one was
> correct, and when I forced it to choose it often chose wrong.
Yeah. This is a BIG problem with running only two copies. Good luck if
your PGs ever get inconsistent :)

--Lincoln

> On Oct 26, 2015, at 10:41 AM, Quentin Hartman
> <qhart...@direwolfdigital.com> wrote:
>
> TL;DR - Running two copies in my cluster cost me a weekend, and many
> more hours of productive time during normal working hours. Networking
> problems can be just as destructive as disk problems. I only run 2
> copies on throwaway data.
>
> So, I have personal experience with data loss when running only two
> copies. I had a networking problem in my Ceph cluster, and it took me a
> long time to track down because it was an intermittent fault that
> caused the node with the faulty connection not only to get marked out
> by its peers, but also to incorrectly mark out other nodes. It was a
> mess that I made worse by trying to force recovery before I really knew
> what the problem was, since it was so elusive.
>
> In the end, the cluster tried to recover the PGs that had become
> degraded, but because there were only two copies it had no way to tell
> which one was correct, and when I forced it to choose it often chose
> wrong. All of the data was VM images, so I ended up with small bits of
> random corruption across almost all my VMs. It took me about 40 hours
> of work over a weekend to get things recovered (onto spare desktop
> machines, since I still hadn't found the problem and didn't trust the
> cluster) and rebuilt so that people could work on Monday, and I was
> cleaning up little bits of leftover mess for weeks. Once I finally
> found and repaired the problem, it was another several days' worth of
> work to get the cluster rebuilt and the VMs migrated back onto it.
> Never again will I run only two copies on things I actually care about,
> regardless of the quality of the underlying disk hardware. In my case,
> the disks were fine all along.
>
> QH
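For reference, the recovery workflow Quentin describes looks roughly like
the sketch below. The pool name and PG ID are made-up placeholders and the
exact output varies by release; on the releases of that era "repair"
generally just trusted the primary copy, which is exactly why size=2 can
pick the wrong version while size=3 still has a majority to vote with.

    # List PGs that (deep-)scrub has flagged as inconsistent
    ceph health detail | grep inconsistent

    # Ask the primary OSD to re-scrub and repair one PG ("4.1f" is a
    # placeholder PG ID). With only two copies there is no majority to
    # decide which object is the good one.
    ceph pg repair 4.1f

    # Raising the replica count on an existing pool (triggers backfill):
    ceph osd pool set rbd size 3
    ceph osd pool set rbd min_size 2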
> On Sat, Oct 24, 2015 at 8:35 AM, Christian Balzer <ch...@gol.com> wrote:
> >
> > Hello,
> >
> > There have been COUNTLESS discussions about Ceph reliability, fault
> > tolerance and so forth in this very ML. Google is very much evil, but
> > in this case it is your friend.
> >
> > In those threads you will find several reliability calculators, some
> > more flawed than others, but ultimately you do not use a replica count
> > of 2 for the same reason people don't use RAID5 for anything valuable.
> >
> > A replication of 2 MAY be fine with very reliable, fast and not too
> > large SSDs, but that's about it. Spinning rust is never safe with just
> > one extra copy.
> >
> > Christian
> >
> > On Sat, 24 Oct 2015 09:41:35 +0200 Stefan Eriksson wrote:
> > >
> > > > On 23.10.2015 at 20:53, Gregory Farnum wrote:
> > > > > On Fri, Oct 23, 2015 at 8:17 AM, Stefan Eriksson
> > > > > <ste...@eriksson.cn> wrote:
> > > > >
> > > > > Nothing changed to make two copies less secure. 3 copies is just
> > > > > so much more secure and is the number that all the companies
> > > > > providing support recommend, so we changed the default.
> > > > > (If you're using it for data you care about, you should really
> > > > > use 3 copies!)
> > > > > -Greg
> > > >
> > > > I assume that number really depends on the (number of) OSDs you
> > > > have in your crush rule for that pool. A replication of 2 might be
> > > > OK for a pool spread over 10 OSDs, but not for one spread over
> > > > 100 OSDs...
> > > >
> > > > Corin
> > >
> > > I'm also interested in this: what changes when you add 100+ OSDs (to
> > > warrant 3 replicas instead of 2), and what is the reasoning behind
> > > "the companies providing support recommend 3"? Theoretically it
> > > seems secure to have two replicas.
> > >
> > > If you have 100+ OSDs, I can see that maintenance will take much
> > > longer, and if you use "set noout" then PGs will be active with only
> > > a single copy while the other replica is under maintenance. But if
> > > you "crush reweight to 0" before the maintenance this would not be
> > > an issue. Is this the main reason?
> > >
> > > From what I can gather, even if you add new OSDs to the cluster and
> > > rebalancing kicks in, it still maintains its two replicas.
> > >
> > > thanks.
> >
> > --
> > Christian Balzer        Network/Systems Engineer
> > ch...@gol.com           Global OnLine Japan/Fusion Communications
> > http://www.gol.com/
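On Stefan's question above about "set noout" versus "crush reweight to 0"
before maintenance, the two approaches look roughly like this (osd.12 and
the restored weight are made-up placeholders; the service commands differ
between releases):

    # Option 1: short maintenance. Keep the OSD from being marked out and
    # re-replicated while it is down; the surviving copies keep serving
    # I/O, so with size=2 you run on a single copy for the duration.
    ceph osd set noout
    systemctl stop ceph-osd@12     # older releases: service ceph stop osd.12
    # ... do the maintenance, then bring it back ...
    systemctl start ceph-osd@12
    ceph osd unset noout

    # Option 2: longer maintenance. Drain the OSD first so full redundancy
    # is kept the whole time, at the cost of moving the data twice.
    ceph osd crush reweight osd.12 0
    # wait for "ceph -s" to show all PGs active+clean, stop the OSD and do
    # the work, then restore its original CRUSH weight, e.g.:
    ceph osd crush reweight osd.12 1.82

Whether that extra data movement is worth it is the same size=2 versus
size=3 trade-off discussed above: with three copies, a "noout" maintenance
window still leaves two active copies of every PG.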