Hi, thank you all.

I'm using Mellanox switches with ConnectX-3 Pro 40 Gbit NICs, bonded with balance-xor and transmit hash policy layer3+4. It's a bit expensive, but it's very hard to saturate. I'm using a single NIC for both the replica and the access network.
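Since the bonding question came up: a balance-xor bond with layer3+4 hashing looks roughly like the sketch below on a Debian-style node. This is only an outline, not my exact configuration; the interface names and the address are placeholders.

  auto bond0
  iface bond0 inet static
      address 192.168.10.11
      netmask 255.255.255.0
      bond-slaves enp3s0 enp3s0d1
      bond-mode balance-xor
      bond-xmit-hash-policy layer3+4
      bond-miimon 100

Note that with balance-xor the corresponding switch ports generally need to be configured as a static LAG (or an MLAG if the bond spans two stacked switches).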
> On 3 Mar 2017, at 14:52, Vy Nguyen Tan <vynt.kensh...@gmail.com> wrote:
>
> Hi,
>
> You should read this email from Wido den Hollander:
>
> "Hi,
>
> As a Ceph consultant I get numerous calls throughout the year to help
> people with getting their broken Ceph clusters back online.
>
> The causes of downtime vary vastly, but one of the biggest causes is
> that people use replication 2x: size = 2, min_size = 1.
>
> In 2016 the number of cases I had where data was lost due to these
> settings grew exponentially.
>
> Usually a disk fails, recovery kicks in, and while recovery is
> happening a second disk fails, causing PGs to become incomplete.
>
> There have been too many times where I had to use xfs_repair on broken
> disks and use ceph-objectstore-tool to export/import PGs.
>
> I really don't like these cases, mainly because they can be prevented
> easily by using size = 3 and min_size = 2 for all pools.
>
> With size = 2 you go into the danger zone as soon as a single
> disk/daemon fails. With size = 3 you always have two additional copies
> left, thus keeping your data safe(r).
>
> If you are running CephFS, at least consider running the 'metadata'
> pool with size = 3 to keep the MDS happy.
>
> Please, let this be a big warning to everybody who is running with
> size = 2. The downtime and problems caused by missing objects/replicas
> are usually big, and it takes days to recover from them. But very often
> data is lost and/or corrupted, which causes even more problems.
>
> I can't stress this enough. Running with size = 2 in production is a
> SERIOUS hazard and should not be done, imho.
>
> To anyone out there running with size = 2, please reconsider this!
>
> Thanks,
>
> Wido"
>
> Btw, could you please share your experience with HA networking for
> Ceph? What type of bonding do you have? Are you using stackable
> switches?
>
>
> On Fri, Mar 3, 2017 at 6:24 PM, Maxime Guyot <maxime.gu...@elits.com> wrote:
>
> Hi Henrik and Matteo,
>
> I agree with Henrik: increasing your replication factor won't improve
> recovery or read performance on its own. However, if you are changing
> from replica 2 to replica 3, you might need to scale out your cluster
> to have enough space for the additional replica, and that would improve
> recovery and read performance.
>
> Cheers,
>
> Maxime
>
> From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Henrik Korkuc <li...@kirneh.eu>
> Date: Friday 3 March 2017 11:35
> To: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] replica questions
>
> On 17-03-03 12:30, Matteo Dacrema wrote:
>
> Hi All,
>
> I have a production cluster made of 8 nodes and 166 OSDs, with 4
> journal SSDs per node (one for every 5 OSDs) and replica 2, for a total
> raw space of 150 TB.
>
> I have a few questions about it:
>
> Is it critical to have replica 2? Why?
>
> Replica size 3 is highly recommended. I do not know the exact numbers,
> but it decreases the chance of data loss, as two-disk failures appear
> to be quite a frequent thing, especially in larger clusters.
>
> Does replica 3 make recovery faster?
> no
>
> Does replica 3 make rebalancing and recovery less heavy for customers?
> If I lose one node, does replica 3 reduce the I/O impact compared to
> replica 2?
>
> no
>
> Does read performance increase with replica 3?
>
> no
>
> Thank you
> Regards
> Matteo
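For reference, applying Wido's size = 3 / min_size = 2 recommendation is just a couple of commands per pool; the pool name below is a placeholder, and on a cluster this size the extra copy means a long backfill, so it may be worth changing one pool at a time:

  ceph osd pool set <pool-name> size 3
  ceph osd pool set <pool-name> min_size 2

Current values can be checked with "ceph osd dump | grep 'replicated size'".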
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com