Hi, thank you all.

I'm using Mellanox switches with ConnectX-3 Pro 40 Gbit NICs, bonded with balance-xor and transmit hash policy layer3+4. It's a bit expensive, but it's very hard to saturate. I'm using a single NIC for both the replica and the access network.
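Since the bonding question came up: a balance-xor bond with layer3+4 hashing looks roughly like the sketch below on a Debian-style node. This is only an outline, not my exact configuration; the interface names and the address are placeholders.

  auto bond0
  iface bond0 inet static
      address 192.168.10.11
      netmask 255.255.255.0
      bond-slaves enp3s0 enp3s0d1
      bond-mode balance-xor
      bond-xmit-hash-policy layer3+4
      bond-miimon 100

Note that with balance-xor the corresponding switch ports generally need to be configured as a static LAG (or an MLAG if the bond spans two stacked switches).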
> On 3 Mar 2017, at 14:52, Vy Nguyen Tan <vynt.kensh...@gmail.com> wrote:
>
> Hi,
>
> You should read this email from Wido den Hollander:
>
> "Hi,
>
> As a Ceph consultant I get numerous calls throughout the year to help
> people with getting their broken Ceph clusters back online.
>
> The causes of downtime vary vastly, but one of the biggest causes is
> that people use replication 2x: size = 2, min_size = 1.
>
> In 2016 the number of cases I had where data was lost due to these
> settings grew exponentially.
>
> Usually a disk fails, recovery kicks in, and while recovery is
> happening a second disk fails, causing PGs to become incomplete.
>
> There have been too many times where I had to use xfs_repair on broken
> disks and use ceph-objectstore-tool to export/import PGs.
>
> I really don't like these cases, mainly because they can be prevented
> easily by using size = 3 and min_size = 2 for all pools.
>
> With size = 2 you go into the danger zone as soon as a single
> disk/daemon fails. With size = 3 you always have two additional copies
> left, thus keeping your data safe(r).
>
> If you are running CephFS, at least consider running the 'metadata'
> pool with size = 3 to keep the MDS happy.
>
> Please, let this be a big warning to everybody who is running with
> size = 2. The downtime and problems caused by missing objects/replicas
> are usually big, and it takes days to recover from them. But very often
> data is lost and/or corrupted, which causes even more problems.
>
> I can't stress this enough. Running with size = 2 in production is a
> SERIOUS hazard and should not be done, imho.
>
> To anyone out there running with size = 2, please reconsider this!
>
> Thanks,
>
> Wido"
>
> Btw, could you please share your experience with HA networking for
> Ceph? What type of bonding do you have? Are you using stackable
> switches?
>
>
> On Fri, Mar 3, 2017 at 6:24 PM, Maxime Guyot <maxime.gu...@elits.com> wrote:
>
> Hi Henrik and Matteo,
>
> I agree with Henrik: increasing your replication factor won't improve
> recovery or read performance on its own. However, if you are changing
> from replica 2 to replica 3, you might need to scale out your cluster
> to have enough space for the additional replica, and that would improve
> recovery and read performance.
>
> Cheers,
>
> Maxime
>
> From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Henrik Korkuc <li...@kirneh.eu>
> Date: Friday 3 March 2017 11:35
> To: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] replica questions
>
> On 17-03-03 12:30, Matteo Dacrema wrote:
>
> Hi All,
>
> I have a production cluster made of 8 nodes and 166 OSDs, with 4
> journal SSDs per node (one for every 5 OSDs) and replica 2, for a total
> raw space of 150 TB.
>
> I have a few questions about it:
>
> Is it critical to have replica 2? Why?
>
> Replica size 3 is highly recommended. I do not know the exact numbers,
> but it decreases the chance of data loss, as two-disk failures appear
> to be quite a frequent thing, especially in larger clusters.
>
> Does replica 3 make recovery faster?
> no
>
> Does replica 3 make rebalancing and recovery less heavy for customers?
> If I lose one node, does replica 3 reduce the I/O impact compared to
> replica 2?
>
> no
>
> Does read performance increase with replica 3?
>
> no
>
> Thank you
> Regards
> Matteo
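For reference, applying Wido's size = 3 / min_size = 2 recommendation is just a couple of commands per pool; the pool name below is a placeholder, and on a cluster this size the extra copy means a long backfill, so it may be worth changing one pool at a time:

  ceph osd pool set <pool-name> size 3
  ceph osd pool set <pool-name> min_size 2

Current values can be checked with "ceph osd dump | grep 'replicated size'".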
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com