On 11/10/2017 7:17 AM, Sébastien VIGNERON wrote:
> Hi everyone,
>
> Beginner with Ceph, I'm looking for a way to do a 3-way replication
> between 2 datacenters as mentioned in the Ceph docs (but not described).
>
> My goal is to keep access to the data (at least read-only access) even
> when the link between the 2 datacenters is cut, and to make sure at least
> one copy of the data exists in each datacenter.

If that is your goal, then why 3-way replication?
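For comparison, if one copy per datacenter (readable on either side) is really the requirement, a plain 2-way rule is simpler. A rough, untested sketch (the rule name is just an example), to be combined with size = 2 / min_size = 1 on the pool, with the consistency caveat I mention further down:

# one replica in each datacenter, two copies total
rule replicated_one_per_dc {
    ruleset 1
    type replicated
    min_size 1
    max_size 2
    step take DC-1
    step chooseleaf firstn 1 type host
    step emit
    step take DC-2
    step chooseleaf firstn 1 type host
    step emit
}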
> I'm not sure how to implement such a 3-way replication. With a rule?
> Based on the Ceph docs, I thought of a rule:
>
> rule 3-way-replication_with_2_DC {
>     ruleset 1
>     type replicated
>     min_size 2
>     max_size 3

min_size and max_size here don't do what you expect them to. You need to set
min_size = 1 for a 2-way replicated cluster (beware of inconsistencies if the
link between the DCs goes down) and min_size = 2 for a 3-way replicated
cluster, but the setting lives on the pool, not in the CRUSH rule (concrete
commands further down).

>     step take DC-1
>     step choose firstn 1 type host
>     step chooseleaf firstn 1 type osd
>     step emit
>     step take DC-2
>     step choose firstn 1 type host
>     step chooseleaf firstn 1 type osd
>     step emit
>     step take default
>     step choose firstn 1 type host
>     step chooseleaf firstn 1 type osd
>     step emit
> }
>
> But what should happen if the link between the 2 datacenters is cut?
> If someone has a better solution, I am interested in any resources about
> it (examples, …).

This seems to, for each PG, take an OSD on a host in DC-1, then an OSD on a
host in DC-2, and then just a random OSD on a random host anywhere. 50% of the
extra OSDs selected will be in DC-1 and the rest in DC-2. When the link is
cut, 50% of the PGs will not be able to fulfil the min_size = 2 requirement
(depending on whether the observer is in DC-1 or DC-2, of course) and
operations on them will stop. In practice this should stop all operations, and
I'm not even considering monitor quorum yet.

> The default rule (see below) keeps the pool working when we mark each
> node of DC-2 as down (typically maintenance), but if we shut down the link
> between the 2 datacenters, the pool/RBD hangs (a dd write freezes, for
> example).
>
> Does anyone have some insight on how to set up a 3-way replication
> between 2 datacenters?

I don't really know why there is a difference here. We opted for a 3-way
cluster across 3 separate datacenters, though. Perhaps you can somehow
simulate 2 separate datacenters inside one of yours; at least make sure they
are on different power circuits, etc. Also, consider redundancy for your
network so that the inter-DC link does not go down. Spanning Tree is a little
slow, but TRILL or SPB should work in your case.

> Thanks in advance for any advice on the topic.
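To be concrete about the pool-level settings mentioned above (assuming a pool
named "rbd" here, adjust to your pool name; setting crush_rule by name is the
Luminous syntax):

ceph osd pool set rbd size 3        # three replicas in total
ceph osd pool set rbd min_size 2    # keep serving I/O while at least two replicas are available
ceph osd pool set rbd crush_rule 3-way-replication_with_2_DC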
> Current situation:
>
> Mons: host-1, host-2, host-4
>
> Quick network topology:
>
>              USERS NETWORK
>                    |
>                  2x10G
>                    |
>     DC-1-SWITCH <——— 40G ——> DC-2-SWITCH
>      |  |  |                  |  |  |
> host-1 _|  |  |          host-4 _|  |  |
> host-2 ___|  |           host-5 ___|  |
> host-3 _____|            host-6 _____|
>
> crushmap:
> # ceph osd tree
> ID  CLASS WEIGHT    TYPE NAME                 STATUS REWEIGHT PRI-AFF
>  -1       147.33325 root default
> -20        73.66663     datacenter DC-1
> -15        73.66663         rack DC-1-RACK-1
>  -9        24.55554             host host-1
>  27   hdd   2.72839                 osd.27        up  1.00000 1.00000
>  28   hdd   2.72839                 osd.28        up  1.00000 1.00000
>  29   hdd   2.72839                 osd.29        up  1.00000 1.00000
>  30   hdd   2.72839                 osd.30        up  1.00000 1.00000
>  31   hdd   2.72839                 osd.31        up  1.00000 1.00000
>  32   hdd   2.72839                 osd.32        up  1.00000 1.00000
>  33   hdd   2.72839                 osd.33        up  1.00000 1.00000
>  34   hdd   2.72839                 osd.34        up  1.00000 1.00000
>  36   hdd   2.72839                 osd.36        up  1.00000 1.00000
> -11        24.55554             host host-2
>  35   hdd   2.72839                 osd.35        up  1.00000 1.00000
>  37   hdd   2.72839                 osd.37        up  1.00000 1.00000
>  38   hdd   2.72839                 osd.38        up  1.00000 1.00000
>  39   hdd   2.72839                 osd.39        up  1.00000 1.00000
>  40   hdd   2.72839                 osd.40        up  1.00000 1.00000
>  41   hdd   2.72839                 osd.41        up  1.00000 1.00000
>  42   hdd   2.72839                 osd.42        up  1.00000 1.00000
>  43   hdd   2.72839                 osd.43        up  1.00000 1.00000
>  46   hdd   2.72839                 osd.46        up  1.00000 1.00000
> -13        24.55554             host host-3
>  44   hdd   2.72839                 osd.44        up  1.00000 1.00000
>  45   hdd   2.72839                 osd.45        up  1.00000 1.00000
>  47   hdd   2.72839                 osd.47        up  1.00000 1.00000
>  48   hdd   2.72839                 osd.48        up  1.00000 1.00000
>  49   hdd   2.72839                 osd.49        up  1.00000 1.00000
>  50   hdd   2.72839                 osd.50        up  1.00000 1.00000
>  51   hdd   2.72839                 osd.51        up  1.00000 1.00000
>  52   hdd   2.72839                 osd.52        up  1.00000 1.00000
>  53   hdd   2.72839                 osd.53        up  1.00000 1.00000
> -19        73.66663     datacenter DC-2
> -16        73.66663         rack DC-2-RACK-1
>  -3        24.55554             host host-4
>   0   hdd   2.72839                 osd.0         up  1.00000 1.00000
>   1   hdd   2.72839                 osd.1         up  1.00000 1.00000
>   2   hdd   2.72839                 osd.2         up  1.00000 1.00000
>   3   hdd   2.72839                 osd.3         up  1.00000 1.00000
>   4   hdd   2.72839                 osd.4         up  1.00000 1.00000
>   5   hdd   2.72839                 osd.5         up  1.00000 1.00000
>   6   hdd   2.72839                 osd.6         up  1.00000 1.00000
>   7   hdd   2.72839                 osd.7         up  1.00000 1.00000
>   8   hdd   2.72839                 osd.8         up  1.00000 1.00000
>  -5        24.55554             host host-5
>   9   hdd   2.72839                 osd.9         up  1.00000 1.00000
>  10   hdd   2.72839                 osd.10        up  1.00000 1.00000
>  11   hdd   2.72839                 osd.11        up  1.00000 1.00000
>  12   hdd   2.72839                 osd.12        up  1.00000 1.00000
>  13   hdd   2.72839                 osd.13        up  1.00000 1.00000
>  14   hdd   2.72839                 osd.14        up  1.00000 1.00000
>  15   hdd   2.72839                 osd.15        up  1.00000 1.00000
>  16   hdd   2.72839                 osd.16        up  1.00000 1.00000
>  18   hdd   2.72839                 osd.18        up  1.00000 1.00000
>  -7        24.55554             host host-6
>  19   hdd   2.72839                 osd.19        up  1.00000 1.00000
>  20   hdd   2.72839                 osd.20        up  1.00000 1.00000
>  21   hdd   2.72839                 osd.21        up  1.00000 1.00000
>  22   hdd   2.72839                 osd.22        up  1.00000 1.00000
>  23   hdd   2.72839                 osd.23        up  1.00000 1.00000
>  24   hdd   2.72839                 osd.24        up  1.00000 1.00000
>  25   hdd   2.72839                 osd.25        up  1.00000 1.00000
>  26   hdd   2.72839                 osd.26        up  1.00000 1.00000
>  54   hdd   2.72839                 osd.54        up  1.00000 1.00000
>
> current rules (default one):
> # ceph osd crush rule dump
> [
>     {
>         "rule_id": 0,
>         "rule_name": "replicated_rule",
>         "ruleset": 0,
>         "type": 1,
>         "min_size": 1,
>         "max_size": 10,
>         "steps": [
>             {
>                 "op": "take",
>                 "item": -1,
>                 "item_name": "default"
>             },
>             {
>                 "op": "chooseleaf_firstn",
>                 "num": 0,
>                 "type": "host"
>             },
>             {
>                 "op": "emit"
>             }
>         ]
>     }
> ]
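By the way, before pushing any new rule to the cluster you can sanity-check
where it places replicas with crushtool's test mode, roughly like this (file
names are arbitrary, --rule takes the numeric rule id, e.g. 1 for your
proposed rule):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt     # decompile, then add/edit the rule in crushmap.txt
crushtool -c crushmap.txt -o crushmap.new     # recompile
crushtool -i crushmap.new --test --rule 1 --num-rep 3 --show-mappings | head -20
# check how many of the 3 OSDs per PG land in each datacenter, then apply:
ceph osd setcrushmap -i crushmap.new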
>
> Cordialement / Best regards,
>
> Sébastien VIGNERON
> CRIANN,
> Ingénieur / Engineer
> Technopôle du Madrillet
> 745, avenue de l'Université
> 76800 Saint-Etienne du Rouvray - France
> tél. +33 2 32 91 42 91
> fax. +33 2 32 91 42 92
> http://www.criann.fr
> mailto:sebastien.vigne...@criann.fr
> support: supp...@criann.fr
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com