On 1/16/19 8:08 PM, Anthony Verevkin wrote:
> I would definitely see huge value in going to 3 MONs here (and btw 2 on-site
> MGR and 2 on-site MDS).
> However, 350Kbps is quite low and MONs may be latency sensitive, so I suggest
> you do heavy QoS if you want to use that link for ANYTHING else.
> If you do so, make sure your clients are only listing the on-site MONs so
> they don't try to read from the off-site MON.

Won't work. The clients will receive the monmap and they might choose to
fail over to the MON at the 3rd location.
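
To illustrate (the addresses below are made up): listing only the on-site
MONs in the client's ceph.conf only affects the initial connection. After
that the client works from the monmap it received, which lists every MON,
so it can still pick the off-site one.

    [global]
        # on-site MONs only -- used just to bootstrap the connection,
        # not to restrict which MONs the client may talk to afterwards
        mon_host = 192.168.1.11, 192.168.1.12

You can see the monmap the clients actually end up with via "ceph mon dump".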
> Still you risk the stability of the cluster if the off-site MON starts
> lagging. If it's still considered in quorum while lagging, all changes to the
> cluster (osds going up/down, etc.) would be blocked waiting for it to commit.
>
> Even if you choose against an off-site MON, maybe consider 2 on-site MONs
> instead. Yes, you'd double the risk of the cluster coming to a halt (any one
> of two nodes dying vs one specific node dying). But if that happens you have
> a manual way of downgrading to a single MON (and you still have your MON's
> data) vs risking getting stuck with an OSD-only node that has never had a MON
> installed and not having a copy of the MON DB.
>
> I also see how you want to get the data out for backups.
> Having a third replica off-site definitely won't fly with such bandwidth, as
> it would once again block the IO until committed by the off-site OSD.
> I am not quite sure RBD mirroring would play nicely with this kind of link
> either. Maybe stick with application-level off-site backups.
> And again, whatever replication/backup strategy you go with, you need to QoS
> it or else you'd cripple your connection, which I assume is used for some
> other communications as well.
>
> ... or, totally unrelated to Ceph, maybe just back up to a local USB drive
> and have somebody replace it and ship it to the head office once in a while?
>
> Regards,
> Anthony
>
> ----- Original Message -----
> From: "Brian Topping" <brian.topp...@gmail.com>
> To: ceph-users@lists.ceph.com
> Sent: Saturday, January 12, 2019 1:07:05 AM
> Subject: [ceph-users] Offsite replication scenario
>
> Hi all,
>
> I have a simple two-node Ceph cluster that I’m comfortable with the care and
> feeding of. Both nodes are in a single rack and captured in the attached
> dump; it has two nodes, only one mon, and all pools size 2. Due to physical
> limitations, the primary location can’t move past two nodes at the present
> time. As far as hardware, those two nodes are 18-core Xeons with 128GB RAM,
> connected with 10GbE.
>
> My next goal is to add an offsite replica and would like to validate the plan
> I have in mind. For its part, the offsite replica can be considered
> read-only except for the occasional snapshot in order to run backups to tape.
> The offsite location is connected with a reliable and secured ~350Kbps WAN
> link.
>
> The following presuppositions bear challenge:
>
> * There is only a single mon at the present time, which could be expanded to
> three with the offsite location. Two mons at the primary location is
> obviously a lower MTBF than one, but with a third one on the other side of
> the WAN, I could create resiliency against *either* a WAN failure or a single
> node maintenance event.
> * Because there are two mons at the primary location and one at the offsite,
> the degradation mode for a WAN loss (the most likely scenario due to facility
> support) leaves the primary nodes maintaining the quorum, which is desirable.
> * It’s clear that a WAN failure and a mon failure at the primary location
> will halt cluster access.
> * The CRUSH maps will be managed to reflect the topology change.
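
(Commenting inline on that last point: a rough sketch of what such a topology
change could look like, purely illustrative -- the bucket names and the third
host "gw03" are made up, and keep in mind that a replicated rule spanning both
sites would make every write wait on the 350Kbps link, as discussed above.)

    ceph osd crush add-bucket site-a datacenter
    ceph osd crush add-bucket site-b datacenter
    ceph osd crush move site-a root=default
    ceph osd crush move site-b root=default
    ceph osd crush move gw01 datacenter=site-a
    ceph osd crush move gw02 datacenter=site-a
    ceph osd crush move gw03 datacenter=site-b   # hypothetical off-site host
    ceph osd crush rule create-replicated by-site default datacenter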
> If that’s a good capture so far, I’m comfortable with it. What I don’t
> understand is what to expect in actual use:
>
> * Is the link speed asymmetry between the two primary nodes and the offsite
> node going to create significant risk or unexpected behaviors?
> * Will the performance of the two primary nodes be limited to the speed at
> which the offsite mon can participate? Or will the primary mons correctly
> calculate that they have quorum and keep moving forward under normal
> operation?
> * In the case of an extended WAN outage (and presuming full uptime on the
> primary site mons), would a return to full cluster health simply be a matter
> of time? Are there any limits on how long the WAN could be down if the other
> two maintain quorum?
>
> I hope I’m asking the right questions here. Any feedback appreciated,
> including blogs and RTFM pointers.
>
> Thanks for a great product!! I’m really excited for this next frontier!
>
> Brian
>
>> [root@gw01 ~]# ceph -s
>>   cluster:
>>     id:     nnnn
>>     health: HEALTH_OK
>>
>>   services:
>>     mon: 1 daemons, quorum gw01
>>     mgr: gw01(active)
>>     mds: cephfs-1/1/1 up {0=gw01=up:active}
>>     osd: 8 osds: 8 up, 8 in
>>
>>   data:
>>     pools:   3 pools, 380 pgs
>>     objects: 172.9 k objects, 11 GiB
>>     usage:   30 GiB used, 5.8 TiB / 5.8 TiB avail
>>     pgs:     380 active+clean
>>
>>   io:
>>     client: 612 KiB/s wr, 0 op/s rd, 50 op/s wr
>>
>> [root@gw01 ~]# ceph df
>> GLOBAL:
>>     SIZE        AVAIL       RAW USED     %RAW USED
>>     5.8 TiB     5.8 TiB     30 GiB       0.51
>> POOLS:
>>     NAME                ID     USED        %USED     MAX AVAIL     OBJECTS
>>     cephfs_metadata     2      264 MiB     0         2.7 TiB          1085
>>     cephfs_data         3      8.3 GiB     0.29      2.7 TiB        171283
>>     rbd                 4      2.0 GiB     0.07      2.7 TiB           542
>>
>> [root@gw01 ~]# ceph osd tree
>> ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
>> -1       5.82153 root default
>> -3       2.91077     host gw01
>>  0   ssd 0.72769         osd.0      up  1.00000 1.00000
>>  2   ssd 0.72769         osd.2      up  1.00000 1.00000
>>  4   ssd 0.72769         osd.4      up  1.00000 1.00000
>>  6   ssd 0.72769         osd.6      up  1.00000 1.00000
>> -5       2.91077     host gw02
>>  1   ssd 0.72769         osd.1      up  1.00000 1.00000
>>  3   ssd 0.72769         osd.3      up  1.00000 1.00000
>>  5   ssd 0.72769         osd.5      up  1.00000 1.00000
>>  7   ssd 0.72769         osd.7      up  1.00000 1.00000
>>
>> [root@gw01 ~]# ceph osd df
>> ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE VAR  PGS
>>  0   ssd 0.72769  1.00000 745 GiB 4.9 GiB 740 GiB 0.66 1.29 115
>>  2   ssd 0.72769  1.00000 745 GiB 3.1 GiB 742 GiB 0.42 0.82  83
>>  4   ssd 0.72769  1.00000 745 GiB 3.6 GiB 742 GiB 0.49 0.96  90
>>  6   ssd 0.72769  1.00000 745 GiB 3.5 GiB 742 GiB 0.47 0.93  92
>>  1   ssd 0.72769  1.00000 745 GiB 3.4 GiB 742 GiB 0.46 0.90  76
>>  3   ssd 0.72769  1.00000 745 GiB 3.9 GiB 741 GiB 0.52 1.02 102
>>  5   ssd 0.72769  1.00000 745 GiB 3.9 GiB 741 GiB 0.52 1.02  98
>>  7   ssd 0.72769  1.00000 745 GiB 4.0 GiB 741 GiB 0.54 1.06 104
>>                    TOTAL 5.8 TiB 30 GiB  5.8 TiB 0.51
>> MIN/MAX VAR: 0.82/1.29  STDDEV: 0.07
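
P.S. On the application-level backup suggestion above: since a synchronous
replica over that link is out, one pattern that might fit (a sketch only --
pool, image, snapshot, host and path names below are invented) is shipping
periodic RBD snapshot diffs off-site and running the tape backups from there:

    # take a snapshot, then export only the delta since the previous one
    rbd snap create rbd/myimage@backup-20190116
    rbd export-diff --from-snap backup-20190115 rbd/myimage@backup-20190116 - \
        | pv -L 30k \
        | ssh offsite-host 'cat > /backup/myimage-20190115-to-20190116.diff'

The "pv -L 30k" throttle (roughly 240Kbps, leaving headroom on the ~350Kbps
link) plays the role of the QoS Anthony mentions. On the off-site side the
diffs can be applied with "rbd import-diff" into a small local cluster, or
written to tape as-is; the initial full copy could be seeded from a USB drive
as suggested.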