Ah! Makes perfect sense now. Thanks!!

Sent from my iPhone
> On Jan 14, 2019, at 12:30, Gregory Farnum <gfar...@redhat.com> wrote:
>
>> On Fri, Jan 11, 2019 at 10:07 PM Brian Topping <brian.topp...@gmail.com> wrote:
>> Hi all,
>>
>> I have a simple two-node Ceph cluster that I’m comfortable with the care and feeding of. Both nodes are in a single rack and captured in the attached dump: two nodes, only one mon, all pools size 2. Due to physical limitations, the primary location can’t move past two nodes at the present time. As far as hardware goes, the two nodes are 18-core Xeons with 128 GB RAM, connected with 10GbE.
>>
>> My next goal is to add an offsite replica, and I would like to validate the plan I have in mind. For its part, the offsite replica can be considered read-only except for the occasional snapshot in order to run backups to tape. The offsite location is connected with a reliable and secured ~350 Kbps WAN link.
>
> Unfortunately this is just not going to work. All writes to a Ceph OSD are replicated synchronously to every replica, all reads are served from the primary OSD for any given piece of data, and unless you do some hackery on your CRUSH map, each of your 3 OSD nodes is going to be a primary for about 1/3 of the total data.
>
> If you want to move your data off-site asynchronously, there are various options for doing that in RBD (either periodic snapshots and export-diff, or by maintaining a journal and streaming it out) and RGW (with the multi-site stuff). But you're not going to be successful trying to stretch a Ceph cluster over that link.
> -Greg

[rough sketches of both RBD approaches follow after the quoted status output at the end of this message]

>
>>
>> The following presuppositions bear challenge:
>>
>> * There is only a single mon at the present time, which could be expanded to three with the offsite location. Two mons at the primary location obviously give a lower MTBF than one, but with a third on the other side of the WAN, I could create resiliency against *either* a WAN failure or a single-node maintenance event.
>> * Because there are two mons at the primary location and one at the offsite, the degradation mode for a WAN loss (the most likely scenario, given the facility support) leaves the primary nodes maintaining quorum, which is desirable.
>> * It’s clear that a simultaneous WAN failure and mon failure at the primary location will halt cluster access.
>> * The CRUSH maps will be managed to reflect the topology change. [a sketch of this follows after the questions below]
>>
>> If that’s a good capture so far, I’m comfortable with it. What I don’t understand is what to expect in actual use:
>>
>> * Is the link-speed asymmetry between the two primary nodes and the offsite node going to create significant risk or unexpected behaviors?
>> * Will the performance of the two primary nodes be limited to the speed at which the offsite mon can participate? Or will the primary mons correctly calculate that they have quorum and keep moving forward under normal operation?
>> * In the case of an extended WAN outage (and presuming full uptime on the primary-site mons), would a return to full cluster health be simply a matter of time? Are there any limits on how long the WAN could be down if the other two maintain quorum?
>>
>> I hope I’m asking the right questions here. Any feedback is appreciated, including blog and RTFM pointers.
>>
>> Thanks for a great product!! I’m really excited for this next frontier!
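For reference, a rough sketch of what the CRUSH-map change in the last presupposition (the “hackery” Greg alludes to) might look like. The rack bucket “offsite”, the host “gw03”, and osd.8 are hypothetical names; the current tree only contains gw01 and gw02. Note this only influences placement and primary selection: writes would still be replicated synchronously across the WAN, which is exactly the problem Greg describes.

    # Add a bucket for the remote site and hang it under the default root
    ceph osd crush add-bucket offsite rack
    ceph osd crush move offsite root=default
    # Place the (hypothetical) offsite host under the new bucket
    ceph osd crush move gw03 rack=offsite
    # Optionally keep a (hypothetical) offsite OSD from being chosen as primary,
    # so client reads stay at the primary site
    ceph osd primary-affinity osd.8 0
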
>>
>> Brian
>>
>> > [root@gw01 ~]# ceph -s
>> >   cluster:
>> >     id:     nnnn
>> >     health: HEALTH_OK
>> >
>> >   services:
>> >     mon: 1 daemons, quorum gw01
>> >     mgr: gw01(active)
>> >     mds: cephfs-1/1/1 up {0=gw01=up:active}
>> >     osd: 8 osds: 8 up, 8 in
>> >
>> >   data:
>> >     pools:   3 pools, 380 pgs
>> >     objects: 172.9 k objects, 11 GiB
>> >     usage:   30 GiB used, 5.8 TiB / 5.8 TiB avail
>> >     pgs:     380 active+clean
>> >
>> >   io:
>> >     client: 612 KiB/s wr, 0 op/s rd, 50 op/s wr
>> >
>> > [root@gw01 ~]# ceph df
>> > GLOBAL:
>> >     SIZE        AVAIL       RAW USED     %RAW USED
>> >     5.8 TiB     5.8 TiB     30 GiB       0.51
>> > POOLS:
>> >     NAME                ID     USED        %USED     MAX AVAIL     OBJECTS
>> >     cephfs_metadata     2      264 MiB     0         2.7 TiB       1085
>> >     cephfs_data         3      8.3 GiB     0.29      2.7 TiB       171283
>> >     rbd                 4      2.0 GiB     0.07      2.7 TiB       542
>> > [root@gw01 ~]# ceph osd tree
>> > ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
>> > -1       5.82153 root default
>> > -3       2.91077     host gw01
>> >  0   ssd 0.72769         osd.0      up  1.00000 1.00000
>> >  2   ssd 0.72769         osd.2      up  1.00000 1.00000
>> >  4   ssd 0.72769         osd.4      up  1.00000 1.00000
>> >  6   ssd 0.72769         osd.6      up  1.00000 1.00000
>> > -5       2.91077     host gw02
>> >  1   ssd 0.72769         osd.1      up  1.00000 1.00000
>> >  3   ssd 0.72769         osd.3      up  1.00000 1.00000
>> >  5   ssd 0.72769         osd.5      up  1.00000 1.00000
>> >  7   ssd 0.72769         osd.7      up  1.00000 1.00000
>> > [root@gw01 ~]# ceph osd df
>> > ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE VAR  PGS
>> >  0   ssd 0.72769  1.00000 745 GiB 4.9 GiB 740 GiB 0.66 1.29 115
>> >  2   ssd 0.72769  1.00000 745 GiB 3.1 GiB 742 GiB 0.42 0.82  83
>> >  4   ssd 0.72769  1.00000 745 GiB 3.6 GiB 742 GiB 0.49 0.96  90
>> >  6   ssd 0.72769  1.00000 745 GiB 3.5 GiB 742 GiB 0.47 0.93  92
>> >  1   ssd 0.72769  1.00000 745 GiB 3.4 GiB 742 GiB 0.46 0.90  76
>> >  3   ssd 0.72769  1.00000 745 GiB 3.9 GiB 741 GiB 0.52 1.02 102
>> >  5   ssd 0.72769  1.00000 745 GiB 3.9 GiB 741 GiB 0.52 1.02  98
>> >  7   ssd 0.72769  1.00000 745 GiB 4.0 GiB 741 GiB 0.54 1.06 104
>> >                    TOTAL  5.8 TiB 30 GiB  5.8 TiB 0.51
>> > MIN/MAX VAR: 0.82/1.29  STDDEV: 0.07
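To make Greg’s first RBD suggestion concrete, here is a minimal sketch of the snapshot + export-diff approach, assuming the offsite machine runs its own small Ceph cluster and that an image rbd/myimage (the rbd pool exists in the dump above; the image, snapshot names, and “offsite-host” are hypothetical) already exists on both sides with a common base snapshot. Only the delta between snapshots crosses the 350 Kbps link.

    # On the primary cluster: take a new point-in-time snapshot
    rbd snap create rbd/myimage@backup-20190114
    # Ship only the changes since the previous snapshot over the WAN
    # and apply them to the copy on the offsite cluster
    rbd export-diff --from-snap backup-20190107 rbd/myimage@backup-20190114 - \
      | ssh offsite-host rbd import-diff - rbd/myimage

The very first transfer would be a full rbd export / rbd import (or an export-diff without --from-snap); after that, each run only moves the blocks written since the last snapshot, which is what makes a slow link workable.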
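And a sketch of the journal-based option Greg mentions, which in practice means rbd-mirror; again the image name is hypothetical and both sites are assumed to run their own cluster. The journal is replayed asynchronously by an rbd-mirror daemon at the offsite cluster, so the slow WAN only affects how far the replica lags, not client writes at the primary site.

    # On the primary cluster: journaling requires exclusive-lock
    rbd feature enable rbd/myimage exclusive-lock journaling
    # Mirror selected images in the pool (per-image mode)
    rbd mirror pool enable rbd image
    rbd mirror image enable rbd/myimage
    # The offsite cluster runs an rbd-mirror daemon peered with the primary
    # (rbd mirror pool peer add ...) and replays the journal as bandwidth allows.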