On 1/16/19 8:08 PM, Anthony Verevkin wrote:
> I would definitely see huge value in going to 3 MONs here (and, by the way,
> 2 on-site MGRs and 2 on-site MDSes).
> However, 350 Kbps is quite low and MONs are latency-sensitive, so I suggest
> you apply heavy QoS if you want to use that link for ANYTHING else.
> If you do so, make sure your clients only list the on-site MONs so they
> don't try to read from the off-site MON.

That won't work. The clients will receive the full monmap, so they may still
choose to fail over to the MON at the third location.
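
For what it's worth, you can of course list only the on-site MONs in the
clients' ceph.conf; a minimal sketch, with hypothetical addresses:

    [global]
    # hypothetical on-site MON addresses; used only for the initial contact
    mon_host = 10.0.1.1, 10.0.1.2

But once a client connects it learns the full monmap from the cluster,
including the off-site MON, so this only shapes bootstrap, not which MON the
client may later fail over to.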

> Still, you risk the stability of the cluster if the off-site MON starts
> lagging. If it is still counted as part of the quorum while lagging, all
> changes to the cluster (OSDs going up/down, etc.) would be blocked waiting
> for it to commit.
> 
> Even if you decide against an off-site MON, maybe consider two on-site MONs
> instead. Yes, you'd double the risk of the cluster grinding to a halt, since
> either node dying would now break quorum rather than only the one node that
> holds the MON. But if that happens you have a manual way of downgrading to a
> single MON (and you still have the MON's data), versus risking getting stuck
> with an OSD-only node that never had a MON installed and holds no copy of
> the MON DB.
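
For the record, the manual downgrade to a single MON is roughly the documented
"remove a MON from an unhealthy cluster" procedure. A sketch, assuming a
second MON had been added on gw02 and gw02 is now dead, leaving gw01 as the
survivor:

    # on gw01, with its MON stopped
    systemctl stop ceph-mon@gw01
    # extract the current monmap from the surviving MON's store
    ceph-mon -i gw01 --extract-monmap /tmp/monmap
    # remove the dead MON (gw02, in this assumed scenario) and inject the map back
    monmaptool /tmp/monmap --rm gw02
    ceph-mon -i gw01 --inject-monmap /tmp/monmap
    systemctl start ceph-mon@gw01

Treat that as an outline and check the current docs before relying on it.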
> 
> I also see why you want to get the data out for backups.
> Having a third replica off-site definitely won't fly with that bandwidth, as
> it would once again block client I/O until the write is committed by the
> off-site OSD.
> I am not quite sure RBD mirroring would play nicely with this kind of link
> either. Maybe stick with application-level off-site backups.
> And again, whatever replication/backup strategy you choose, you need to apply
> QoS or you'll cripple your connection, which I assume carries other traffic
> as well.
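
On the QoS point: the simplest form is an HTB class on the WAN-facing
interface that caps traffic towards the off-site Ceph node well below the
link rate. A rough sketch, where eth0 and 10.0.2.10 (the off-site node) are
hypothetical and the numbers are placeholders:

    # root HTB qdisc on the WAN-facing interface; unmatched traffic uses class 1:20
    tc qdisc add dev eth0 root handle 1: htb default 20
    # general traffic: may borrow up to the full ~350 kbit link
    tc class add dev eth0 parent 1: classid 1:20 htb rate 200kbit ceil 350kbit
    # Ceph traffic to the off-site node: hard-capped below the link rate
    tc class add dev eth0 parent 1: classid 1:10 htb rate 100kbit ceil 150kbit
    # classify anything destined for the off-site node into the capped class
    tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
        match ip dst 10.0.2.10/32 flowid 1:10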
> 
> ... or, totally unrelated to Ceph, maybe just back up to a local USB drive
> and have somebody swap it and ship it to the head office once in a while?
> 
> Regards,
> Anthony
> 
> ----- Original Message -----
> From: "Brian Topping" <brian.topp...@gmail.com>
> To: ceph-users@lists.ceph.com
> Sent: Saturday, January 12, 2019 1:07:05 AM
> Subject: [ceph-users] Offsite replication scenario
> 
> Hi all,
> 
> I have a simple two-node Ceph cluster that I’m comfortable with the care and
> feeding of. Both nodes are in a single rack and captured in the attached
> dump: two nodes, a single mon, and all pools at size 2. Due to physical
> limitations, the primary location can’t grow past two nodes at the present
> time. As far as hardware goes, the two nodes are 18-core Xeons with 128 GB
> RAM, connected with 10GbE.
> 
> My next goal is to add an offsite replica, and I would like to validate the
> plan I have in mind. For its part, the offsite replica can be considered
> read-only except for the occasional snapshot taken in order to run backups
> to tape. The offsite location is connected by a reliable and secured
> ~350 Kbps WAN link.
> 
> The following presuppositions bear challenge:
> 
> * There is only a single mon at the present time, which could be expanded to
> three with the offsite location. Two mons at the primary location alone would
> obviously mean a lower MTBF than one, but with a third on the other side of
> the WAN I could create resiliency against *either* a WAN failure or a
> single-node maintenance event.
> * Because there are two mons at the primary location and one at the offsite
> location, the degradation mode for a WAN loss (the most likely scenario given
> the facility support) leaves the primary nodes maintaining quorum, which is
> desirable.
> * It’s clear that a simultaneous WAN failure and a mon failure at the primary
> location will halt cluster access.
> * The CRUSH maps will be managed to reflect the topology change.
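
On the CRUSH point: the topology change is mostly a matter of adding site
buckets and moving the hosts under them; a sketch, with gw03 as a
hypothetical off-site host:

    # create one datacenter bucket per site and hang them off the default root
    ceph osd crush add-bucket site-primary datacenter
    ceph osd crush add-bucket site-offsite datacenter
    ceph osd crush move site-primary root=default
    ceph osd crush move site-offsite root=default
    # move the hosts under their sites (gw03 is hypothetical)
    ceph osd crush move gw01 datacenter=site-primary
    ceph osd crush move gw02 datacenter=site-primary
    ceph osd crush move gw03 datacenter=site-offsite

plus a CRUSH rule update if placement is supposed to follow the new buckets.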
> 
> If that’s a good capture so far, I’m comfortable with it. What I don’t 
> understand is what to expect in actual use:
> 
> * Is the link speed asymmetry between the two primary nodes and the offsite 
> node going to create significant risk or unexpected behaviors?
> * Will the performance of the two primary nodes be limited by the speed at
> which the offsite mon can participate? Or will the primary mons correctly
> determine that they have quorum and keep moving forward under normal
> operation?
> * In the case of an extended WAN outage (and presuming full uptime on the
> primary-site mons), would a return to full cluster health simply be a matter
> of time? Are there any limits on how long the WAN can be down if the other
> two maintain quorum?
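
On the quorum question: with two of the three MONs reachable, the
primary-site MONs keep quorum and the cluster keeps serving I/O; the off-site
MON is simply reported as out of quorum until the WAN comes back. You can
verify which MONs are in quorum at any time with:

    ceph mon stat
    ceph quorum_status --format json-pretty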
> 
> I hope I’m asking the right questions here. Any feedback appreciated, 
> including blogs and RTFM pointers.
> 
> 
> Thanks for a great product!! I’m really excited for this next frontier!
> 
> Brian
> 
>> [root@gw01 ~]# ceph -s
>>  cluster:
>>    id:     nnnn
>>    health: HEALTH_OK
>>
>>  services:
>>    mon: 1 daemons, quorum gw01
>>    mgr: gw01(active)
>>    mds: cephfs-1/1/1 up  {0=gw01=up:active}
>>    osd: 8 osds: 8 up, 8 in
>>
>>  data:
>>    pools:   3 pools, 380 pgs
>>    objects: 172.9 k objects, 11 GiB
>>    usage:   30 GiB used, 5.8 TiB / 5.8 TiB avail
>>    pgs:     380 active+clean
>>
>>  io:
>>    client:   612 KiB/s wr, 0 op/s rd, 50 op/s wr
>>
>> [root@gw01 ~]# ceph df
>> GLOBAL:
>>    SIZE        AVAIL       RAW USED     %RAW USED 
>>    5.8 TiB     5.8 TiB       30 GiB          0.51 
>> POOLS:
>>    NAME                ID     USED        %USED     MAX AVAIL     OBJECTS 
>>    cephfs_metadata     2      264 MiB         0       2.7 TiB        1085 
>>    cephfs_data         3      8.3 GiB      0.29       2.7 TiB      171283 
>>    rbd                 4      2.0 GiB      0.07       2.7 TiB         542 
>> [root@gw01 ~]# ceph osd tree
>> ID CLASS WEIGHT  TYPE NAME     STATUS REWEIGHT PRI-AFF 
>> -1       5.82153 root default                          
>> -3       2.91077     host gw01                         
>> 0   ssd 0.72769         osd.0     up  1.00000 1.00000 
>> 2   ssd 0.72769         osd.2     up  1.00000 1.00000 
>> 4   ssd 0.72769         osd.4     up  1.00000 1.00000 
>> 6   ssd 0.72769         osd.6     up  1.00000 1.00000 
>> -5       2.91077     host gw02                         
>> 1   ssd 0.72769         osd.1     up  1.00000 1.00000 
>> 3   ssd 0.72769         osd.3     up  1.00000 1.00000 
>> 5   ssd 0.72769         osd.5     up  1.00000 1.00000 
>> 7   ssd 0.72769         osd.7     up  1.00000 1.00000 
>> [root@gw01 ~]# ceph osd df
>> ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE VAR  PGS 
>> 0   ssd 0.72769  1.00000 745 GiB 4.9 GiB 740 GiB 0.66 1.29 115 
>> 2   ssd 0.72769  1.00000 745 GiB 3.1 GiB 742 GiB 0.42 0.82  83 
>> 4   ssd 0.72769  1.00000 745 GiB 3.6 GiB 742 GiB 0.49 0.96  90 
>> 6   ssd 0.72769  1.00000 745 GiB 3.5 GiB 742 GiB 0.47 0.93  92 
>> 1   ssd 0.72769  1.00000 745 GiB 3.4 GiB 742 GiB 0.46 0.90  76 
>> 3   ssd 0.72769  1.00000 745 GiB 3.9 GiB 741 GiB 0.52 1.02 102 
>> 5   ssd 0.72769  1.00000 745 GiB 3.9 GiB 741 GiB 0.52 1.02  98 
>> 7   ssd 0.72769  1.00000 745 GiB 4.0 GiB 741 GiB 0.54 1.06 104 
>>                    TOTAL 5.8 TiB  30 GiB 5.8 TiB 0.51          
>> MIN/MAX VAR: 0.82/1.29  STDDEV: 0.07
> 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
