Ah! Makes perfect sense now. Thanks!!

Sent from my iPhone
> On Jan 14, 2019, at 12:30, Gregory Farnum <gfar...@redhat.com> wrote:
>
>> On Fri, Jan 11, 2019 at 10:07 PM Brian Topping <brian.topp...@gmail.com> wrote:
>> Hi all,
>>
>> I have a simple two-node Ceph cluster that I’m comfortable with the care and feeding of. Both nodes are in a single rack and captured in the attached dump: two nodes, only one mon, all pools size 2. Due to physical limitations, the primary location can’t move past two nodes at the present time. As far as hardware goes, the two nodes are 18-core Xeons with 128 GB RAM, connected with 10GbE.
>>
>> My next goal is to add an offsite replica, and I would like to validate the plan I have in mind. For its part, the offsite replica can be considered read-only except for the occasional snapshot in order to run backups to tape. The offsite location is connected with a reliable and secured ~350 Kbps WAN link.
>
> Unfortunately this is just not going to work. All writes to a Ceph OSD are replicated synchronously to every replica, all reads are served from the primary OSD for any given piece of data, and unless you do some hackery on your CRUSH map, each of your 3 OSD nodes is going to be a primary for about 1/3 of the total data.
>
> If you want to move your data off-site asynchronously, there are various options for doing that in RBD (either periodic snapshots and export-diff, or by maintaining a journal and streaming it out) and RGW (with the multi-site stuff). But you're not going to be successful trying to stretch a Ceph cluster over that link.
> -Greg

[rough sketches of both RBD approaches follow after the quoted status output at the end of this message]

>
>>
>> The following presuppositions bear challenge:
>>
>> * There is only a single mon at the present time, which could be expanded to three with the offsite location. Two mons at the primary location obviously give a lower MTBF than one, but with a third on the other side of the WAN, I could create resiliency against *either* a WAN failure or a single-node maintenance event.
>> * Because there are two mons at the primary location and one at the offsite, the degradation mode for a WAN loss (the most likely scenario, given the facility support) leaves the primary nodes maintaining quorum, which is desirable.
>> * It’s clear that a simultaneous WAN failure and mon failure at the primary location will halt cluster access.
>> * The CRUSH maps will be managed to reflect the topology change. [a sketch of this follows after the questions below]
>>
>> If that’s a good capture so far, I’m comfortable with it. What I don’t understand is what to expect in actual use:
>>
>> * Is the link-speed asymmetry between the two primary nodes and the offsite node going to create significant risk or unexpected behaviors?
>> * Will the performance of the two primary nodes be limited to the speed at which the offsite mon can participate? Or will the primary mons correctly calculate that they have quorum and keep moving forward under normal operation?
>> * In the case of an extended WAN outage (and presuming full uptime on the primary-site mons), would a return to full cluster health be simply a matter of time? Are there any limits on how long the WAN could be down if the other two maintain quorum?
>>
>> I hope I’m asking the right questions here. Any feedback is appreciated, including blog and RTFM pointers.
>>
>> Thanks for a great product!! I’m really excited for this next frontier!
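For reference, a rough sketch of what the CRUSH-map change in the last presupposition (the “hackery” Greg alludes to) might look like. The rack bucket “offsite”, the host “gw03”, and osd.8 are hypothetical names; the current tree only contains gw01 and gw02. Note this only influences placement and primary selection: writes would still be replicated synchronously across the WAN, which is exactly the problem Greg describes.

    # Add a bucket for the remote site and hang it under the default root
    ceph osd crush add-bucket offsite rack
    ceph osd crush move offsite root=default
    # Place the (hypothetical) offsite host under the new bucket
    ceph osd crush move gw03 rack=offsite
    # Optionally keep a (hypothetical) offsite OSD from being chosen as primary,
    # so client reads stay at the primary site
    ceph osd primary-affinity osd.8 0
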
>>
>> Brian
>>
>> > [root@gw01 ~]# ceph -s
>> >   cluster:
>> >     id:     nnnn
>> >     health: HEALTH_OK
>> >
>> >   services:
>> >     mon: 1 daemons, quorum gw01
>> >     mgr: gw01(active)
>> >     mds: cephfs-1/1/1 up {0=gw01=up:active}
>> >     osd: 8 osds: 8 up, 8 in
>> >
>> >   data:
>> >     pools:   3 pools, 380 pgs
>> >     objects: 172.9 k objects, 11 GiB
>> >     usage:   30 GiB used, 5.8 TiB / 5.8 TiB avail
>> >     pgs:     380 active+clean
>> >
>> >   io:
>> >     client: 612 KiB/s wr, 0 op/s rd, 50 op/s wr
>> >
>> > [root@gw01 ~]# ceph df
>> > GLOBAL:
>> >     SIZE        AVAIL       RAW USED     %RAW USED
>> >     5.8 TiB     5.8 TiB     30 GiB       0.51
>> > POOLS:
>> >     NAME                ID     USED        %USED     MAX AVAIL     OBJECTS
>> >     cephfs_metadata     2      264 MiB     0         2.7 TiB       1085
>> >     cephfs_data         3      8.3 GiB     0.29      2.7 TiB       171283
>> >     rbd                 4      2.0 GiB     0.07      2.7 TiB       542
>> > [root@gw01 ~]# ceph osd tree
>> > ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
>> > -1       5.82153 root default
>> > -3       2.91077     host gw01
>> >  0   ssd 0.72769         osd.0      up  1.00000 1.00000
>> >  2   ssd 0.72769         osd.2      up  1.00000 1.00000
>> >  4   ssd 0.72769         osd.4      up  1.00000 1.00000
>> >  6   ssd 0.72769         osd.6      up  1.00000 1.00000
>> > -5       2.91077     host gw02
>> >  1   ssd 0.72769         osd.1      up  1.00000 1.00000
>> >  3   ssd 0.72769         osd.3      up  1.00000 1.00000
>> >  5   ssd 0.72769         osd.5      up  1.00000 1.00000
>> >  7   ssd 0.72769         osd.7      up  1.00000 1.00000
>> > [root@gw01 ~]# ceph osd df
>> > ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE VAR  PGS
>> >  0   ssd 0.72769  1.00000 745 GiB 4.9 GiB 740 GiB 0.66 1.29 115
>> >  2   ssd 0.72769  1.00000 745 GiB 3.1 GiB 742 GiB 0.42 0.82  83
>> >  4   ssd 0.72769  1.00000 745 GiB 3.6 GiB 742 GiB 0.49 0.96  90
>> >  6   ssd 0.72769  1.00000 745 GiB 3.5 GiB 742 GiB 0.47 0.93  92
>> >  1   ssd 0.72769  1.00000 745 GiB 3.4 GiB 742 GiB 0.46 0.90  76
>> >  3   ssd 0.72769  1.00000 745 GiB 3.9 GiB 741 GiB 0.52 1.02 102
>> >  5   ssd 0.72769  1.00000 745 GiB 3.9 GiB 741 GiB 0.52 1.02  98
>> >  7   ssd 0.72769  1.00000 745 GiB 4.0 GiB 741 GiB 0.54 1.06 104
>> >                    TOTAL  5.8 TiB 30 GiB  5.8 TiB 0.51
>> > MIN/MAX VAR: 0.82/1.29  STDDEV: 0.07
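To make Greg’s first RBD suggestion concrete, here is a minimal sketch of the snapshot + export-diff approach, assuming the offsite machine runs its own small Ceph cluster and that an image rbd/myimage (the rbd pool exists in the dump above; the image, snapshot names, and “offsite-host” are hypothetical) already exists on both sides with a common base snapshot. Only the delta between snapshots crosses the 350 Kbps link.

    # On the primary cluster: take a new point-in-time snapshot
    rbd snap create rbd/myimage@backup-20190114
    # Ship only the changes since the previous snapshot over the WAN
    # and apply them to the copy on the offsite cluster
    rbd export-diff --from-snap backup-20190107 rbd/myimage@backup-20190114 - \
      | ssh offsite-host rbd import-diff - rbd/myimage

The very first transfer would be a full rbd export / rbd import (or an export-diff without --from-snap); after that, each run only moves the blocks written since the last snapshot, which is what makes a slow link workable.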
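And a sketch of the journal-based option Greg mentions, which in practice means rbd-mirror; again the image name is hypothetical and both sites are assumed to run their own cluster. The journal is replayed asynchronously by an rbd-mirror daemon at the offsite cluster, so the slow WAN only affects how far the replica lags, not client writes at the primary site.

    # On the primary cluster: journaling requires exclusive-lock
    rbd feature enable rbd/myimage exclusive-lock journaling
    # Mirror selected images in the pool (per-image mode)
    rbd mirror pool enable rbd image
    rbd mirror image enable rbd/myimage
    # The offsite cluster runs an rbd-mirror daemon peered with the primary
    # (rbd mirror pool peer add ...) and replays the journal as bandwidth allows.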