[ceph-users] Re: Cluster network and public network

Frank Schilder Wed, 13 May 2020 04:38:19 -0700

Dear all,

looks like I need to be more precise:

>>>   I think, however, that a disappearing back network has no real 
>>> consequences as the heartbeats always go over both.
>> 
>> FWIW this has not been my experience, at least through Luminous.
>> 
>> What I’ve seen is that when the cluster/replication net is configured but
>> unavailable, OSD heartbeats fail and peers report them to the mons as down.
>> The mons send out a map accordingly, and the affected OSDs report “I’m not
>> dead yet!”.  Flap flap flap.
> 
> +1. This has also been my experience. And it's quite hard to debug as
> well (confusing / seemingly contradictory messages).
> 
> It uses the back network to replicate data ... and as long as it can't,
> (client) IO won't go through.

I did not mean a setup where a back network is configured but then taken down.
Of course that won't work. What I mean is that you:

1. remove the cluster network definition from the cluster config (ceph.conf 
and/or ceph config ...)
2. restart OSDs to apply the change
3. remove the physical network

Step 2 will most likely require downtime, as you wrote, because during the
transition some OSDs will assume that all OSDs listen on two networks while
other OSDs assume that everyone listens on one. If you can afford to take all
clients down and do a full cluster restart, this is doable. If you set noout,
nodown and pause, and maybe some other flags (norebalance, nobackfill,
norecover), and then wait for all client *and* recovery I/O to complete, it is
probably possible to do this transition without disconnecting clients, simply
by restarting the OSDs one failure domain at a time.
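A rough sketch of that rolling procedure, written in Python only to list the commands in order (nothing is executed). Several details are assumptions rather than anything stated in the thread: the OSDs run as systemd units named ceph-osd@<id> (cephadm-managed clusters restart daemons differently), the cluster network lives in the central config database as global/cluster_network, and the failure-domain-to-OSD mapping is supplied by hand. `plan_transition` is a hypothetical helper name.

```python
"""Hypothetical helper that lists the commands for dropping the cluster
network with a rolling, failure-domain-by-failure-domain OSD restart.
It only builds strings; nothing is executed."""

from typing import Dict, List

# Flags mentioned in the thread: noout, nodown, pause, plus (optionally)
# norebalance, nobackfill, norecover. `ceph osd set` takes one flag per call.
FLAGS = ["noout", "nodown", "pause", "norebalance", "nobackfill", "norecover"]


def plan_transition(domains: Dict[str, List[int]]) -> List[str]:
    """Return the shell commands for the transition, in order.

    `domains` maps a failure-domain name (e.g. a host) to its OSD ids.
    Assumes systemd-managed OSDs (ceph-osd@<id> units).
    """
    cmds = [f"ceph osd set {flag}" for flag in FLAGS]
    # Step 1: drop the cluster network from the central config database.
    # A cluster network line in ceph.conf, if present, must be removed by
    # hand on every node as well; that edit is not generated here.
    cmds.append("ceph config rm global cluster_network")
    # Step 2: restart OSDs one failure domain at a time (sorted for a
    # deterministic order). Each restart runs on the node hosting that OSD.
    for domain in sorted(domains):
        for osd_id in domains[domain]:
            cmds.append(f"systemctl restart ceph-osd@{osd_id}")
    # Clear the flags in reverse order once every domain has been restarted.
    cmds.extend(f"ceph osd unset {flag}" for flag in reversed(FLAGS))
    return cmds


if __name__ == "__main__":
    for line in plan_transition({"host-a": [0, 1], "host-b": [2, 3]}):
        print(line)
```

Between failure domains you would still want to confirm (for example with `ceph osd tree`) that the restarted OSDs are back up before continuing, and only unset the flags after the last domain is done.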

After the transition things should work fine with just 1 network.

In any case, I would recommend keeping both networks if they are on different
VLAN IDs. Then no special transition is required. This is what I did to
simplify the physical networking (two logical networks, identical physical
networking).

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Stefan Kooman <ste...@bit.nl>
Sent: 13 May 2020 07:40
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Cluster network and public network

On 2020-05-12 18:59, Anthony D'Atri wrote:
>
>>   I think, however, that a disappearing back network has no real 
>> consequences as the heartbeats always go over both.
>
> FWIW this has not been my experience, at least through Luminous.
>
> What I’ve seen is that when the cluster/replication net is configured but 
> unavailable, OSD heartbeats fail and peers report them to the mons as down.  
> The mons send out a map accordingly, and the affected OSDs report “I’m not 
> dead yet!”.  Flap flap flap.

+1. This has also been my experience. And it's quite hard to debug as
well (confusing / seemingly contradictory messages).

It uses the back network to replicate data ... and as long as it can't,
(client) IO won't go through.

Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io