Hi Anthony and Phil,

since my meltdown case was mentioned and I might have a network capacity 
issue, here is a question about why having separate VLANs for the private and 
public networks might have its merits:

In the part of our Ceph cluster that was overloaded (our cluster has 2 sites, 
logically separate and physically distinct), I see a lot of dropped packets on 
the spine switch, and it looks like it is the downlinks to the leaf switches 
where the storage servers are connected. I'm not finished investigating yet, 
so a network overload is still a hypothetical part of our meltdown. The 
question below should, however, be interesting in any case, as it might help 
prevent a meltdown in similar setups.

Our network connectivity is as follows: we have 1 storage server and up to 18 
clients per leaf. The storage servers have 6x10G connectivity in an LACP bond; 
the front- and back-networks share all ports but are separated by VLAN. The 
clients have 1x10G on the public network. Unfortunately, the uplinks from leaf 
to spine switches are currently limited to 2x10G. We are in the process of 
upgrading to 2x40G, so let's ignore fixing this temporary bottleneck here 
(Corona got in the way) and focus on workarounds until we can access the site 
again.
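
For reference, a minimal sketch of how such a bond/VLAN split can be set up 
with iproute2 (interface names, VLAN IDs and subnets below are hypothetical 
placeholders, not our actual values):

    # LACP bond across the 10G ports (eth0..eth5 are placeholders)
    ip link add bond0 type bond mode 802.3ad xmit_hash_policy layer3+4
    for i in 0 1 2 3 4 5; do
        ip link set eth$i down
        ip link set eth$i master bond0
    done
    ip link set bond0 up
    # front (public) and back (cluster) networks as VLANs on the same bond
    ip link add link bond0 name bond0.100 type vlan id 100   # public
    ip link add link bond0 name bond0.200 type vlan id 200   # cluster
    ip addr add 192.168.100.10/24 dev bond0.100
    ip addr add 192.168.200.10/24 dev bond0.200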

For every write, currently every storage server is hit (10 servers with 8+2 
EC). Since we believed the low uplink bandwidth would only be a short-term 
condition during the network upgrade, we were willing to accept it, assuming 
that the competition between client and storage traffic would throttle the 
clients sufficiently to result in a working system, maybe with reduced 
performance, but not one becoming unstable.
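
A back-of-the-envelope estimate of what this means for the uplinks (assuming 
the usual EC write path, where the primary OSD receives the full object and 
fans the shards out over the cluster network):

    client -> primary OSD (public network):            S bytes
    primary -> 9 other OSDs (cluster network): 9 * S/8 = 1.125*S bytes

So every byte a client writes causes roughly another 1.1 bytes of inter-server 
traffic, and with only 1 storage server per leaf, almost all of that has to 
cross the 2x10G uplinks via the spine.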

The questions relevant to this thread:

I kept the separation into public and cluster network, because this enables 
QoS definitions, which are typically made per VLAN. In my situation, what if 
the uplinks were saturated by the competing client and storage-server traffic? 
Both run on the same VLAN, obviously. The only way to make room for the 
OSD/heartbeat traffic would be to give the cluster-network VLAN higher 
priority over the public network via QoS settings. This should at least allow 
the OSDs to continue checking heartbeats etc. over a busy line.

Is this correct?
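
Concretely, what I have in mind is something like the following (a sketch 
only; the priority value and the idea of carrying it in the VLAN header are my 
assumptions, and the switches would need a matching, vendor-specific policy to 
honour it):

    # tag all egress traffic on the cluster VLAN with 802.1p priority 5;
    # the public VLAN keeps the default priority 0
    ip link set dev bond0.200 type vlan egress-qos-map 0:5

The leaf and spine switches would then schedule PCP 5 into a higher-priority 
queue, so that OSD-to-OSD traffic survives even when the public VLAN saturates 
the uplinks.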

This also raises a question I had a long time ago, and which Anthony also 
raised. Why are the MONs not on the cluster network? If I can make a priority 
line for the OSDs, why can't I make OSD-MON communication a priority too?

While digging through heartbeat options as a consequence of our meltdown, I 
found this one:

# ceph daemon osd.0 config show | grep heart
...
    "osd_heartbeat_addr": "-",
...

# ceph daemon mon.ceph-01 config show | grep heart
...
    "osd_heartbeat_addr": "-",
...

Is it actually possible to reserve a dedicated (third) VLAN with high QoS for 
heartbeat traffic by providing a per-host IP address in this parameter? What 
does this parameter do?
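
If the parameter does what its name suggests (an assumption on my part; I 
could not find documentation for it), I would expect a per-host fragment like 
this, with the address living on a dedicated heartbeat VLAN (all values 
hypothetical):

    [osd]
        # assumption: bind heartbeat traffic to an address on a third,
        # high-priority VLAN; this value would have to differ per host
        osd_heartbeat_addr = 192.168.210.10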

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Anthony D'Atri <anthony.da...@gmail.com>
Sent: 09 May 2020 23:59:49
To: Phil Regnauld
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Cluster network and public network

>> If your public network is saturated, that actually is a problem; the last
>> thing you want is to add recovery traffic or to slow down heartbeats.  For
>> most people, it isn't saturated.
>
>        See Frank Schilder's post about a meltdown which he believes could have
>        been caused by beacon/heartbeat being drowned out by other recovery/IO
>        traffic, not at the network level, but at the processing level on the
>        OSDs.
>
>        If indeed there are cases where the OSDs are too busy to send (or
>        process) heartbeat/beacon messaging, it wouldn't help to have a
>        separate network?

Agreed.  Many times I’ve had to argue that CPUs that aren’t nearly saturated 
*aren’t* necessarily overkill, especially with fast media where latency hurts.  
It would be interesting to consider an architecture where a core/HT is 
dedicated to the control plane.

That said, I’ve seen a situation where excess CPU headroom appeared to affect 
latency by allowing the CPUs to drop into C-states; this especially affected 
network traffic (2x dual 10GE). Curiously, some systems in the same cluster 
experienced this but some didn’t. There was a mix of Sandy Bridge and Ivy 
Bridge IIRC, as well as different Broadcom chips. Despite an apparent 
alignment with older vs. newer Broadcom chips, I never fully characterized the 
situation: replacing one of the Broadcom NICs in an affected system with the 
model in use on unaffected systems didn’t resolve the issue. It’s possible 
that replacing the other would have made a difference.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
