Hi,

>That only makes sense if you're running multiple ToR switches per rack for the 
>public leaf network. Multiple public ToR switches per rack is not very common; 
>most Clos crossbar networks run a single ToR switch. Several guides on the 
>topic (including Arista & Cisco) suggest that you use something like MLAG in a 
>layer 2 domain between the switches if you need some sort of switch redundancy 
>inside the rack. This increases complexity, and most people decide that it's 
>not worth it and instead scale out across racks to gain the redundancy and 
>survivability that multiple ToR offer.
If you use MLAG for L2 redundancy, you’ll still want 2 BGP sessions for L3 
redundancy, so why not skip MLAG altogether and terminate your BGP session on 
each ToR?
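
For reference, a minimal sketch of what I have in mind (assuming FRR/Quagga 
with BGP unnumbered and a made-up ASN of 65101; the interface names are 
server1’s four uplinks):

  router bgp 65101
   bgp router-id 10.10.100.21
   ! one unnumbered session per uplink, two towards each ToR
   neighbor enp3s0f0 interface remote-as external
   neighbor enp3s0f1 interface remote-as external
   neighbor enp4s0f0 interface remote-as external
   neighbor enp4s0f1 interface remote-as external
   address-family ipv4 unicast
    network 10.10.100.21/32
   exit-address-family

If either ToR dies, the sessions on the other one keep advertising the /32, no 
MLAG needed.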

Judging by the routes (169.254.0.1), are you using BGP unnumbered?

It sounds like the “ip route get” output you see when using dummy0 is caused by 
a fallback to the default route, presumably via eth0? Can you check the exact 
routes received on server1 with “show ip bgp neighbors <neighbor> 
received-routes” (after enabling “neighbor <neighbor> soft-reconfiguration 
inbound”), and what is installed in the routing table with “ip route”?
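
Something like this should do it (a sketch, assuming FRR/Quagga’s vtysh, the 
ASN 65101 from the example above, and enp3s0f0 as the session of interest):

  $ sudo vtysh
  server1# configure terminal
  server1(config)# router bgp 65101
  server1(config-router)# address-family ipv4 unicast
  server1(config-router-af)# neighbor enp3s0f0 soft-reconfiguration inbound
  server1(config-router-af)# end
  server1# show ip bgp neighbors enp3s0f0 received-routes
  server1# exit
  $ ip route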


Intrigued by this problem, I tried to reproduce it in a VirtualBox lab and ran 
into the same issue.

Side note: Configuring the loopback IP on the physical interfaces is workable 
if you set it on **all** parallel links. Example with server1:

  iface enp3s0f0 inet static
    address 10.10.100.21/32
  iface enp3s0f1 inet static
    address 10.10.100.21/32
  iface enp4s0f0 inet static
    address 10.10.100.21/32
  iface enp4s0f1 inet static
    address 10.10.100.21/32

This should guarantee that the loopback IP is advertised as long as at least 
one of the 4 links to switch1 and switch2 is up, but I am not sure whether 
that’s workable for Ceph’s listening address.
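
For comparison, a rough sketch of the dummy0 variant Jan described below (the 
ASN and the exact ifupdown hook are my assumptions), where the /32 lives on one 
pseudo-interface that Ceph can bind to and BGP originates it:

In /etc/network/interfaces:

  auto dummy0
  iface dummy0 inet static
    pre-up ip link add dummy0 type dummy 2>/dev/null || true
    address 10.10.100.21/32

And in the FRR/Quagga config:

  router bgp 65101
   address-family ipv4 unicast
    network 10.10.100.21/32

In theory Ceph can then bind to 10.10.100.21 (e.g. via “public addr” in 
ceph.conf), and the address survives individual link failures as long as at 
least one BGP session stays up.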


Cheers,
Maxime

From: Richard Hesse <richard.he...@weebly.com>
Date: Thursday 20 April 2017 16:36
To: Maxime Guyot <maxime.gu...@elits.com>
Cc: Jan Marquardt <j...@artfiles.de>, "ceph-users@lists.ceph.com" 
<ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Ceph with Clos IP fabric

On Thu, Apr 20, 2017 at 2:13 AM, Maxime Guyot <maxime.gu...@elits.com> wrote:
>2) Why did you choose to run the ceph nodes on loopback interfaces as opposed 
>to the /24 for the "public" interface?
I can’t speak for this example, but in a Clos fabric you generally want to 
assign the routed IPs to loopback rather than physical interfaces. This way, if 
one of the links goes down (e.g. the public interface), the routed IP is still 
advertised on the other link(s).

That only makes sense if you're running multiple ToR switches per rack for the 
public leaf network. Multiple public ToR switches per rack is not very common; 
most Clos crossbar networks run a single ToR switch. Several guides on the 
topic (including Arista & Cisco) suggest that you use something like MLAG in a 
layer 2 domain between the switches if you need some sort of switch redundancy 
inside the rack. This increases complexity, and most people decide that it's 
not worth it and instead scale out across racks to gain the redundancy and 
survivability that multiple ToR offer.

On Thu, Apr 20, 2017 at 4:04 AM, Jan Marquardt <j...@artfiles.de> wrote:

Maxime, thank you for clarifying this. Each server is configured like this:

lo/dummy0: Loopback interface; Holds the ip address used with Ceph,
which is announced by BGP into the fabric.

enp5s0: Management Interface, which is used only for managing the box.
There should not be any Ceph traffic on this one.

enp3s0f0: connected to sw01 and used for BGP
enp3s0f1: connected to sw02 and used for BGP
enp4s0f0: connected to sw01 and used for BGP
enp4s0f1: connected to sw02 and used for BGP

These four interfaces are supposed to transport the Ceph traffic.

See above. Why are you running multiple public ToR switches in this rack? I'd 
suggest switching them to a single layer 2 domain that participates in the Clos 
fabric as a single unit, or scaling out across racks (preferred). Why bother 
with multiple switches in a rack when you can just use multiple racks? That's 
the beauty of Clos: just add more spines if you need more leaf-to-leaf 
bandwidth.

How many OSD, servers, and racks are planned for this deployment?

-richard

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
