On 03.04.2025 10:30, Friedrich Weber wrote:
On 28/03/2025 18:12, Gabriel Goller wrote:
This series allows the user to add fabrics such as OpenFabric and OSPF over
their clusters.
Overview
========
This series allows the user to create routed networks ('fabrics') across their
clusters, which can be used as the underlay network for an EVPN cluster, or for
creating Ceph full mesh clusters easily.
This patch series adds the initial support for two routing protocols:
* OpenFabric
* OSPF
I tested a bit with packages Gabriel built for me (thanks!),
both OSPF and OpenFabric, and also set up a Ceph full mesh over OpenFabric.
Overall it looked quite smooth! I didn't notice huge issues, but have
some minor points below:
- I think the error message when frr+frr-pythontools is not installed
looked a bit scary. It's on me for not reading the docs, but still,
might be nice to have a friendlier error message in that case :)
Umm which message exactly do you mean? If I uninstall frr and
frr-pythontools, I get:
WARN: missing /usr/lib/frr/frr-reload.py. Please install frr-pythontools
package
which IMO is quite ok.
- having already added one node, and then adding another using the "Add
Node" dialog, it has happened multiple times that I kept "Node" at the
default first node (which I already had defined) while I thought I was
configuring the second one, and only noticed when I submitted and got
"node already exists". And then, when I change the "Node" to the correct
one, I lost my form input :) I understand that we need to reload when
changing "Node" (the other node might have other interfaces), but to
avoid the above, maybe the dialog could preselect a node that is not yet
defined?
Yep, this is already on our todo-list. Should be as simple as passing
an array of already configured nodes down to the NodeEdit component and
then disallow them in the pveNodeSelector using 'disallowNodes'.
- when removing a fabric, the IP addresses defined on the interfaces
remain until the next reboot. I guess the reason is that ifupdown2
doesn't remove IP addresses when the corresponding stanza vanishes. Not
sure if this can be easily fixed -- if not, maybe this would be worth a
note in the docs?
Umm, I think `ifreload -a` should remove all the addresses? At least it
works on my machine :)
But I'll check again.
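
One way to check this on a node is to compare the addresses the removed fabric had configured against what `ip -j addr show` still reports. The following is only a sketch in Python, not part of the series: `leftover_addresses` and the `removed` mapping are made-up names, and the only assumption about iproute2 is its JSON output with the `ifname`, `addr_info`, `local` and `prefixlen` fields.

```python
import json
from typing import Dict, List, Set


def leftover_addresses(ip_addr_json: str,
                       removed: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return the removed fabric addresses that are still present, per interface.

    ip_addr_json: output of `ip -j addr show`.
    removed: hypothetical mapping of interface name -> CIDRs the fabric had set.
    """
    # Collect every address currently configured, keyed by interface name.
    present: Dict[str, Set[str]] = {}
    for link in json.loads(ip_addr_json):
        present[link["ifname"]] = {
            f"{addr['local']}/{addr['prefixlen']}"
            for addr in link.get("addr_info", [])
            if "local" in addr and "prefixlen" in addr
        }

    # Keep only the addresses that should be gone but are still there.
    leftovers: Dict[str, List[str]] = {}
    for ifname, cidrs in removed.items():
        still_there = sorted(cidrs & present.get(ifname, set()))
        if still_there:
            leftovers[ifname] = still_there
    return leftovers
```

With the interfaces from [2], `removed` would be something like `{"ens19": {"172.31.0.159/24"}, "ens20": {"172.31.2.159/24"}}` on fabric159; an empty result after `ifreload -a` would mean the addresses were cleaned up.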
- when removing the only fabric and applying, the srvreload task has a
couple of spurious error messages:
2025-04-03 09:35:59,354 ERROR: Filename /etc/frr/frr.conf is an
empty file
frr reload command fail: command '/usr/lib/frr/frr-reload.py --stdout --reload
/etc/frr/frr.conf' failed: exit code 1
Restarting frr. at /usr/share/perl5/PVE/Network/SDN/Frr.pm line 74.
TASK OK
Hmm I guess we could check if the file is empty before reloading? That
should probably work.
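
A minimal sketch of such a check, written in Python for brevity (the actual code is Perl in PVE/Network/SDN/Frr.pm, and `reload_action` is a made-up helper). Treating a missing or whitespace-only file as "empty" is an assumption here, and what should happen instead of the reload is left open.

```python
from pathlib import Path


def reload_action(conf_path: str) -> str:
    """Decide what to do with an FRR config file before calling frr-reload.py.

    Returns "reload" if the file has content, "skip" if it is missing or
    contains only whitespace (assumption: both count as "empty" here).
    """
    path = Path(conf_path)
    if not path.is_file():
        return "skip"
    if path.read_text().strip() == "":
        return "skip"
    return "reload"
```

The caller would only invoke `/usr/lib/frr/frr-reload.py` when `reload_action("/etc/frr/frr.conf")` returns `"reload"`, which would avoid the "is an empty file" error shown above.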
- regarding the hello/csnp intervals: it would be nice to mention what the
default values are. Also, probably not relevant for this patch series, but
wanted to mention anyway: For running a Ceph full mesh over a fabric,
one probably wants to set relatively low values here (as our wiki guide
does [3])? If there is a guide in the future for setting up Ceph full mesh
over fabric, would be nice if the guide would mention that.
Yep, fixed this. Added the default values in the docs for v2.
- I'm not so sure about this, but maybe it would be nice to show the
default-hidden hello/csnp interval columns if I have entered a value
there?
This should be possible.
- when I remove hello interval+multiplier and the csnp via the GUI, I get
the following warning in the journal:
Apr 03 10:20:50 fabric159 pveproxy[9244]: Use of uninitialized value $id in
concatenation (.) or string at /usr/share/perl5/PVE/API2/Network/SDN/Fabrics.pm
line 330.
Apr 03 10:21:02 fabric159 pveproxy[9246]: Use of uninitialized value $id in
concatenation (.) or string at /usr/share/perl5/PVE/API2/Network/SDN/Fabrics.pm
line 330.
Apr 03 10:21:02 fabric159 pveproxy[9246]: Use of uninitialized value $id in
concatenation (.) or string at /usr/share/perl5/PVE/API2/Network/SDN/Fabrics.pm
line 330.
I don't think this is related to the hello-interval and multiplier
values. AFAICT this is because of the permissions, which are completely
overhauled in v2.
- after setting up an OSPF fabric in a 3-node full mesh, I couldn't ping
the loopback addresses until I rebooted all nodes. I've attached the
task logs of the srvreloads and the ospf.cfg below [1]. After a reboot,
the pings work fine. Could it be because an OSPF with the same area
existed previously?
How long did you wait? Sometimes the routes take a while to converge,
usually OSPF more than OpenFabric. It could also be that some routes are
cached or not removed properly. Could you also paste the frr.conf if you
still have the cluster (`cat /etc/frr/frr.conf`)? Also, can you reproduce
this? Does a `systemctl restart frr` fix it as well?
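
To put a number on the convergence time, a small polling helper could be used. This is a sketch in Python, not something from the series: `wait_until` and `ping_once` are hypothetical names, and `ping -c 1 -W 1` assumes the iputils ping.

```python
import subprocess
import time
from typing import Callable, Optional


def wait_until(check: Callable[[], bool], timeout: float = 120.0,
               interval: float = 2.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Poll check() until it returns True.

    Returns the elapsed seconds on success, or None once `timeout` has passed.
    """
    start = clock()
    while True:
        if check():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)


def ping_once(address: str) -> bool:
    """Send a single ICMP echo request; True if ping exits with status 0."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Running `wait_until(lambda: ping_once("172.16.0.160"))` on fabric159 right after applying the config would return roughly how many seconds passed before the first reply from the other node's loopback address, or `None` if nothing answered within two minutes.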
- probably a user error, but: after setting up an OpenFabric fabric and
rebooting, the routes didn't come up automatically. My openfabric.cfg is
in [2]. systemctl status frr shows the following:
Apr 03 10:02:20 fabric159 systemd[1]: Started frr.service - FRRouting.
Apr 03 10:02:21 fabric159 fabricd[699]: [NBV6R-CM3PT] OpenFabric: Needed to
resync LSPDB using CSNP!
Apr 03 10:03:48 fabric159 fabricd[699]: [QBAZ6-3YZR3] OpenFabric: Could not
find two T0 routers
Apr 03 10:02:23 fabric160 systemd[1]: Started frr.service - FRRouting.
Apr 03 10:02:24 fabric160 fabricd[674]: [MZS0T-YRAMC] OpenFabric: Initial
synchronization on ens19 complete.
Apr 03 10:03:48 fabric160 fabricd[674]: [QBAZ6-3YZR3] OpenFabric: Could not
find two T0 routers
Apr 03 10:02:19 fabric161 systemd[1]: Started frr.service - FRRouting.
Apr 03 10:02:21 fabric161 fabricd[681]: [MZS0T-YRAMC] OpenFabric: Initial
synchronization on ens20 complete.
Apr 03 10:03:48 fabric161 fabricd[681]: [QBAZ6-3YZR3] OpenFabric: Could not
find two T0 routers
Maybe I'm just too impatient, but restarting frr and waiting for ~30 seconds
fixes it.
Yeah, as I said, converging sometimes takes a while, especially when
older routes are around. The logs are just warnings that this isn't a
proper "spine-leaf" topo and the IS-IS tier couldn't be determined. This
shouldn't change anything though.
Will look into it though.
Thanks for reviewing!
[1]
fabric159:
2025-04-03 09:30:06,673 INFO: Called via "Namespace(input=None, reload=True,
test=False, debug=False, log_level='info', stdout=True, pathspace=None,
filename='/etc/frr/frr.conf', overwrite=False, bindir='/usr/bin', confdir='/etc/frr',
rundir='/var/run/frr', vty_socket=None, daemon='', test_reset=False)"
2025-04-03 09:30:06,673 INFO: Loading Config object from file /etc/frr/frr.conf
2025-04-03 09:30:06,690 INFO: Loading Config object from vtysh show running
2025-04-03 09:30:06,697 INFO: "frr defaults traditional" cannot be removed
2025-04-03 09:30:06,703 INFO: Executed "ip forwarding"
2025-04-03 09:30:06,709 INFO: Executed "ipv6 forwarding"
2025-04-03 09:30:06,709 INFO: /var/run/frr/reload-B14N3D.txt content
['frr defaults datacenter\n',
'log syslog informational\n',
'router ospf\nexit\n',
'router ospf\n ospf router-id 172.16.0.159\nexit\n',
'interface dummy_1234\nexit\n',
'interface dummy_1234\n ip ospf area 1234\nexit\n',
'interface dummy_1234\n ip ospf passive\nexit\n',
'interface ens19\nexit\n',
'interface ens19\n ip ospf area 1234\nexit\n',
'interface ens20\nexit\n',
'interface ens20\n ip ospf area 1234\nexit\n',
'access-list ospf_1234_ips permit 172.16.0.0/24\n',
'route-map ospf permit 100\nexit\n',
'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n',
'route-map ospf permit 100\n set src 172.16.0.159\nexit\n',
'ip protocol ospf route-map ospf\n',
'line vty\nexit\n']
[1667|mgmtd] sending configuration
[1668|zebra] sending configuration
[1671|ospfd] sending configuration
[1674|bgpd] sending configuration
[1668|zebra] done
[1682|watchfrr] sending configuration
[1684|staticd] sending configuration
[1685|bfdd] sending configuration
Waiting for children to finish applying config...
[1682|watchfrr] done
[1674|bgpd] done
[1684|staticd] done
[1685|bfdd] done
[1667|mgmtd] done
[1671|ospfd] done
2025-04-03 09:30:06,721 INFO: Loading Config object from vtysh show running
2025-04-03 09:30:06,729 INFO: /var/run/frr/reload-UJJQIC.txt content
['line vty\nexit\n',
'frr defaults datacenter\n',
'log syslog informational\n',
'router ospf\nexit\n',
'router ospf\n ospf router-id 172.16.0.159\nexit\n',
'interface dummy_1234\nexit\n',
'interface dummy_1234\n ip ospf area 1234\nexit\n',
'interface dummy_1234\n ip ospf passive\nexit\n',
'interface ens19\nexit\n',
'interface ens19\n ip ospf area 1234\nexit\n',
'interface ens20\nexit\n',
'interface ens20\n ip ospf area 1234\nexit\n',
'access-list ospf_1234_ips permit 172.16.0.0/24\n',
'route-map ospf permit 100\nexit\n',
'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n',
'route-map ospf permit 100\n set src 172.16.0.159\nexit\n',
'ip protocol ospf route-map ospf\n',
'line vty\nexit\n']
[1692|mgmtd] sending configuration
[1693|zebra] sending configuration
[1696|ospfd] sending configuration
[1699|bgpd] sending configuration
[1693|zebra] done
[1707|watchfrr] sending configuration
[1709|staticd] sending configuration
[1710|bfdd] sending configuration
Waiting for children to finish applying config...
[1707|watchfrr] done
[1696|ospfd] done
MGMTD: No changes found to be committed!
[1692|mgmtd] done
[1709|staticd] done
[1699|bgpd] done
[1710|bfdd] done
TASK OK
fabric160:
2025-04-03 09:30:09,972 INFO: Called via "Namespace(input=None, reload=True,
test=False, debug=False, log_level='info', stdout=True, pathspace=None,
filename='/etc/frr/frr.conf', overwrite=False, bindir='/usr/bin', confdir='/etc/frr',
rundir='/var/run/frr', vty_socket=None, daemon='', test_reset=False)"
2025-04-03 09:30:09,972 INFO: Loading Config object from file /etc/frr/frr.conf
2025-04-03 09:30:09,985 INFO: Loading Config object from vtysh show running
2025-04-03 09:30:09,992 INFO: "frr defaults traditional" cannot be removed
2025-04-03 09:30:09,998 INFO: Executed "ip forwarding"
2025-04-03 09:30:10,004 INFO: Executed "ipv6 forwarding"
2025-04-03 09:30:10,004 INFO: /var/run/frr/reload-5ATLT2.txt content
['frr defaults datacenter\n',
'log syslog informational\n',
'router ospf\nexit\n',
'router ospf\n ospf router-id 172.16.0.160\nexit\n',
'interface dummy_1234\nexit\n',
'interface dummy_1234\n ip ospf area 1234\nexit\n',
'interface dummy_1234\n ip ospf passive\nexit\n',
'interface ens19\nexit\n',
'interface ens19\n ip ospf area 1234\nexit\n',
'interface ens20\nexit\n',
'interface ens20\n ip ospf area 1234\nexit\n',
'access-list ospf_1234_ips permit 172.16.0.0/24\n',
'route-map ospf permit 100\nexit\n',
'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n',
'route-map ospf permit 100\n set src 172.16.0.160\nexit\n',
'ip protocol ospf route-map ospf\n',
'line vty\nexit\n']
[1699|mgmtd] sending configuration
[1700|zebra] sending configuration
[1703|ospfd] sending configuration
[1706|bgpd] sending configuration
[1700|zebra] done
[1714|watchfrr] sending configuration
[1716|staticd] sending configuration
[1717|bfdd] sending configuration
Waiting for children to finish applying config...
[1714|watchfrr] done
[1716|staticd] done
[1706|bgpd] done
[1717|bfdd] done
[1699|mgmtd] done
[1703|ospfd] done
2025-04-03 09:30:10,016 INFO: Loading Config object from vtysh show running
2025-04-03 09:30:10,023 INFO: /var/run/frr/reload-NFS4UM.txt content
['line vty\nexit\n',
'frr defaults datacenter\n',
'log syslog informational\n',
'router ospf\nexit\n',
'router ospf\n ospf router-id 172.16.0.160\nexit\n',
'interface dummy_1234\nexit\n',
'interface dummy_1234\n ip ospf area 1234\nexit\n',
'interface dummy_1234\n ip ospf passive\nexit\n',
'interface ens19\nexit\n',
'interface ens19\n ip ospf area 1234\nexit\n',
'interface ens20\nexit\n',
'interface ens20\n ip ospf area 1234\nexit\n',
'access-list ospf_1234_ips permit 172.16.0.0/24\n',
'route-map ospf permit 100\nexit\n',
'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n',
'route-map ospf permit 100\n set src 172.16.0.160\nexit\n',
'ip protocol ospf route-map ospf\n',
'line vty\nexit\n']
[1724|mgmtd] sending configuration
[1725|zebra] sending configuration
[1728|ospfd] sending configuration
[1731|bgpd] sending configuration
[1739|watchfrr] sending configuration
[1725|zebra] done
[1741|staticd] sending configuration
[1742|bfdd] sending configuration
Waiting for children to finish applying config...
[1739|watchfrr] done
[1741|staticd] done
[1728|ospfd] done
[1731|bgpd] done
[1742|bfdd] done
MGMTD: No changes found to be committed!
[1724|mgmtd] done
TASK OK
fabric161:
2025-04-03 09:30:08,321 INFO: Called via "Namespace(input=None, reload=True,
test=False, debug=False, log_level='info', stdout=True, pathspace=None,
filename='/etc/frr/frr.conf', overwrite=False, bindir='/usr/bin', confdir='/etc/frr',
rundir='/var/run/frr', vty_socket=None, daemon='', test_reset=False)"
2025-04-03 09:30:08,321 INFO: Loading Config object from file /etc/frr/frr.conf
2025-04-03 09:30:08,334 INFO: Loading Config object from vtysh show running
2025-04-03 09:30:08,342 INFO: "frr defaults traditional" cannot be removed
2025-04-03 09:30:08,348 INFO: Executed "ip forwarding"
2025-04-03 09:30:08,354 INFO: Executed "ipv6 forwarding"
2025-04-03 09:30:08,354 INFO: /var/run/frr/reload-PVFBCH.txt content
['frr defaults datacenter\n',
'log syslog informational\n',
'router ospf\nexit\n',
'router ospf\n ospf router-id 172.16.0.161\nexit\n',
'interface dummy_1234\nexit\n',
'interface dummy_1234\n ip ospf area 1234\nexit\n',
'interface dummy_1234\n ip ospf passive\nexit\n',
'interface ens19\nexit\n',
'interface ens19\n ip ospf area 1234\nexit\n',
'interface ens20\nexit\n',
'interface ens20\n ip ospf area 1234\nexit\n',
'access-list ospf_1234_ips permit 172.16.0.0/24\n',
'route-map ospf permit 100\nexit\n',
'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n',
'route-map ospf permit 100\n set src 172.16.0.161\nexit\n',
'ip protocol ospf route-map ospf\n',
'line vty\nexit\n']
[1671|mgmtd] sending configuration
[1672|zebra] sending configuration
[1675|ospfd] sending configuration
[1678|bgpd] sending configuration
[1686|watchfrr] sending configuration
[1688|staticd] sending configuration
[1672|zebra] done
[1689|bfdd] sending configuration
Waiting for children to finish applying config...
[1688|staticd] done
[1686|watchfrr] done
[1689|bfdd] done
[1678|bgpd] done
[1671|mgmtd] done
[1675|ospfd] done
2025-04-03 09:30:08,367 INFO: Loading Config object from vtysh show running
2025-04-03 09:30:08,374 INFO: /var/run/frr/reload-SKOSWJ.txt content
['line vty\nexit\n',
'frr defaults datacenter\n',
'log syslog informational\n',
'router ospf\nexit\n',
'router ospf\n ospf router-id 172.16.0.161\nexit\n',
'interface dummy_1234\nexit\n',
'interface dummy_1234\n ip ospf area 1234\nexit\n',
'interface dummy_1234\n ip ospf passive\nexit\n',
'interface ens19\nexit\n',
'interface ens19\n ip ospf area 1234\nexit\n',
'interface ens20\nexit\n',
'interface ens20\n ip ospf area 1234\nexit\n',
'access-list ospf_1234_ips permit 172.16.0.0/24\n',
'route-map ospf permit 100\nexit\n',
'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n',
'route-map ospf permit 100\n set src 172.16.0.161\nexit\n',
'ip protocol ospf route-map ospf\n',
'line vty\nexit\n']
[1696|mgmtd] sending configuration
[1697|zebra] sending configuration
[1700|ospfd] sending configuration
[1703|bgpd] sending configuration
[1697|zebra] done
[1711|watchfrr] sending configuration
[1713|staticd] sending configuration
Waiting for children to finish applying config...
[1714|bfdd] sending configuration
[1711|watchfrr] done
[1713|staticd] done
[1714|bfdd] done
[1700|ospfd] done
[1703|bgpd] done
MGMTD: No changes found to be committed!
[1696|mgmtd] done
TASK OK
# cat /etc/pve/sdn/fabrics/ospf.cfg
fabric: 1234
loopback_prefix 172.16.0.0/24
node: 1234_fabric159
interface name=ens19,ip=172.31.0.159/24
interface name=ens20,ip=172.31.2.159/24
router_id 172.16.0.159
node: 1234_fabric160
interface name=ens19,ip=172.31.0.160/24
interface name=ens20,ip=172.31.1.160/24
router_id 172.16.0.160
node: 1234_fabric161
interface name=ens19,ip=172.31.1.161/24
interface name=ens20,ip=172.31.2.161/24
router_id 172.16.0.161
[2]
# cat /etc/pve/sdn/fabrics/openfabric.cfg
fabric: fabric
hello_interval 2
loopback_prefix 172.16.0.0/24
node: fabric_fabric159
interface name=ens19,ip=172.31.0.159/24
interface name=ens20,ip=172.31.2.159/24
router_id 172.16.0.159
node: fabric_fabric160
interface name=ens19,ip=172.31.0.160/24
interface name=ens20,ip=172.31.1.160/24
router_id 172.16.0.160
node: fabric_fabric161
interface name=ens19,ip=172.31.1.161/24
interface name=ens20,ip=172.31.2.161/24
router_id 172.16.0.161
[3]
https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server#Routed_Setup_(with_Fallback)