On 03/04/2025 12:21, Gabriel Goller wrote:
> On 03.04.2025 10:30, Friedrich Weber wrote:
>> On 28/03/2025 18:12, Gabriel Goller wrote:
>>> This series allows the user to add fabrics such as OpenFabric and
>>> OSPF over their clusters.
>>>
>>> Overview
>>> ========
>>>
>>> This series allows the user to create routed networks ('fabrics')
>>> across their clusters, which can be used as the underlay network for
>>> an EVPN cluster, or for creating Ceph full mesh clusters easily.
>>>
>>> This patch series adds the initial support for two routing protocols:
>>> * OpenFabric
>>> * OSPF
>>
>> I tested a bit with packages Gabriel built for me (thanks!), both OSPF
>> and OpenFabric, and also set up a Ceph full mesh over OpenFabric.
>> Overall it looked quite smooth! I didn't notice huge issues, but have
>> some minor points below:
>>
>> - I think the error message when frr+frr-pythontools is not installed
>> looked a bit scary. It's on me for not reading the docs, but still,
>> might be nice to have a friendlier error message in that case :)
>
> Umm which message exactly do you mean? If I uninstall frr and
> frr-pythontools, I get:
>
> WARN: missing /usr/lib/frr/frr-reload.py. Please install frr-pythontools package

On a fresh installation without frr + frr-pythontools, I get the
following on srvreload:

> TASK ERROR: can't open '/etc/frr/daemons' - No such file or directory

Same if I `apt purge frr frr-pythontools` -- I guess because this one
actually removes /etc/frr. Admittedly that's not very scary after all and
somewhat self-explanatory, but still not as nice as the error message you
quote.

>> - having already added one node, and then adding another using the "Add
>> Node" dialog, it has happened multiple times that I kept "Node" at the
>> default first node (which I already had defined) while I thought I was
>> configuring the second one, and only noticed when I submitted and got
>> "node already exists". And then, when I change the "Node" to the correct
>> one, I lost my form input :) I understand that we need to reload when
>> changing "Node" (the other node might have other interfaces), but to
>> avoid the above, maybe the dialog could preselect a node that is not yet
>> defined?
>
> Yep, this is already on our todo-list. Should be as simple as passing
> an array of already configured nodes down to the NodeEdit component and
> then disallow them in the pveNodeSelector using 'disallowNodes'.

OK, thanks :)

>> - when removing a fabric, the IP addresses defined on the interfaces
>> remain until the next reboot. I guess the reason is that ifupdown2
>> doesn't remove IP addresses when the corresponding stanza vanishes. Not
>> sure if this can be easily fixed -- if not, maybe this would be worth a
>> note in the docs?
>
> Umm, I think `ifreload -a` should remove all the addresses? At least it
> works on my machine :)
>
> But I'll check again.

I took a closer look -- seems I can only reproduce this if
/etc/network/interfaces contains an empty `iface INTERFACE inet manual`
stanza for the interface. Without such a stanza, the IP address is removed
correctly.

>> - regarding the hello/csnp intervals: it would be nice to mention what
>> the default values are.
>> Also, probably not relevant for this patch series, but wanted to
>> mention anyway: For running a Ceph full mesh over a fabric, one
>> probably wants to set relatively low values here (as our wiki guide
>> does [3])? If there is a guide in the future for setting up Ceph full
>> mesh over fabric, would be nice if the guide would mention that.
>
> Yep, fixed this. Added the default values in the docs for v2.

Thanks!

>> - when I remove hello interval+multiplier and the csnp via the GUI, I get
>> the following warning in the journal:
>>
>>> Apr 03 10:20:50 fabric159 pveproxy[9244]: Use of uninitialized value
>>> $id in concatenation (.) or string at
>>> /usr/share/perl5/PVE/API2/Network/SDN/Fabrics.pm line 330.
>>> Apr 03 10:21:02 fabric159 pveproxy[9246]: Use of uninitialized value
>>> $id in concatenation (.) or string at
>>> /usr/share/perl5/PVE/API2/Network/SDN/Fabrics.pm line 330.
>>> Apr 03 10:21:02 fabric159 pveproxy[9246]: Use of uninitialized value
>>> $id in concatenation (.) or string at
>>> /usr/share/perl5/PVE/API2/Network/SDN/Fabrics.pm line 330.
>
> I don't think this is related to the hello-interval and multiplier
> values. AFAICT this is because of the permissions, which are completely
> overhauled in v2.

OK, I see -- I can try to test this again in v2.

>> - after setting up an OSPF fabric in a 3-node full mesh, I couldn't ping
>> the loopback addresses until I rebooted all nodes. I've attached the
>> task logs of the srvreloads and the ospf.cfg below [1]. After a reboot,
>> the pings work fine. Could it be because an OSPF with the same area
>> existed previously?
>
> How long did you wait, sometimes they take a while to converge, usually
> ospf more than openfabric. Could also be that some routes are cached/not
> removed properly. Could you also paste the frr.conf if you still have
> the cluster (`cat /etc/frr/frr.conf`)? Also can you reproduce this? Does
> a `systemctl restart frr` fix it as well?

I just tried it again and it seems to be reproducible: Set up OSPF on a
fresh full-mesh 3-node cluster, waited 10 minutes after the srvreload, the
routes didn't come up. I've attached the frr.conf's [1]. After systemctl
restart frr, the routes came up in a minute. I also have a snapshot of the
cluster pre-reboot, if you want to take a look at it.

>> - probably a user error, but: after setting up an OpenFabric fabric and
>> rebooting, the routes didn't come up automatically. My openfabric.cfg is
>> in [2]. systemctl status frr shows the following:
>>
>>> Apr 03 10:02:20 fabric159 systemd[1]: Started frr.service - FRRouting.
>>> Apr 03 10:02:21 fabric159 fabricd[699]: [NBV6R-CM3PT] OpenFabric:
>>> Needed to resync LSPDB using CSNP!
>>> Apr 03 10:03:48 fabric159 fabricd[699]: [QBAZ6-3YZR3] OpenFabric:
>>> Could not find two T0 routers
>>
>>> Apr 03 10:02:23 fabric160 systemd[1]: Started frr.service - FRRouting.
>>> Apr 03 10:02:24 fabric160 fabricd[674]: [MZS0T-YRAMC] OpenFabric:
>>> Initial synchronization on ens19 complete.
>>> Apr 03 10:03:48 fabric160 fabricd[674]: [QBAZ6-3YZR3] OpenFabric:
>>> Could not find two T0 routers
>>
>>> Apr 03 10:02:19 fabric161 systemd[1]: Started frr.service - FRRouting.
>>> Apr 03 10:02:21 fabric161 fabricd[681]: [MZS0T-YRAMC] OpenFabric:
>>> Initial synchronization on ens20 complete.
>>> Apr 03 10:03:48 fabric161 fabricd[681]: [QBAZ6-3YZR3] OpenFabric:
>>> Could not find two T0 routers
>>
>> Maybe I'm just too impatient, but restarting frr and waiting for ~30
>> seconds fixes it.
>
> Yeah, as I said sometimes converging takes a while, especially when
> older routes are around. The logs are just warnings that this isn't a
> proper "spine-leaf" topo and the isis tier couldn't be determined -- this
> shouldn't change anything though.
>
> Will look into it though.
>

OK -- let me know if I should test this again.

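To put a number on "how long did you wait" when testing again, one could
poll the loopback addresses until they answer and record the elapsed time.
A rough Python sketch, not part of the series: the helper names `ping_once`
and `wait_for_convergence` are made up for illustration, and the addresses
in the usage comment are assumed to be the loopbacks matching the
router-ids in the attached frr.conf files [1].

```python
import subprocess
import time
from typing import Callable, Iterable, Optional


def ping_once(address: str) -> bool:
    """Send one ICMP echo (Linux iputils ping) and report whether it was answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_convergence(
    addresses: Iterable[str],
    probe: Callable[[str], bool] = ping_once,
    timeout: float = 600.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until every address has answered at least once.

    Returns the elapsed seconds, or None if the timeout is reached first.
    """
    pending = set(addresses)
    start = clock()
    while True:
        # Keep only the addresses that still do not answer.
        pending = {a for a in pending if not probe(a)}
        elapsed = clock() - start
        if not pending:
            return elapsed
        if elapsed >= timeout:
            return None
        sleep(interval)


# Hypothetical usage on fabric159, right after the srvreload:
#   elapsed = wait_for_convergence(["172.16.0.160", "172.16.0.161"])
#   print("no routes within timeout" if elapsed is None else f"{elapsed:.0f}s")
```

The probe, clock and sleep functions are parameters only so that the loop
can be exercised without a real network; on a node the defaults suffice.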
One more thing I just noticed now: After installing the packages, it seems
like the directory /etc/pve/sdn/fabrics isn't created and creating a new
fabric in the GUI fails with

> add sdn fabric failed: unable to open file '/etc/pve/sdn/fabrics/ospf.cfg.tmp.9220' - No such file or directory (500)

But a manual `systemctl restart pveproxy pvedaemon` seems to create it.

[1]

frr.conf on fabric159:

frr version 10.2.1
frr defaults datacenter
hostname fabric159
log syslog informational
service integrated-vtysh-config
!
router ospf
 ospf router-id 172.16.0.159
exit
!
interface dummy_12345
 ip ospf area 12345
 ip ospf passive
exit
!
interface ens19
 ip ospf area 12345
exit
!
interface ens20
 ip ospf area 12345
exit
!
access-list ospf_12345_ips permit 172.16.0.0/24
!
route-map ospf permit 100
 match ip address ospf_12345_ips
 set src 172.16.0.159
exit
!
ip protocol ospf route-map ospf
!
line vty

frr.conf on fabric160:

frr version 10.2.1
frr defaults datacenter
hostname fabric160
log syslog informational
service integrated-vtysh-config
!
router ospf
 ospf router-id 172.16.0.160
exit
!
interface dummy_12345
 ip ospf area 12345
 ip ospf passive
exit
!
interface ens19
 ip ospf area 12345
exit
!
interface ens20
 ip ospf area 12345
exit
!
access-list ospf_12345_ips permit 172.16.0.0/24
!
route-map ospf permit 100
 match ip address ospf_12345_ips
 set src 172.16.0.160
exit
!
ip protocol ospf route-map ospf
!
line vty

frr.conf on fabric161:

frr version 10.2.1
frr defaults datacenter
hostname fabric161
log syslog informational
service integrated-vtysh-config
!
router ospf
 ospf router-id 172.16.0.161
exit
!
interface dummy_12345
 ip ospf area 12345
 ip ospf passive
exit
!
interface ens19
 ip ospf area 12345
exit
!
interface ens20
 ip ospf area 12345
exit
!
access-list ospf_12345_ips permit 172.16.0.0/24
!
route-map ospf permit 100
 match ip address ospf_12345_ips
 set src 172.16.0.161
exit
!
ip protocol ospf route-map ospf
!
line vty

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel