On 10/11/2022 at 16:28, Fabian Grünbichler wrote:
> as reported in
> https://forum.proxmox.com/threads/sudden-reboot-of-multiple-nodes-while-adding-a-new-node.116714/
>
> this patch just fixes a particular issue where a node joins (as in a
> quorum membership change, not limited to a PVE cluster join) an existing
> cluster, but has a lower MTU than the existing links to the already
> joined part of the cluster.
>
> i.e.:
>
> Node A: MTU 9000
> Node B: MTU 9000
> Node C: MTU 1500
>
> A & B are already up and running and have established that they can talk
> to each other with MTU 9000 (minus overhead). Now C joins as well - without
> the reset and re-schedule of MTU discovery in this patch, A and B will
> use MTU 9000 when talking to C, but those packets might never arrive
> (depending on network hardware and configuration). Since the heartbeat
> packets used to detect the link status are always small, they are able
> to arrive at C without any problems. If the network along the way
> doesn't reject the too-large packets but just drops them, MTU discovery is
> also severely delayed (up to tens of minutes until the actual, low MTU
> is correctly detected!).
>
> In the regular case, the reset will be immediately followed by detection of
> the correct MTU for the new link (and, depending on whether it is lower
> than that of the other links, of the global MTU knet uses for fragmenting),
> and the window with additional overhead (smaller MTU => more fragmentation
> => more packets) should be fairly small. In case of a network blackhole
> negatively affecting MTU discovery, the window might be big, but without
> this patch the result is a complete outage of the whole cluster, which
> is even less desirable than a cluster running with impacted performance.
>
> Upstream is working on further improving similar failure scenarios, such as:
> - improved handling of the MTU being lowered at runtime (either at the
>   link level, or somewhere along the network path)
> - improved MTU discovery timeouts and intervals to speed up recovery
>   even with blackholing networks
>
> These other changes are still work in progress and will follow at a
> later date.
>
> This patch is cherry-picked from the upstream branch stable1-proposed
> (slated for inclusion in the next stable 1.x release of libknet).
>
> Signed-off-by: Fabian Grünbichler <f.gruenbich...@proxmox.com>
> ---
> We might evaluate setting netmtu to 1500 minus overhead in our cluster
> creation code to avoid MTU-related issues - the net benefit of setting
> up a high MTU for corosync traffic is likely negligible, and it is almost
> always just a side effect of re-using network links that also serve as
> uplinks or storage links.
>
> netmtu is used by corosync to fragment its messages *before* passing
> them to knet, avoiding the need to fragment at the knet layer. There is
> also a (new, git-only at the moment) corosync.conf option for setting
> the MTU used by knet, skipping the pMTU-discovered one entirely. We
> could cherry-pick and set this option as well in case we want to default
> to a "non-jumbo" MTU.
>
>  ...eset-restart-pmtud-when-a-node-joins.patch | 156 ++++++++++++++++++
>  debian/patches/series                         |   1 +
>  2 files changed, 157 insertions(+)
>  create mode 100644 debian/patches/0001-pmtud-Reset-restart-pmtud-when-a-node-joins.patch
>
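one note on the netmtu idea above, mostly for the archives: as a rough,
untested sketch of what that would look like in the totem section (the
cluster name is made up, and 1397 is only a stand-in for "1500 minus
knet/crypto overhead" - the actual overhead depends on the crypto
settings in use):

    totem {
        version: 2
        # hypothetical cluster name, replace with the real one
        cluster_name: examplecluster
        transport: knet
        # fragment corosync messages before they are handed to knet;
        # the value is an assumed "1500 minus overhead" placeholder,
        # not a recommendation
        netmtu: 1397
        # the git-only option mentioned above pins the MTU knet uses,
        # skipping pMTU discovery; its availability depends on the
        # corosync version, so it is left commented out here
        # knet_mtu: 1397
    }

setting netmtu like this means corosync does the fragmentation itself,
so knet no longer has to fragment based on whatever MTU pMTUd has (or
hasn't yet) discovered.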
applied, thanks!