Hi,

We recently ran a test with 10K concurrent WireGuard tunnels and hit a handshake issue that I would like to bring to the community's attention.
It seems to us that the VPP WireGuard handshake is always processed by the main thread, and any other thread that receives a handshake message hands it off to the main thread; this keeps the handshake code running in a thread-safe environment. When a large number of tunnels is being set up (i.e. a lot of handshakes), the main thread both configures all the tunnels and processes every handshake, and for each handshake the processing function has to look up an element in the vector of all existing WireGuard interfaces that share the given UDP listen port. This is a bottleneck for us. For example, when setting up 10K tunnels where every handshake arrives on the same listen port (e.g. 51820), each handshake amounts to a linear search through a vector with 10K entries, and since VPP has to perform a MAC check (and the corresponding MAC calculation) on every received message for each candidate interface, the processing becomes very time consuming when the matching interface sits near the end of the vector.

The following code is executed by the main thread:

static wg_input_error_t
wg_handshake_process (vlib_main_t *vm, wg_main_t *wmp, vlib_buffer_t *b,
                      u32 node_idx, u8 is_ip4)
...
  index_t *ii;

  wg_ifs = wg_if_indexes_get_by_port (udp_dst_port);
  if (NULL == wg_ifs)
    return WG_INPUT_ERROR_INTERFACE;

  vec_foreach (ii, wg_ifs)
    {
      wg_if = wg_if_get (*ii);
      if (NULL == wg_if)
        continue;

      under_load = wg_if_is_under_load (vm, wg_if);
      mac_state = cookie_checker_validate_macs (
        vm, &wg_if->cookie_checker, macs, current_b_data, len, under_load,
        &src_ip, udp_src_port);
      if (mac_state == INVALID_MAC)
        {
          wg_if_dec_handshake_num (wg_if);
          wg_if = NULL;
          continue;
        }
      break;
    }

The vector "wg_ifs" holds one entry for every WireGuard interface created with the same "udp_dst_port", so "vec_foreach" may have to walk all of those entries, and for each candidate interface "cookie_checker_validate_macs()" has to compare the received MAC against a MAC computed with that interface's own key (&wg_if->cookie_checker). A small back-of-the-envelope estimate of how this scales is included at the end of this mail.

I measured the time before and after the call to the handshake process function and found that the number of handshakes processed per second drops significantly as more tunnels exist in the system. Eventually VPP is no longer able to process all handshake messages in time, which leads to packet drops during the handoff: the handoff queue runs out of space for incoming handshake messages while the main thread is still busy with "old" handshakes, and a large number of "congestion drop" errors start to appear in "wg4-handshake-handoff".

I also investigated the "under load" state in WireGuard, but since all handshakes arrive on the same UDP listen port and the most time-consuming part is the lookup itself, it brings very little improvement. Please see my patch that updates the under-load state determination for WireGuard: https://gerrit.fd.io/r/c/vpp/+/37764

I then tried configuring a different UDP listen port for each tunnel. With that configuration the results are much more promising: VPP achieves much better handshake performance, we no longer see "congestion drop", and a larger number of tunnels can be created.

I would like to ask community members to review my under-load state determination patch, and any feedback on the different-listen-port approach is welcome. I look forward to hearing from you soon.
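For reference, here is the rough estimate mentioned above: a tiny standalone C program, not VPP code, that only counts cookie-MAC validations on the main thread. The 10K tunnel count and the assumption that the matching interface sits on average in the middle of the per-port vector are illustrative; the real cost of one cookie_checker_validate_macs() call is not modeled.

#include <stdio.h>

int
main (void)
{
  /* Number of tunnels, each performing one handshake. */
  unsigned long long n = 10000;

  /* One shared listen port: the per-port vector holds all n interfaces,
   * so a single handshake needs ~n/2 MAC validations on average (up to n
   * in the worst case) before the matching interface is found. */
  unsigned long long shared_avg = n * n / 2;
  unsigned long long shared_worst = n * n;

  /* One listen port per interface: the per-port vector has exactly one
   * entry, so each handshake needs a single MAC validation. */
  unsigned long long unique = n;

  printf ("handshakes:                            %llu\n", n);
  printf ("MAC validations, shared port (avg):    %llu\n", shared_avg);
  printf ("MAC validations, shared port (worst):  %llu\n", shared_worst);
  printf ("MAC validations, unique ports:         %llu\n", unique);
  return 0;
}

Under these assumptions the shared-port case works out to roughly 5e7 MAC computations on the main thread versus 1e4 with unique ports, which is consistent with the improvement we saw after changing the listen-port configuration.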
Best regards,
Gabriel Oginski