The branch main has been updated by gallatin: URL: https://cgit.FreeBSD.org/src/commit/?id=20e15e905c58e9e2020b2c3e40caa2e8406e5827
commit 20e15e905c58e9e2020b2c3e40caa2e8406e5827 Author: Andrew Gallatin <galla...@freebsd.org> AuthorDate: 2025-06-29 20:51:50 +0000 Commit: Andrew Gallatin <galla...@freebsd.org> CommitDate: 2025-06-29 20:51:50 +0000 mlx5: Decrease FW init timeout from 120 seconds to 5 seconds When encountering a failed NIC, the mlx5 driver will wait up to 120 secs for the firmware to respond. This timeout is absurdly huge, and leads to boot times of 40 minutes to over an hour on our servers when a NIC fails. This is because the driver will attempt to attach to the failed NIC multiple times (once for each driver loaded after mlx5), and wait 2 minutes on each attempt. This happens because the mlx5 driver is still the best match for the device. This delay then triggers watchdog timeouts in our environment, rendering servers with a failed NIC entirely unbootable without manual intervention. Note that FW_INIT_WARN_MESSAGE_INTERVAL must also be decreased, as it must be less than the init timeout. Reviewed by: kib (initial version, before reducing warn interval) Sponsored by: Netflix --- sys/dev/mlx5/device.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sys/dev/mlx5/device.h b/sys/dev/mlx5/device.h index e6d46507a5d2..3e2c4f15a5cc 100644 --- a/sys/dev/mlx5/device.h +++ b/sys/dev/mlx5/device.h @@ -32,8 +32,8 @@ #define FW_INIT_TIMEOUT_MILI 2000 #define FW_INIT_WAIT_MS 2 -#define FW_PRE_INIT_TIMEOUT_MILI 120000 -#define FW_INIT_WARN_MESSAGE_INTERVAL 20000 +#define FW_PRE_INIT_TIMEOUT_MILI 5000 +#define FW_INIT_WARN_MESSAGE_INTERVAL 2000 #if defined(__LITTLE_ENDIAN) #define MLX5_SET_HOST_ENDIANNESS 0