The branch main has been updated by gallatin:

URL: 
https://cgit.FreeBSD.org/src/commit/?id=20e15e905c58e9e2020b2c3e40caa2e8406e5827

commit 20e15e905c58e9e2020b2c3e40caa2e8406e5827
Author:     Andrew Gallatin <galla...@freebsd.org>
AuthorDate: 2025-06-29 20:51:50 +0000
Commit:     Andrew Gallatin <galla...@freebsd.org>
CommitDate: 2025-06-29 20:51:50 +0000

    mlx5: Decrease FW init timeout from 120 seconds to 5 seconds
    
    When encountering a failed NIC, the mlx5 driver will wait up to 120
    seconds for the firmware to respond.  This timeout is absurdly huge,
    and leads to boot times of 40 minutes to over an hour on our servers
    when a NIC fails.  This is because the driver attempts to attach to
    the failed NIC multiple times (once for each driver loaded after
    mlx5, since mlx5 remains the best match for the device), and waits
    2 minutes on each attempt.  This delay then triggers watchdog
    timeouts in our environment, rendering servers with a failed NIC
    entirely unbootable without manual intervention.
    
    Note that FW_INIT_WARN_MESSAGE_INTERVAL must also be decreased, as
    it must be less than the pre-init timeout (FW_PRE_INIT_TIMEOUT_MILI).
    
    Reviewed by: kib (initial version, before reducing warn interval)
    Sponsored by: Netflix
---
 sys/dev/mlx5/device.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sys/dev/mlx5/device.h b/sys/dev/mlx5/device.h
index e6d46507a5d2..3e2c4f15a5cc 100644
--- a/sys/dev/mlx5/device.h
+++ b/sys/dev/mlx5/device.h
@@ -32,8 +32,8 @@
 
 #define        FW_INIT_TIMEOUT_MILI            2000
 #define        FW_INIT_WAIT_MS                 2
-#define        FW_PRE_INIT_TIMEOUT_MILI        120000
-#define        FW_INIT_WARN_MESSAGE_INTERVAL   20000
+#define        FW_PRE_INIT_TIMEOUT_MILI        5000
+#define        FW_INIT_WARN_MESSAGE_INTERVAL   2000
 
 #if defined(__LITTLE_ENDIAN)
 #define MLX5_SET_HOST_ENDIANNESS       0
