On Wed, 12 Feb 2026, Arun R Murthy <[email protected]> wrote:
> The port refclk enable timeout and the soc ready timeout value mentioned
> in the spec is the PHY timings and doesn't include the turnaround time
> from the SoC or OS. So add an overhead timeout value on top of the
> recommended timeouts from the PHY spec.

Hi Arun,

Thanks for the fix. I wanted to flag that I independently identified
this exact issue and posted a detailed root cause analysis on the i915
GitLab tracker five days before this patch series.

On February 7, 2026, I posted the analysis as a comment on GitLab issue #14713:

  https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14713#note_2739573

That comment includes the following findings, which directly correspond
to what this patch addresses:

1. Traced the error to intel_cx0_phy_lane_reset() in intel_cx0_phy.c
   (line ~2911), where the driver writes the PCLK_REFCLK_REQUEST bit to
   XELPDP_PORT_CLOCK_CTL and polls for PCLK_REFCLK_ACK with a timeout
   of XELPDP_REFCLK_ENABLE_TIMEOUT_US = 1 (1 us).

2. Identified that this calls __intel_wait_for_register() with
   fast_timeout_us=1 and slow_timeout_ms=0 -- a single spin-poll with
   no slow-path fallback.

3. Compared the 1 us refclk timeout against other timeouts in the same
   PHY init sequence:

     XELPDP_PORT_BUF_SOC_READY_TIMEOUT_US =  100 us
     XELPDP_PORT_RESET_START_TIMEOUT_US   =    5 us
     XELPDP_PCLK_PLL_ENABLE_TIMEOUT_US    = 3200 us
     XELPDP_PORT_RESET_END_TIMEOUT_MS     =   15 ms

   The 1 us value is an outlier: every other timeout in the same code
   path is between 5x (RESET_START) and 15,000x (RESET_END) longer, and
   three of the four are at least 100x longer.

4. Recommended increasing XELPDP_REFCLK_ENABLE_TIMEOUT_US to ~100 us
   or adding a slow-path ms fallback, consistent with how other waits
   in the same function are structured.

This analysis was performed on a Lenovo ThinkPad P16 Gen 3 with an
Arrow Lake-S Core Ultra 9 275HX (device ID 7d67) running kernel
6.19.0-rc8. The PHY A refclk failure reproduced on every boot at ~8.5s
after i915 init, during the eDP panel probe path.

Your patch does the right thing -- increasing the timeout values and
adding SoC/OS overhead. Since my analysis identified the root cause and
recommended the same fix direction, I'd appreciate attribution:

Reported-by: Cole Leavitt <[email protected]>

Thanks,
Cole
