On Wed, Aug 06, 2025 at 12:02:02PM +0000, nicusor.huhu...@siemens.com wrote: > > > >>-----Original Message----- > >>From: Imre Deak <imre.d...@intel.com> > >>Sent: Tuesday, August 5, 2025 6:20 PM > >>To: Dmitry Baryshkov <dmitry.barysh...@oss.qualcomm.com> > >>Cc: Huhulea, Nicusor Liviu (FT FDS CES LX PBU 1) > >><nicusor.huhu...@siemens.com>; sta...@vger.kernel.org; dri- > >>de...@lists.freedesktop.org; intel-...@lists.freedesktop.org; > >>cip-...@lists.cip- > >>project.org; shradhagu...@linux.microsoft.com; jouni.hogan...@intel.com; > >>neil.armstr...@linaro.org; jani.nik...@linux.intel.com; > >>maarten.lankho...@linux.intel.com; mrip...@kernel.org; tzimmerm...@suse.de; > >>airl...@gmail.com; dan...@ffwll.ch; joonas.lahti...@linux.intel.com; > >>rodrigo.v...@intel.com; laurentiu.pa...@oss.nxp.com; Hombourger, Cedric (FT > >>FDS CES LX) <cedric.hombour...@siemens.com>; Bobade, Shrikant Krishnarao > >>(FT FDS CES LX PBU 1) <shrikant.bob...@siemens.com> > >>Subject: Re: [PATCH] drm/probe-helper: fix output polling not resuming > >>after HPD > >>IRQ storm > >> > >>On Tue, Aug 05, 2025 at 01:46:51PM +0300, Dmitry Baryshkov wrote: > >>> On Mon, Aug 04, 2025 at 11:13:59PM +0300, Nicusor Huhulea wrote: > >>> > A regression in output polling was introduced by commit > >>> > 4ad8d57d902fbc7c82507cfc1b031f3a07c3de6e > >>> > ("drm: Check output polling initialized before disabling") in the 6.1.y > >>> > stable > >>tree. > >>> > As a result, when the i915 driver detects an HPD IRQ storm and > >>> > attempts to switch from IRQ-based hotplug detection to polling, output > >>> > polling > >>fails to resume. > >>> > > >>> > The root cause is the use of dev->mode_config.poll_running. Once > >>> > poll_running is set (during the first connector detection) the calls > >>> > to drm_kms_helper_poll_enable(), such as > >>> > intel_hpd_irq_storm_switch_to_polling() fails to schedule > >>> > output_poll_work as > >>expected. > >>> > Therefore, after an IRQ storm disables HPD IRQs, polling does not start, > >>breaking hotplug detection. > >>> > >>> Why doesn't disable path use drm_kms_helper_poll_disable() ? > >> > >>In general i915 doesn't disable polling as a whole after an IRQ storm and a > >>2 > >>minute delay (or runtime resuming), since on some other connectors the > >>polling > >>may be still required. > >> > >>Also, in the 6.1.y stable tree drm_kms_helper_poll_disable() is: > >> > >> if (drm_WARN_ON(dev, !dev->mode_config.poll_enabled)) > >> return; > >> > >> cancel_delayed_work_sync(&dev->mode_config.output_poll_work); > >> > >>so calling that wouldn't really fix the problem, which is clearly just an > >>incorrect > >>backport of the upstream commit 5abffb66d12bcac8 ("drm: > >>Check output polling initialized before disabling") to the v6.1.y stable > >>tree by > >>commit 4ad8d57d902f ("drm: Check output polling initialized before > >>disabling") in > >>v6.1.y. > >> > >>The upstream commit did not add the check for > >>dev->mode_config.poll_running in drm_kms_helper_poll_enable(), the > >>condition was only part of the diff context in the commit. Hence adding the > >>condition in the 4ad8d57d902f backport commit was incorrect. > >> > >>> > The fix is to remove the dev->mode_config.poll_running in the check > >>> > condition, ensuring polling is always scheduled as requested. > I'm not a full-time kernel developer, but I spent some time trying to really > understand both the rationale and the effects of commit "Check output polling > initialized before disabling." > Here's how I see the issue: > That commit was introduced mainly as a defensive measure, to protect > drivers such as hyperv-drm that do not initialize connector polling. > In those drivers, calling drm_kms_helper_poll_disable() without proper > polling setup could trigger warnings or even stack traces, since there > are no outputs to poll and the polling helpers don't apply. From what > I understand, the poll_running variable was added to prevent cases > where polling was accidentally disabled for drivers that were never > set up for it. However, this fix ended up creating a new class of > breakage, which I have observed and describe here.
It was added to implement the very simple logic: If something isn't initialized, enabling or disabling it is an error. If something isn't enabled, disabling it is an error. If something isn't disabled, enabling it is an error. > > To me, the core logic should be simple: if polling is needed, then we should > allow it to proceed (regardless of whether it's been scheduled previously). > > Looking at the drm_kms_helper_poll_disable() > if (drm_WARN_ON(dev, !dev->mode_config.poll_enabled)) > return; > > If the driver never enabled polling (that is, drm_kms_helper_poll_enable() > was never called), then the disable operation is effectively a no-op-totally > safe for drivers like hyperv-drm. > > And in the enable function drm_kms_helper_poll_enable(..): > if (drm_WARN_ON_ONCE(dev, !dev->mode_config.poll_enabled) || > - !drm_kms_helper_poll || dev->mode_config.poll_running) > + !drm_kms_helper_poll) > return; Why? > The main thing being guarded here is whether polling has actually been > initialized: > 1.For polling drivers like i915, removing poll_running from the enable path > is both safe and necessary: it fixes the regression with HPD polling after > IRQ storms I believe in paired calls. If you want to use drm_kms_helper_poll_enable(), it should be previously drm_kms_helper_poll_disable()d. If you have manually disabled the IRQ, it should also be manually enabled. Pairing the calls makes life much much easier. > 2.For non-polling drivers like hyperv-drm, the existing checks on > poll_enabled in both enable and disable functions are sufficient. Removing > poll_running doesn't affect these drivers-they don't go through the polling > code paths, so no polling gets scheduled or canceled by mistake > > Therefore based on my understanding and testing removing poll_running guard > not only fixes a real bug but also does not introduce new regressions for > drivers that don't use output polling. -- With best wishes Dmitry