On Tue, Jun 10, 2025 at 3:44 PM Raag Jadav <raag.ja...@intel.com> wrote: > > On Thu, Jun 05, 2025 at 02:26:05PM +0200, Rafael J. Wysocki wrote: > > On Thu, Jun 5, 2025 at 1:44 PM Raag Jadav <raag.ja...@intel.com> wrote: > > > On Wed, Jun 04, 2025 at 08:19:34PM +0200, Rafael J. Wysocki wrote: > > > > On Wed, Jun 4, 2025 at 5:43 PM Raag Jadav <raag.ja...@intel.com> wrote: > > > > > On Fri, May 30, 2025 at 07:49:26PM +0200, Rafael J. Wysocki wrote: > > > > > > On Fri, May 30, 2025 at 7:23 PM Raag Jadav <raag.ja...@intel.com> > > > > > > wrote: > > > > > > > On Fri, May 23, 2025 at 05:23:10PM +0200, Rafael J. Wysocki wrote: > > > > > > > > On Wed, May 21, 2025 at 1:27 PM Rafael J. Wysocki > > > > > > > > <raf...@kernel.org> wrote: > > > > > > > > > On Wed, May 21, 2025 at 10:54 AM Raag Jadav > > > > > > > > > <raag.ja...@intel.com> wrote: > > > > > > > > > > On Tue, May 20, 2025 at 01:56:28PM -0500, Mario Limonciello > > > > > > > > > > wrote: > > > > > > > > > > > On 5/20/2025 1:42 PM, Raag Jadav wrote: > > > > > > > > > > > > On Tue, May 20, 2025 at 12:39:12PM -0500, Mario > > > > > > > > > > > > Limonciello wrote: > > > > > > > > > > ... > > > > > > > > > > > > > > > > From the driver perspective it does have expectations > > > > > > > > > > > that the parts outside > > > > > > > > > > > the driver did the right thing. If the driver was > > > > > > > > > > > expecting the root port > > > > > > > > > > > to be powered down at suspend and it wasn't there are > > > > > > > > > > > hardware components > > > > > > > > > > > that didn't power cycle and that's what we're seeing here. > > > > > > > > > > > > > > > > > > > > Which means the expectation set by the driver is the > > > > > > > > > > opposite of the > > > > > > > > > > purpose of this patch, and it's going to fail if any kind > > > > > > > > > > of error is > > > > > > > > > > detected under root port during suspend. > > > > > > > > > > > > > > > > > > And IMV this driver's expectation is questionable at least. > > > > > > > > > > > > > > > > > > There is no promise whatsoever that the device will always be > > > > > > > > > put into > > > > > > > > > D3cold during system suspend. > > > > > > > > > > > > > > > > For instance, user space may disable D3cold for any PCI device > > > > > > > > via the > > > > > > > > d3cold_allowed attribute in sysfs. > > > > > > > > > > > > > > > > If the driver cannot handle this, it needs to be fixed. > > > > > > > > > > > > > > Thanks for confirming. So should we consider this patch to be > > > > > > > valid > > > > > > > and worth moving forward? > > > > > > > > > > > > It doesn't do anything that would be invalid in principle IMV. > > > > > > > > > > > > You need to consider one more thing, though: It may be necessary to > > > > > > power-cycle the device in order to kick it out of the erroneous > > > > > > state > > > > > > and the patch effectively blocks this if I'm not mistaken. > > > > > > > > > > > > But admittedly I'm not sure if this really matters. > > > > > > > > > > Wouldn't something like bus reset (SBR) be more predictable? > > > > > > > > Maybe. > > > > > > > > The device state is most likely inconsistent in that case, so it > > > > depends. > > > > > > My limited understanding is that if SBR doesn't help, at that point all > > > bets are off including PMCSR configuration and probably a cold boot is > > > needed. > > > > I'm not talking about PMCSR, I'm talking about power removal (D3cold). > > This should be equivalent to a cold boot for the particular device > > except that cold boot would also reset the hierarchy above it. > > Sure. But for D3cold we rely on everything else under root port to also > be suspended, which we can't predict while in endpoint suspend path. And > with that we're back to "no guarantees" problem, which was the case either > way. The patch might just prevent any further damage than what's already > done.
Fair enough.