at 06:19, <mario.limoncie...@dell.com> <mario.limoncie...@dell.com> wrote:
-----Original Message-----
From: Keith Busch <kbu...@kernel.org>
Sent: Thursday, May 9, 2019 4:54 PM
To: Limonciello, Mario
Cc: kai.heng.f...@canonical.com; h...@lst.de; ax...@fb.com;
s...@grimberg.me; raf...@kernel.org; linux...@vger.kernel.org;
rafael.j.wyso...@intel.com; linux-kernel@vger.kernel.org; linux-
n...@lists.infradead.org; keith.bu...@intel.com
Subject: Re: [PATCH] nvme-pci: Use non-operational power state instead
of D3 on
Suspend-to-Idle
[EXTERNAL EMAIL]
On Thu, May 09, 2019 at 09:37:58PM +0000, mario.limoncie...@dell.com
wrote:
+int nvme_set_power(struct nvme_ctrl *ctrl, unsigned npss)
+{
+ int ret;
+
+ mutex_lock(&ctrl->scan_lock);
+ nvme_start_freeze(ctrl);
+ nvme_wait_freeze(ctrl);
+ ret = nvme_set_features(ctrl, NVME_FEAT_POWER_MGMT, npss, NULL, 0,
+ NULL);
+ nvme_unfreeze(ctrl);
+ mutex_unlock(&ctrl->scan_lock);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(nvme_set_power);
I believe without memory barriers at the end disks with HMB this will
still kernel panic (Such as Toshiba BG3).
Well, the mutex has an implied memory barrier, but your HMB explanation
doesn't make much sense to me anyway. The "mb()" in this thread's original
patch is a CPU memory barrier, and the CPU had better not be accessing
HMB memory. Is there something else going on here?
Kai Heng will need to speak up a bit in his time zone as he has this disk
on hand,
but what I recall from our discussion was that DMA operation MemRd after
resume was the source of the hang.
Yes, that’ what I was told by the NVMe vendor, so all I know is to impose a
memory barrier.
If mb() shouldn’t be used here, what’s the correct variant to use in this
context?
This still allows D3 which we found at least failed to go into deepest
state and
blocked
platform s0ix for the following SSDs (maybe others):
Hynix PC601
LiteOn CL1
We usually write features to spec first, then quirk non-compliant
devices after.
NVME spec doesn't talk about a relationship between SetFeatures w/
NVME_FEAT_POWER_MGMGT and D3 support, nor order of events.
This is why we opened a dialog with storage vendors, including
contrasting the behavior
of Microsoft Windows inbox NVME driver and Intel's Windows RST driver.
Those two I mention that come to mind immediately because they were most
recently
tested to fail. Our discussion with storage vendors overwhelmingly
requested
that we don't use D3 under S2I because their current firmware
architecture won't
support it.
For example one vendor told us with current implementation that receiving
D3hot
after NVME shutdown will prevent being able to enter L1.2. D3hot entry
was supported
by an IRQ handler that isn't serviced in NVME shutdown state.
Another vendor told us that with current implementation it's impossible
to transition
to PS4 (at least via APST) while L1.2 D3hot is active.
I tested the patch from Keith and it has two issues just as simply skipping
nvme_dev_disable():
1) It consumes more power in S2I
2) System freeze after resume
Also I don’t think it’s a spec. It’s a guide to let OEM/ODM pick and
assemble Modern Standby compliant machines, so what Windows NVMe driver
really does is still opaque to us.
Kai-Heng