On Thu, Oct 5, 2017 at 5:17 AM, Luis R. Rodriguez <mcg...@kernel.org> wrote: > On Tue, Aug 22, 2017 at 03:52:46PM +0800, Kai-Heng Feng wrote: >> Currently, firmware will only be chached if assign_firmware_buf() gets >> called. > > True, but also more importantly we peg the fw cache to the device via devres > *iff* the firmware actually was found. We do this so that we don't try to look > for bogus firmwares or firmwares which we currently do not have on our > filesystem (consider a driver that uses a series of revisions of firmwares). > >> When a device loses its power or a USB device gets plugged to another >> port under suspend, request_firmware() can still find cached firmware, >> but firmware name no longer associates with the new device's devres. >> So next time the system suspend, those firmware won't be cached. > > Gah, its a bit more complicated than that. During suspend we call out to > request firmware proactively for all firmwares in our fw cache. The > fw cache is used simply to fetch the caches for firwmares during suspend > so that on resume they exist to avoid races against the filesystem. It > however uses the same functionality as the batched firwmare feature, which > in turn is used to share one buf over different requests. > > If a device is unplugged its not clear to me why the old cache would > not work for the new one as its all shared, so the only thing that I > can think of is if the old device being disconnected is processed first, > and therefore releases the old cache, so when the new device is processed > it does not use the new cache. This should still not be an issue though, > unless of course real races happen with the filesystem.
Because the new one's devres doesn't have the fw, i.e. fw_add_devm_name() not gets called in this case. > > As of recently though we have had new findings which indicate that the > old UMH lock was causing issues on resume on BT devices which *only* > needed firmware on resume, but not on boot, so their first fw cache > was not generated. That issue can resemble this one, in that no cache > can be present, and a race happens on resume. > > The old UMH lock then was causing a failure on resume, and one of the > solutions which could have worked was a proactive "hey set this cache > up for me" even though the device didn't need the firmware. This > is no longer needed given that the UMH lock stuff is gone from > direct FS lookups and the issue should no longer be present. > > That is, Linus' revert of commit f007cad159e99fa2acd3b2e9364fbb32ad28b971 > ("Revert "firmware: add sanity check on shutdown/suspend") I believe > should fix this issue. > > I'm actually inclined to remove the fw cache stuff as I no longer see > the advantage of it given we are doing FS lookups and I see no races > possible anymore. > >> Hence, we should add the firmware name to the devres when the firmware >> is found in cache, to make the firmware cacheable next time. >> >> Signed-off-by: Kai-Heng Feng <kai.heng.f...@canonical.com> >> --- >> drivers/base/firmware_class.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c >> index bfbe1e154128..a99de34e3fdc 100644 >> --- a/drivers/base/firmware_class.c >> +++ b/drivers/base/firmware_class.c >> @@ -1177,6 +1177,10 @@ _request_firmware_prepare(struct firmware >> **firmware_p, const char *name, >> >> ret = fw_lookup_and_allocate_buf(name, &fw_cache, &buf, dbuf, size); >> >> + /* device might be a new one, add it to devres list */ >> + if (ret == 0 || ret == 1) >> + fw_add_devm_name(device, name); >> + > > Even if this was correct notice we have requests for which a FW cache is not > desired -- see FW_OPT_NOCACHE, and the above does not respect it. I think this is needed when it's not FW_OPT_NOCACHE. > > Given all the above, can you test with a kernel which has > commit f007cad159e99fa2acd3b2e9364fbb32ad28b971 and tell me if you > still see the issue? It's fixed after f007cad159e99fa2acd3b2e9364fbb32ad28b971. Again, thanks for your detailed explanation. Kai-Heng, > > Luis