At Tue, 12 May 2015 18:18:13 -0700, Laura Abbott wrote: > > (cc-ing linux-usb as well) > > On 05/12/2015 08:14 AM, Marcel Holtmann wrote: > > Hi Laura, > > > >>>> We've received a number of reports of warnings when coming > >>>> out of suspend with certain bluetooth firmware configurations: > >>>> > >>>> WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126 > >>>> _request_firmware+0x558/0x810() > >>>> Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 > >>>> xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter > >>>> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 > >>>> ip6table_mangle ip6table_security ip6table_raw ip6table_filter > >>>> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 > >>>> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw > >>>> binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal > >>>> snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek > >>>> iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211 iTCO_vendor_support > >>>> snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul > >>>> snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel uvcvideo > >>>> snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm videobuf2_core > >>>> serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common > >>>> videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom mmc_core > >>>> media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd mei_me mei > >>>> shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915 > >>>> i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core > >>>> video > >>>> CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted 3.19.3-200.fc21.x86_64 > >>>> Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 ) 10/22/2013 > >>>> Workqueue: hci0 hci_power_on [bluetooth] > >>>> 0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215 > >>>> 0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a > >>>> 0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40 > >>>> Call Trace: > >>>> [<ffffffff8176e215>] dump_stack+0x45/0x57 > >>>> [<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0 > >>>> [<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20 > >>>> [<ffffffff814dbe78>] _request_firmware+0x558/0x810 > >>>> [<ffffffff814dc165>] request_firmware+0x35/0x50 > >>>> [<ffffffffa03a7886>] btusb_setup_bcm_patchram+0x86/0x590 [btusb] > >>>> [<ffffffff814d40e6>] ? rpm_idle+0xd6/0x230 > >>>> [<ffffffffa04d4801>] hci_dev_do_open+0xe1/0xa90 [bluetooth] > >>>> [<ffffffff810c51dd>] ? ttwu_do_activate.constprop.90+0x5d/0x70 > >>>> [<ffffffffa04d5980>] hci_power_on+0x40/0x200 [bluetooth] > >>>> [<ffffffff810b487c>] process_one_work+0x14c/0x3f0 > >>>> [<ffffffff810b52f3>] worker_thread+0x53/0x470 > >>>> [<ffffffff810b52a0>] ? rescuer_thread+0x300/0x300 > >>>> [<ffffffff810ba548>] kthread+0xd8/0xf0 > >>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0 > >>>> [<ffffffff81774958>] ret_from_fork+0x58/0x90 > >>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0 > >>>> > >>>> This occurs after every resume. > >>>> > >>>> When resuming, the bluetooth stack calls hci_register_dev, > >>>> allocates a new workqueue, and immediately schedules the > >>>> power_on on the newly created workqueue. Since the new > >>>> workqueue is not freezable, the work runs immediately and > >>>> triggers the warning since resume is still happening and > >>>> usermodehelper has not yet been re-enabled. Fix this by > >>>> making the request workqueue freezable. This ensures > >>>> the work will not run until unfreezing occurs and usermodehelper > >>>> is re-enabled. > >>>> > >>>> Signed-off-by: Laura Abbott <labb...@fedoraproject.org> > >>>> --- > >>>> Resend because I think this got lost in the thread. > >>>> This should be fixing the actual root cause of the warnings. > >>> > >>> so I am not convinced that it actually fixes the root cause. This is just > >>> papering over it. > >>> > >>> The problem is pretty clear, the firmware for some of the Bluetooth > >>> controllers is optional and that means during the original hotplug event > >>> it will not be found and the controller keeps operating. However for some > >>> reason instead of actually suspending and resuming the Bluetooth > >>> controller, we see a unplug + replug (since we are going through probe) > >>> and that is causing this funny behaviour. > >>> > >> > >> Fundamentally the issue is the request_firmware is being called at the > >> wrong time. From Documentation/workqueue.txt: > >> > >> WQ_FREEZABLE > >> > >> A freezable wq participates in the freeze phase of the system > >> suspend operations. Work items on the wq are drained and no > >> new work item starts execution until thawed. > >> > >> > >> By making the request workqueue freezable, any work that gets scheduled > >> will not run until the time for tasks to unthaw. > >> 4320f6b1d9db4ca912c5eb6ecb328b2e090e1586 > >> ("PM / sleep: Fix request_firmware() error at resume") fixed the resume > >> path such that before all tasks are unthawed, calls to > >> usermodehelper_read_trylock will block until usermodehelper is fully > >> resumed. This means that any task which is frozen and then woken up > >> again should have the right sequencing for usermodehelper. The workqueue > >> which handled the bluetooth power on was never being frozen properly so > >> there was never any guarantee of when it would run. This patch gives > >> it the necessary sequence. > >> > >>> So how does making one of the core workqueues freezable fixes this the > >>> right way. I do not even know how many other side effects that might > >>> have. That hdev->req_workqueue is a Bluetooth core internal workqueue > >>> that we are using for multiple tasks. > >>> > >>> Rather tell me on why we are probing the USB devices that might need > >>> firmware without having userspace ready. It sounds to me that the USB > >>> driver probe callback should be delayed if we can not guarantee that it > >>> can request firmware. As I explained many times, the call path that > >>> causes this is going through probe callback of the driver itself. > >>> > >> > >> I agree that if the driver probe function was requesting firmware > >> directly there would be a problem. The power_on function is already > >> being called asynchronously on a workqueue. Making that workqueue > >> freezable does exactly the delay you describe. > > > > I am not convinced. Now we are hacking the Bluetooth core layer (which has > > nothing to do with the drivers suspend/resume or probe) to do something > > different so that we do not see this warning. > > > > I can not do anything about the platform in question choosing a > > unplug/replug for suspend/resume instead of having a proper USB suspend and > > resume handling. That is pretty much out of our control. I would rather > > have the USB subsystem delay the probe() callback if we tell it to. Of just > > have request_firmware() actually sleep until userspace is ready. Seriously, > > why is request_firmware not just sleeping for us. > > > > The closest thing to blocking is usermodehelper_read_lock_wait which > waits for a limited amount of time. Takashi Iwai proposed switching > to that unconditionally for all request_firmware but I never saw a > response from the firmware maintainers. I suspect that may not be > acceptable because if the firmware actually needs to block it should > be an asynchronous call. The firmware maintainers can correct me > if I'm incorrect in my understanding.
IIRC, the reason of using usermodehelper_read_trylock() for the normal request_firmware() (not the nowait one) is to check the call of request_firmware() in the resume callback. If the firmware hasn't been cached, it should fail. So, using _trylock() there isn't wrong, per se. It's indeed safer. But, the problem is that _trylock() is used unconditionally for all request_firmware() calls even if it's never from the resume path. Maybe we should allow the f/w loader caller to specify whether to use UMH trylock or wait. The patch below exposes _request_firmware() and FW_OPT_ flags. Then BT driver can call like _request_firmware(&fw, name, dev, FW_OPT_UEVENT | FW_OPT_NO_WARN | FW_OPT_UMH_LOCK_WAIT) Note that the patch is totally untested! Or doesn't this look intuitive enough? Takashi --- diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c index 171841ad1008..47eb5551c119 100644 --- a/drivers/base/firmware_class.c +++ b/drivers/base/firmware_class.c @@ -97,21 +97,6 @@ static inline long firmware_loading_timeout(void) return loading_timeout > 0 ? loading_timeout * HZ : MAX_JIFFY_OFFSET; } -/* firmware behavior options */ -#define FW_OPT_UEVENT (1U << 0) -#define FW_OPT_NOWAIT (1U << 1) -#ifdef CONFIG_FW_LOADER_USER_HELPER -#define FW_OPT_USERHELPER (1U << 2) -#else -#define FW_OPT_USERHELPER 0 -#endif -#ifdef CONFIG_FW_LOADER_USER_HELPER_FALLBACK -#define FW_OPT_FALLBACK FW_OPT_USERHELPER -#else -#define FW_OPT_FALLBACK 0 -#endif -#define FW_OPT_NO_WARN (1U << 3) - struct firmware_cache { /* firmware_buf instance will be added into the below list */ spinlock_t lock; @@ -1085,7 +1070,7 @@ static int assign_firmware_buf(struct firmware *fw, struct device *device, } /* called from request_firmware() and request_firmware_work_func() */ -static int +int _request_firmware(const struct firmware **firmware_p, const char *name, struct device *device, unsigned int opt_flags) { @@ -1099,13 +1084,16 @@ _request_firmware(const struct firmware **firmware_p, const char *name, if (!name || name[0] == '\0') return -EINVAL; + /* Need to pin this module until return */ + __module_get(THIS_MODULE); + ret = _request_firmware_prepare(&fw, name, device); if (ret <= 0) /* error or already assigned */ goto out; ret = 0; timeout = firmware_loading_timeout(); - if (opt_flags & FW_OPT_NOWAIT) { + if (opt_flags & FW_OPT_UMH_LOCK_WAIT) { timeout = usermodehelper_read_lock_wait(timeout); if (!timeout) { dev_dbg(device, "firmware: %s loading timed out\n", @@ -1147,67 +1135,10 @@ _request_firmware(const struct firmware **firmware_p, const char *name, } *firmware_p = fw; - return ret; -} - -/** - * request_firmware: - send firmware request and wait for it - * @firmware_p: pointer to firmware image - * @name: name of firmware file - * @device: device for which firmware is being loaded - * - * @firmware_p will be used to return a firmware image by the name - * of @name for device @device. - * - * Should be called from user context where sleeping is allowed. - * - * @name will be used as $FIRMWARE in the uevent environment and - * should be distinctive enough not to be confused with any other - * firmware image for this or any other device. - * - * Caller must hold the reference count of @device. - * - * The function can be called safely inside device's suspend and - * resume callback. - **/ -int -request_firmware(const struct firmware **firmware_p, const char *name, - struct device *device) -{ - int ret; - - /* Need to pin this module until return */ - __module_get(THIS_MODULE); - ret = _request_firmware(firmware_p, name, device, - FW_OPT_UEVENT | FW_OPT_FALLBACK); - module_put(THIS_MODULE); - return ret; -} -EXPORT_SYMBOL(request_firmware); - -/** - * request_firmware_direct: - load firmware directly without usermode helper - * @firmware_p: pointer to firmware image - * @name: name of firmware file - * @device: device for which firmware is being loaded - * - * This function works pretty much like request_firmware(), but this doesn't - * fall back to usermode helper even if the firmware couldn't be loaded - * directly from fs. Hence it's useful for loading optional firmwares, which - * aren't always present, without extra long timeouts of udev. - **/ -int request_firmware_direct(const struct firmware **firmware_p, - const char *name, struct device *device) -{ - int ret; - - __module_get(THIS_MODULE); - ret = _request_firmware(firmware_p, name, device, - FW_OPT_UEVENT | FW_OPT_NO_WARN); module_put(THIS_MODULE); return ret; } -EXPORT_SYMBOL_GPL(request_firmware_direct); +EXPORT_SYMBOL_GPL(_request_firmware); /** * release_firmware: - release the resource associated with a firmware image @@ -1291,6 +1222,7 @@ request_firmware_nowait( fw_work->context = context; fw_work->cont = cont; fw_work->opt_flags = FW_OPT_NOWAIT | FW_OPT_FALLBACK | + FW_OPT_UMH_LOCK_WAIT | (uevent ? FW_OPT_UEVENT : FW_OPT_USERHELPER); if (!try_module_get(module)) { diff --git a/include/linux/firmware.h b/include/linux/firmware.h index 5c41c5e75b5c..460aa30965cf 100644 --- a/include/linux/firmware.h +++ b/include/linux/firmware.h @@ -39,23 +39,21 @@ struct builtin_fw { __used __section(.builtin_fw) = { name, blob, size } #if defined(CONFIG_FW_LOADER) || (defined(CONFIG_FW_LOADER_MODULE) && defined(MODULE)) -int request_firmware(const struct firmware **fw, const char *name, - struct device *device); +int _request_firmware(const struct firmware **firmware_p, const char *name, + struct device *device, unsigned int opt_flags); int request_firmware_nowait( struct module *module, bool uevent, const char *name, struct device *device, gfp_t gfp, void *context, void (*cont)(const struct firmware *fw, void *context)); -int request_firmware_direct(const struct firmware **fw, const char *name, - struct device *device); - void release_firmware(const struct firmware *fw); #else -static inline int request_firmware(const struct firmware **fw, - const char *name, - struct device *device) +static inline int +_request_firmware(const struct firmware **firmware_p, const char *name, + struct device *device, unsigned int opt_flags) { return -EINVAL; } + static inline int request_firmware_nowait( struct module *module, bool uevent, const char *name, struct device *device, gfp_t gfp, void *context, @@ -67,13 +65,69 @@ static inline int request_firmware_nowait( static inline void release_firmware(const struct firmware *fw) { } +#endif -static inline int request_firmware_direct(const struct firmware **fw, - const char *name, - struct device *device) +/* firmware behavior options */ +#define FW_OPT_UEVENT (1U << 0) +#define FW_OPT_NOWAIT (1U << 1) +#ifdef CONFIG_FW_LOADER_USER_HELPER +#define FW_OPT_USERHELPER (1U << 2) +#else +#define FW_OPT_USERHELPER 0 +#endif +#ifdef CONFIG_FW_LOADER_USER_HELPER_FALLBACK +#define FW_OPT_FALLBACK FW_OPT_USERHELPER +#else +#define FW_OPT_FALLBACK 0 +#endif +#define FW_OPT_NO_WARN (1U << 3) +#define FW_OPT_UMH_LOCK_WAIT (1U << 4) + +/** + * request_firmware: - send firmware request and wait for it + * @firmware_p: pointer to firmware image + * @name: name of firmware file + * @device: device for which firmware is being loaded + * + * @firmware_p will be used to return a firmware image by the name + * of @name for device @device. + * + * Should be called from user context where sleeping is allowed. + * + * @name will be used as $FIRMWARE in the uevent environment and + * should be distinctive enough not to be confused with any other + * firmware image for this or any other device. + * + * Caller must hold the reference count of @device. + * + * The function can be called safely inside device's suspend and + * resume callback. + **/ +static inline int +request_firmware(const struct firmware **firmware_p, const char *name, + struct device *device) { - return -EINVAL; + return _request_firmware(firmware_p, name, device, + FW_OPT_UEVENT | FW_OPT_FALLBACK); +} + +/** + * request_firmware_direct: - load firmware directly without usermode helper + * @firmware_p: pointer to firmware image + * @name: name of firmware file + * @device: device for which firmware is being loaded + * + * This function works pretty much like request_firmware(), but this doesn't + * fall back to usermode helper even if the firmware couldn't be loaded + * directly from fs. Hence it's useful for loading optional firmwares, which + * aren't always present, without extra long timeouts of udev. + **/ +static inline int +request_firmware_direct(const struct firmware **firmware_p, + const char *name, struct device *device) +{ + return _request_firmware(firmware_p, name, device, + FW_OPT_UEVENT | FW_OPT_NO_WARN); } -#endif #endif -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/