Re: [RESEND][PATCH] Bluetooth: Make request workqueue freezable

Takashi Iwai Tue, 19 May 2015 02:47:23 -0700

At Tue, 12 May 2015 18:18:13 -0700,
Laura Abbott wrote:
> 
> (cc-ing linux-usb as well)
> 
> On 05/12/2015 08:14 AM, Marcel Holtmann wrote:
> > Hi Laura,
> >
> >>>> We've received a number of reports of warnings when coming
> >>>> out of suspend with certain bluetooth firmware configurations:
> >>>>
> >>>> WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126
> >>>> _request_firmware+0x558/0x810()
> >>>> Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
> >>>> xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
> >>>> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
> >>>> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> >>>> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> >>>> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
> >>>> binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal
> >>>> snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek
> >>>> iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211 iTCO_vendor_support
> >>>> snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul
> >>>> snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel uvcvideo
> >>>> snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm videobuf2_core
> >>>> serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common
> >>>> videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom mmc_core
> >>>> media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd mei_me mei
> >>>> shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915
> >>>> i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core
> >>>> video
> >>>> CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted 3.19.3-200.fc21.x86_64
> >>>> Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 ) 10/22/2013
> >>>> Workqueue: hci0 hci_power_on [bluetooth]
> >>>> 0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215
> >>>> 0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a
> >>>> 0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40
> >>>> Call Trace:
> >>>> [<ffffffff8176e215>] dump_stack+0x45/0x57
> >>>> [<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0
> >>>> [<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20
> >>>> [<ffffffff814dbe78>] _request_firmware+0x558/0x810
> >>>> [<ffffffff814dc165>] request_firmware+0x35/0x50
> >>>> [<ffffffffa03a7886>] btusb_setup_bcm_patchram+0x86/0x590 [btusb]
> >>>> [<ffffffff814d40e6>] ? rpm_idle+0xd6/0x230
> >>>> [<ffffffffa04d4801>] hci_dev_do_open+0xe1/0xa90 [bluetooth]
> >>>> [<ffffffff810c51dd>] ? ttwu_do_activate.constprop.90+0x5d/0x70
> >>>> [<ffffffffa04d5980>] hci_power_on+0x40/0x200 [bluetooth]
> >>>> [<ffffffff810b487c>] process_one_work+0x14c/0x3f0
> >>>> [<ffffffff810b52f3>] worker_thread+0x53/0x470
> >>>> [<ffffffff810b52a0>] ? rescuer_thread+0x300/0x300
> >>>> [<ffffffff810ba548>] kthread+0xd8/0xf0
> >>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
> >>>> [<ffffffff81774958>] ret_from_fork+0x58/0x90
> >>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
> >>>>
> >>>> This occurs after every resume.
> >>>>
> >>>> When resuming, the bluetooth stack calls hci_register_dev,
> >>>> allocates a new workqueue, and immediately schedules the
> >>>> power_on on the newly created workqueue. Since the new
> >>>> workqueue is not freezable, the work runs immediately and
> >>>> triggers the warning since resume is still happening and
> >>>> usermodehelper has not yet been re-enabled. Fix this by
> >>>> making the request workqueue freezable. This ensures
> >>>> the work will not run until unfreezing occurs and usermodehelper
> >>>> is re-enabled.
> >>>>
> >>>> Signed-off-by: Laura Abbott <labb...@fedoraproject.org>
> >>>> ---
> >>>> Resend because I think this got lost in the thread.
> >>>> This should be fixing the actual root cause of the warnings.
> >>>
> >>> so I am not convinced that it actually fixes the root cause. This is just 
> >>> papering over it.
> >>>
> >>> The problem is pretty clear, the firmware for some of the Bluetooth 
> >>> controllers is optional and that means during the original hotplug event 
> >>> it will not be found and the controller keeps operating. However for some 
> >>> reason instead of actually suspending and resuming the Bluetooth 
> >>> controller, we see a unplug + replug (since we are going through probe) 
> >>> and that is causing this funny behaviour.
> >>>
> >>
> >> Fundamentally the issue is the request_firmware is being called at the
> >> wrong time. From Documentation/workqueue.txt:
> >>
> >>   WQ_FREEZABLE
> >>
> >>         A freezable wq participates in the freeze phase of the system
> >>         suspend operations.  Work items on the wq are drained and no
> >>         new work item starts execution until thawed.
> >>
> >>
> >> By making the request workqueue freezable, any work that gets scheduled
> >> will not run until the time for tasks to unthaw.
> >> 4320f6b1d9db4ca912c5eb6ecb328b2e090e1586
> >> ("PM / sleep: Fix request_firmware() error at resume") fixed the resume
> >> path such that before all tasks are unthawed, calls to
> >> usermodehelper_read_trylock will block until usermodehelper is fully
> >> resumed. This means that any task which is frozen and then woken up
> >> again should have the right sequencing for usermodehelper. The workqueue
> >> which handled the bluetooth power on was never being frozen properly so
> >> there was never any guarantee of when it would run. This patch gives
> >> it the necessary sequence.
> >>
> >>> So how does making one of the core workqueues freezable fixes this the 
> >>> right way. I do not even know how many other side effects that might 
> >>> have. That hdev->req_workqueue is a Bluetooth core internal workqueue 
> >>> that we are using for multiple tasks.
> >>>
> >>> Rather tell me on why we are probing the USB devices that might need 
> >>> firmware without having userspace ready. It sounds to me that the USB 
> >>> driver probe callback should be delayed if we can not guarantee that it 
> >>> can request firmware. As I explained many times, the call path that 
> >>> causes this is going through probe callback of the driver itself.
> >>>
> >>
> >> I agree that if the driver probe function was requesting firmware
> >> directly there would be a problem. The power_on function is already
> >> being called asynchronously on a workqueue. Making that workqueue 
> >> freezable does exactly the delay you describe.
> >
> > I am not convinced. Now we are hacking the Bluetooth core layer (which has 
> > nothing to do with the drivers suspend/resume or probe) to do something 
> > different so that we do not see this warning.
> >
> > I can not do anything about the platform in question choosing a 
> > unplug/replug for suspend/resume instead of having a proper USB suspend and 
> > resume handling. That is pretty much out of our control. I would rather 
> > have the USB subsystem delay the probe() callback if we tell it to. Of just 
> > have request_firmware() actually sleep until userspace is ready. Seriously, 
> > why is request_firmware not just sleeping for us.
> >
> 
> The closest thing to blocking is usermodehelper_read_lock_wait which
> waits for a limited amount of time. Takashi Iwai proposed switching
> to that unconditionally for all request_firmware but I never saw a
> response from the firmware maintainers. I suspect that may not be
> acceptable because if the firmware actually needs to block it should
> be an asynchronous call. The firmware maintainers can correct me
> if I'm incorrect in my understanding.


IIRC, the reason of using usermodehelper_read_trylock() for the normal
request_firmware() (not the nowait one) is to check the call of
request_firmware() in the resume callback.  If the firmware hasn't
been cached, it should fail.

So, using _trylock() there isn't wrong, per se.  It's indeed safer.
But, the problem is that _trylock() is used unconditionally for all
request_firmware() calls even if it's never from the resume path.

Maybe we should allow the f/w loader caller to specify whether to use
UMH trylock or wait.  The patch below exposes _request_firmware() and
FW_OPT_ flags.  Then BT driver can call like

        _request_firmware(&fw, name, dev,
                FW_OPT_UEVENT | FW_OPT_NO_WARN | FW_OPT_UMH_LOCK_WAIT)

Note that the patch is totally untested!

Or doesn't this look intuitive enough?


Takashi

---
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 171841ad1008..47eb5551c119 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -97,21 +97,6 @@ static inline long firmware_loading_timeout(void)
        return loading_timeout > 0 ? loading_timeout * HZ : MAX_JIFFY_OFFSET;
 }
 
-/* firmware behavior options */
-#define FW_OPT_UEVENT  (1U << 0)
-#define FW_OPT_NOWAIT  (1U << 1)
-#ifdef CONFIG_FW_LOADER_USER_HELPER
-#define FW_OPT_USERHELPER      (1U << 2)
-#else
-#define FW_OPT_USERHELPER      0
-#endif
-#ifdef CONFIG_FW_LOADER_USER_HELPER_FALLBACK
-#define FW_OPT_FALLBACK                FW_OPT_USERHELPER
-#else
-#define FW_OPT_FALLBACK                0
-#endif
-#define FW_OPT_NO_WARN (1U << 3)
-
 struct firmware_cache {
        /* firmware_buf instance will be added into the below list */
        spinlock_t lock;
@@ -1085,7 +1070,7 @@ static int assign_firmware_buf(struct firmware *fw, 
struct device *device,
 }
 
 /* called from request_firmware() and request_firmware_work_func() */
-static int
+int
 _request_firmware(const struct firmware **firmware_p, const char *name,
                  struct device *device, unsigned int opt_flags)
 {
@@ -1099,13 +1084,16 @@ _request_firmware(const struct firmware **firmware_p, 
const char *name,
        if (!name || name[0] == '\0')
                return -EINVAL;
 
+       /* Need to pin this module until return */
+       __module_get(THIS_MODULE);
+
        ret = _request_firmware_prepare(&fw, name, device);
        if (ret <= 0) /* error or already assigned */
                goto out;
 
        ret = 0;
        timeout = firmware_loading_timeout();
-       if (opt_flags & FW_OPT_NOWAIT) {
+       if (opt_flags & FW_OPT_UMH_LOCK_WAIT) {
                timeout = usermodehelper_read_lock_wait(timeout);
                if (!timeout) {
                        dev_dbg(device, "firmware: %s loading timed out\n",
@@ -1147,67 +1135,10 @@ _request_firmware(const struct firmware **firmware_p, 
const char *name,
        }
 
        *firmware_p = fw;
-       return ret;
-}
-
-/**
- * request_firmware: - send firmware request and wait for it
- * @firmware_p: pointer to firmware image
- * @name: name of firmware file
- * @device: device for which firmware is being loaded
- *
- *      @firmware_p will be used to return a firmware image by the name
- *      of @name for device @device.
- *
- *      Should be called from user context where sleeping is allowed.
- *
- *      @name will be used as $FIRMWARE in the uevent environment and
- *      should be distinctive enough not to be confused with any other
- *      firmware image for this or any other device.
- *
- *     Caller must hold the reference count of @device.
- *
- *     The function can be called safely inside device's suspend and
- *     resume callback.
- **/
-int
-request_firmware(const struct firmware **firmware_p, const char *name,
-                struct device *device)
-{
-       int ret;
-
-       /* Need to pin this module until return */
-       __module_get(THIS_MODULE);
-       ret = _request_firmware(firmware_p, name, device,
-                               FW_OPT_UEVENT | FW_OPT_FALLBACK);
-       module_put(THIS_MODULE);
-       return ret;
-}
-EXPORT_SYMBOL(request_firmware);
-
-/**
- * request_firmware_direct: - load firmware directly without usermode helper
- * @firmware_p: pointer to firmware image
- * @name: name of firmware file
- * @device: device for which firmware is being loaded
- *
- * This function works pretty much like request_firmware(), but this doesn't
- * fall back to usermode helper even if the firmware couldn't be loaded
- * directly from fs.  Hence it's useful for loading optional firmwares, which
- * aren't always present, without extra long timeouts of udev.
- **/
-int request_firmware_direct(const struct firmware **firmware_p,
-                           const char *name, struct device *device)
-{
-       int ret;
-
-       __module_get(THIS_MODULE);
-       ret = _request_firmware(firmware_p, name, device,
-                               FW_OPT_UEVENT | FW_OPT_NO_WARN);
        module_put(THIS_MODULE);
        return ret;
 }
-EXPORT_SYMBOL_GPL(request_firmware_direct);
+EXPORT_SYMBOL_GPL(_request_firmware);
 
 /**
  * release_firmware: - release the resource associated with a firmware image
@@ -1291,6 +1222,7 @@ request_firmware_nowait(
        fw_work->context = context;
        fw_work->cont = cont;
        fw_work->opt_flags = FW_OPT_NOWAIT | FW_OPT_FALLBACK |
+               FW_OPT_UMH_LOCK_WAIT |
                (uevent ? FW_OPT_UEVENT : FW_OPT_USERHELPER);
 
        if (!try_module_get(module)) {
diff --git a/include/linux/firmware.h b/include/linux/firmware.h
index 5c41c5e75b5c..460aa30965cf 100644
--- a/include/linux/firmware.h
+++ b/include/linux/firmware.h
@@ -39,23 +39,21 @@ struct builtin_fw {
        __used __section(.builtin_fw) = { name, blob, size }
 
 #if defined(CONFIG_FW_LOADER) || (defined(CONFIG_FW_LOADER_MODULE) && 
defined(MODULE))
-int request_firmware(const struct firmware **fw, const char *name,
-                    struct device *device);
+int _request_firmware(const struct firmware **firmware_p, const char *name,
+                     struct device *device, unsigned int opt_flags);
 int request_firmware_nowait(
        struct module *module, bool uevent,
        const char *name, struct device *device, gfp_t gfp, void *context,
        void (*cont)(const struct firmware *fw, void *context));
-int request_firmware_direct(const struct firmware **fw, const char *name,
-                           struct device *device);
-
 void release_firmware(const struct firmware *fw);
 #else
-static inline int request_firmware(const struct firmware **fw,
-                                  const char *name,
-                                  struct device *device)
+static inline int
+_request_firmware(const struct firmware **firmware_p, const char *name,
+                 struct device *device, unsigned int opt_flags)
 {
        return -EINVAL;
 }
+
 static inline int request_firmware_nowait(
        struct module *module, bool uevent,
        const char *name, struct device *device, gfp_t gfp, void *context,
@@ -67,13 +65,69 @@ static inline int request_firmware_nowait(
 static inline void release_firmware(const struct firmware *fw)
 {
 }
+#endif
 
-static inline int request_firmware_direct(const struct firmware **fw,
-                                         const char *name,
-                                         struct device *device)
+/* firmware behavior options */
+#define FW_OPT_UEVENT          (1U << 0)
+#define FW_OPT_NOWAIT          (1U << 1)
+#ifdef CONFIG_FW_LOADER_USER_HELPER
+#define FW_OPT_USERHELPER      (1U << 2)
+#else
+#define FW_OPT_USERHELPER      0
+#endif
+#ifdef CONFIG_FW_LOADER_USER_HELPER_FALLBACK
+#define FW_OPT_FALLBACK                FW_OPT_USERHELPER
+#else
+#define FW_OPT_FALLBACK                0
+#endif
+#define FW_OPT_NO_WARN         (1U << 3)
+#define FW_OPT_UMH_LOCK_WAIT   (1U << 4)
+
+/**
+ * request_firmware: - send firmware request and wait for it
+ * @firmware_p: pointer to firmware image
+ * @name: name of firmware file
+ * @device: device for which firmware is being loaded
+ *
+ *      @firmware_p will be used to return a firmware image by the name
+ *      of @name for device @device.
+ *
+ *      Should be called from user context where sleeping is allowed.
+ *
+ *      @name will be used as $FIRMWARE in the uevent environment and
+ *      should be distinctive enough not to be confused with any other
+ *      firmware image for this or any other device.
+ *
+ *     Caller must hold the reference count of @device.
+ *
+ *     The function can be called safely inside device's suspend and
+ *     resume callback.
+ **/
+static inline int
+request_firmware(const struct firmware **firmware_p, const char *name,
+                struct device *device)
 {
-       return -EINVAL;
+       return _request_firmware(firmware_p, name, device,
+                                FW_OPT_UEVENT | FW_OPT_FALLBACK);
+}
+
+/**
+ * request_firmware_direct: - load firmware directly without usermode helper
+ * @firmware_p: pointer to firmware image
+ * @name: name of firmware file
+ * @device: device for which firmware is being loaded
+ *
+ * This function works pretty much like request_firmware(), but this doesn't
+ * fall back to usermode helper even if the firmware couldn't be loaded
+ * directly from fs.  Hence it's useful for loading optional firmwares, which
+ * aren't always present, without extra long timeouts of udev.
+ **/
+static inline int
+request_firmware_direct(const struct firmware **firmware_p,
+                       const char *name, struct device *device)
+{
+       return _request_firmware(firmware_p, name, device,
+                                FW_OPT_UEVENT | FW_OPT_NO_WARN);
 }
 
-#endif
 #endif
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RESEND][PATCH] Bluetooth: Make request workqueue freezable

Reply via email to