On Tue, Apr 16, 2019 at 08:44:10PM -0700, Ming Lei wrote:
> Hannes reported the following kernel oops:
> 
>     There is a race condition between namespace rescanning and
>     controller reset; during controller reset all namespaces are
>     quiesced via nvme_stop_ctrl(), and after reset all namespaces
>     are unquiesced again.
>     When namespace scanning was active by the time controller reset
>     was triggered the rescan code will call nvme_ns_remove(), which
>     then will cause a kernel crash in nvme_start_ctrl() as it'll trip
>     over uninitialized namespaces.
> 
> Patch "blk-mq: free hw queue's resource in hctx's release handler"
> should make this issue quite difficult to trigger. However it can't
> kill the issue completely because the precondition of that patch is to
> hold request queue's refcount before calling block layer API, and
> there is still a small window between blk_cleanup_queue() and removing
> the ns from the controller namespace list in nvme_ns_remove().
> 
> Hold request queue's refcount until the ns is freed, then the above race
> can be avoided completely. Given the 'namespaces_rwsem' is always held
> to retrieve ns for starting/stopping request queue, this lock can prevent
> namespaces from being freed.

This looks good to me.

Reviewed-by: Keith Busch <keith.bu...@intel.com>
