Re: [PATCH v2 4/4] IB/srp: Fix a sleep-in-invalid-context bug

2018-01-13 Thread Ming Lei
On Wed, Jan 10, 2018 at 10:18:17AM -0800, Bart Van Assche wrote:
> The previous two patches guarantee that srp_queuecommand() does not get
> invoked while reconnecting occurs. Hence remove the code from
> srp_queuecommand() that prevents command queueing while reconnecting.
> This patch avoids that the following can appear in the kernel log:
> 
> BUG: sleeping function called from invalid context at 
> kernel/locking/mutex.c:747
> in_atomic(): 1, irqs_disabled(): 0, pid: 5600, name: scsi_eh_9
> 1 lock held by scsi_eh_9/5600:
>  #0:  (rcu_read_lock){}, at: [] 
> __blk_mq_run_hw_queue+0xf1/0x1e0
> Preemption disabled at:
> [<139badf2>] __blk_mq_delay_run_hw_queue+0x78/0xf0
> CPU: 9 PID: 5600 Comm: scsi_eh_9 Tainted: GW4.15.0-rc4-dbg+ #1
> Hardware name: Dell Inc. PowerEdge R720/0VWT90, BIOS 2.5.4 01/22/2016
> Call Trace:
>  dump_stack+0x67/0x99
>  ___might_sleep+0x16a/0x250 [ib_srp]
>  __mutex_lock+0x46/0x9d0
>  srp_queuecommand+0x356/0x420 [ib_srp]
>  scsi_dispatch_cmd+0xf6/0x3f0
>  scsi_queue_rq+0x4a8/0x5f0
>  blk_mq_dispatch_rq_list+0x73/0x440
>  blk_mq_sched_dispatch_requests+0x109/0x1a0
>  __blk_mq_run_hw_queue+0x131/0x1e0
>  __blk_mq_delay_run_hw_queue+0x9a/0xf0
>  blk_mq_run_hw_queue+0xc0/0x1e0
>  blk_mq_start_hw_queues+0x2c/0x40
>  scsi_run_queue+0x18e/0x2d0
>  scsi_run_host_queues+0x22/0x40
>  scsi_error_handler+0x18d/0x5f0
>  kthread+0x11c/0x140
>  ret_from_fork+0x24/0x30
> 
> Signed-off-by: Bart Van Assche 
> Reviewed-by: Hannes Reinecke 
> Cc: Jason Gunthorpe 
> Cc: Doug Ledford 
> ---
>  drivers/infiniband/ulp/srp/ib_srp.c | 21 ++---
>  1 file changed, 2 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
> b/drivers/infiniband/ulp/srp/ib_srp.c
> index 972d4b3c5223..670f187ecb91 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -2149,7 +2149,6 @@ static void srp_handle_qp_err(struct ib_cq *cq, struct 
> ib_wc *wc,
>  static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
>  {
>   struct srp_target_port *target = host_to_target(shost);
> - struct srp_rport *rport = target->rport;
>   struct srp_rdma_ch *ch;
>   struct srp_request *req;
>   struct srp_iu *iu;
> @@ -2159,16 +2158,6 @@ static int srp_queuecommand(struct Scsi_Host *shost, 
> struct scsi_cmnd *scmnd)
>   u32 tag;
>   u16 idx;
>   int len, ret;
> - const bool in_scsi_eh = !in_interrupt() && current == shost->ehandler;
> -
> - /*
> -  * The SCSI EH thread is the only context from which srp_queuecommand()
> -  * can get invoked for blocked devices (SDEV_BLOCK /
> -  * SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() by
> -  * locking the rport mutex if invoked from inside the SCSI EH.
> -  */
> - if (in_scsi_eh)
> - mutex_lock(&rport->mutex);

This issue is triggered because that the above EH handler context detection is
wrong since scsi_run_host_queues() from scsi_error_handler() is for
handling normal requests, see the comment below:

/*
 * finally we need to re-initiate requests that may be pending.  we will
 * have had everything blocked while error handling is taking place, and
 * now that error recovery is done, we will need to ensure that these
 * requests are started.
 */
scsi_run_host_queues(shost);

This issue should have been fixed by the following one line change simply:

in_scsi_eh = !!scmd->eh_action;

-- 
Ming


Re: [PATCH v2 3/4] scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device

2018-01-13 Thread Ming Lei
On Fri, Jan 12, 2018 at 10:45:57PM +, Bart Van Assche wrote:
> On Thu, 2018-01-11 at 10:23 +0800, Ming Lei wrote:
> > > not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd().
> > 
> > Given it is error handling, do we need to prevent the .queuecommand() call
> > in scsi_send_eh_cmnd()? Could you share us what the actual issue
> > observed is from user view?
> 
> Please have a look at the kernel bug report in the description of patch 4/4 of
> this series.

Thanks for your mentioning, then I can find the following comment in
srp_queuecommand():

/*
 * The SCSI EH thread is the only context from which 
srp_queuecommand()
 * can get invoked for blocked devices (SDEV_BLOCK /
 * SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() 
by
 * locking the rport mutex if invoked from inside the SCSI EH.
 */

That means EH request is allowed to send to blocked device.

I also replied in patch 4/4, looks there is one simple one line change
which should address the issue of 'sleep in atomic context', please discuss that
in patch 4/4 thread.

-- 
Ming


Re: [PATCH v2 00/19] prevent bounds-check bypass via speculative execution

2018-01-13 Thread Linus Torvalds
On Fri, Jan 12, 2018 at 4:15 PM, Tony Luck  wrote:
>
> Here there isn't any reason for speculation. The core has the
> value of 'x' in a register and the upper bound encoded into the
> "cmp" instruction.  Both are right there, no waiting, no speculation.

So this is an argument I haven't seen before (although it was brought
up in private long ago), but that is very relevant: the actual scope
and depth of speculation.

Your argument basically depends on just what gets speculated, and on
the _actual_ order of execution.

So your argument depends on "the uarch will actually run the code in
order if there are no events that block the pipeline".

Or at least it depends on a certain latency of the killing of any OoO
execution being low enough that the cache access doesn't even begin.

I realize that that is very much a particular microarchitectural
detail, but it's actually a *big* deal. Do we have a set of rules for
what is not a worry, simply because the speculated accesses get killed
early enough?

Apparently "test a register value against a constant" is good enough,
assuming that register is also needed for the address of the access.

Linus