[PATCH -next] advansys: Add missing include file

2015-06-05 Thread Guenter Roeck
Fix:

drivers/scsi/advansys.c: In function 'adv_isr_callback':
drivers/scsi/advansys.c:6089:3: error:
implicit declaration of function 'dma_pool_free'
drivers/scsi/advansys.c: In function 'adv_get_sglist':
drivers/scsi/advansys.c:7654:3: error:
implicit declaration of function 'dma_pool_alloc'
drivers/scsi/advansys.c: In function 'advansys_wide_init_chip':
drivers/scsi/advansys.c:10866:2: error:
implicit declaration of function 'dma_pool_create'

Fixes: 0ce538226b7a ("advansys: Use dma_pool for sg elements")
Cc: Hannes Reinecke 
Signed-off-by: Guenter Roeck 
---
 drivers/scsi/advansys.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/advansys.c b/drivers/scsi/advansys.c
index 14d3aa54daa5..af2cb5e9e3f7 100644
--- a/drivers/scsi/advansys.c
+++ b/drivers/scsi/advansys.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] scsi: Fix qemu boot hang problem

2014-08-10 Thread Guenter Roeck
The latest kernel fails to boot qemu arm images when using scsi
for disk access. Boot gets stuck after the following messages.

brd: module loaded
sym53c8xx :00:0c.0: enabling device (0100 -> 0103)
sym0: <895a> rev 0x0 at pci :00:0c.0 irq 93
sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: SCSI BUS has been reset.
scsi host0: sym-2.2.3

Bisect points to commit 71e75c97f97a ("scsi: convert device_busy to
atomic_t"). Code inspection shows the following suspicious change
in scsi_request_fn.

out_delay:
-   if (sdev->device_busy == 0 && !scsi_device_blocked(sdev))
+   if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
blk_delay_queue(q, SCSI_QUEUE_DELAY);
}

'sdev->device_busy == 0' was replaced with 'atomic_read(&sdev->device_busy)',
meaning the logic was reversed. Changing this expression to
'!atomic_read(&sdev->device_busy)' fixes the problem.

Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Webb Scales 
Cc: Jens Axboe 
Signed-off-by: Guenter Roeck 
---
 drivers/scsi/scsi_lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9c44392..ce62e87 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1774,7 +1774,7 @@ static void scsi_request_fn(struct request_queue *q)
blk_requeue_request(q, req);
atomic_dec(&sdev->device_busy);
 out_delay:
-   if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
+   if (!atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
blk_delay_queue(q, SCSI_QUEUE_DELAY);
 }
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: Fix qemu boot hang problem

2014-08-11 Thread Guenter Roeck

On 08/11/2014 08:33 AM, Venkatesh Srinivas wrote:

Should we wrap the sdev->device_busy read and test in a new
scsi_device_busy(), so as to preserve the old polarity?


I don't see a difference in polarity, or at least replacing '== 0' with '!'
is not a reverse in polarity for me. Anyway, I don't really care if
or how this is wrapped, as long as it is getting fixed.

Thanks,
Guenter

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] uas: replace WARN_ON_ONCE() with assert_spin_locked().

2014-08-11 Thread Guenter Roeck
On Mon, Aug 11, 2014 at 01:29:26PM +0530, Sanjeev Sharma wrote:
> spin_is_locked() always return false in uniprocessor configuration and 
> therefore it
> would be advise to repalce with assert_spin_locked().
> 
> Signed-off-by: Sanjeev Sharma 
> ---
>  drivers/usb/storage/uas.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
> index 3f42785..8e5877d 100644
> --- a/drivers/usb/storage/uas.c
> +++ b/drivers/usb/storage/uas.c
> @@ -154,7 +154,7 @@ static void uas_mark_cmd_dead(struct uas_dev_info 
> *devinfo,
>   struct scsi_cmnd *cmnd = container_of(scp, struct scsi_cmnd, SCp);
>  
>   uas_log_cmd_state(cmnd, caller);
> - WARN_ON_ONCE(!spin_is_locked(&devinfo->lock));
> + assert_spin_locked(&devinfo->lock);

Seems to me that replacing WARN_ON_ONCE (which may be annoying but only
creates a traceback, and only once) with assert_spin_locked (which
crashes the kernel) is a bit drastic.

Guenter
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] uas: replace WARN_ON_ONCE() with assert_spin_locked().

2014-08-11 Thread Guenter Roeck
On Mon, Aug 11, 2014 at 08:55:07PM +0200, Hans de Goede wrote:
> Hi,
> 
> On 08/11/2014 08:19 PM, Guenter Roeck wrote:
> > On Mon, Aug 11, 2014 at 01:29:26PM +0530, Sanjeev Sharma wrote:
> >> spin_is_locked() always return false in uniprocessor configuration and 
> >> therefore it
> >> would be advise to repalce with assert_spin_locked().
> >>
> >> Signed-off-by: Sanjeev Sharma 
> >> ---
> >>  drivers/usb/storage/uas.c | 8 
> >>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
> >> index 3f42785..8e5877d 100644
> >> --- a/drivers/usb/storage/uas.c
> >> +++ b/drivers/usb/storage/uas.c
> >> @@ -154,7 +154,7 @@ static void uas_mark_cmd_dead(struct uas_dev_info 
> >> *devinfo,
> >>struct scsi_cmnd *cmnd = container_of(scp, struct scsi_cmnd, SCp);
> >>  
> >>uas_log_cmd_state(cmnd, caller);
> >> -  WARN_ON_ONCE(!spin_is_locked(&devinfo->lock));
> >> +  assert_spin_locked(&devinfo->lock);
> > 
> > Seems to me that replacing WARN_ON_ONCE (which may be annoying but only
> > creates a traceback, and only once) with assert_spin_locked (which
> > crashes the kernel) is a bit drastic.
> 
> I can see your point, but so far these paranoia checks have never triggered,
> and having them trigger _always_ one some uni-processor (which is the reason
> for this patch) to me seems the worse problem of the 2.
> 
If those are just paranoia checks, it might make sense to use
lockdep_assert_held() to reduce runtime overhead if lockdep
debugging is disabled.

> Ideally we would have a warn_spin_not_locked or such ...
> 
There is WARN_ON_SMP, which might have been a better choice if the _ONCE isn't
that important (which one should think if it is ok to crash the kernel if the
problem is seen).

Guenter
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] uas: replace WARN_ON_ONCE() with lockdep_assert_held()

2014-08-11 Thread Guenter Roeck
On Tue, Aug 12, 2014 at 02:01:53PM +0800, Greg KH wrote:
> On Tue, Aug 12, 2014 at 11:38:37AM +0530, Sanjeev Sharma wrote:
> > spin_is_locked() always return false in uniprocessor configuration and 
> > therefore it
> > would be advise to replace with lockdep_assert_held().
> 
> Add "on some architectures" in here somewhere, as it's not broken on the
> large majority of UP cpus :)
> 
FWIW, it is confirmed broken on mips (32 and 64 bit), ppc, and sparc64.
I have not tested on x86. Might be worth trying. arm64 seems to be ok,
unless I did something wrong in my test, as well as at least some of
the architectures which don't support smp to start with (such as sparc32
or microblaze).

Guenter
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: Fix qemu boot hang problem

2014-08-15 Thread Guenter Roeck

ping ... the problem fixed by this patch still affects the upstream kernel
(v3.16-11383-gc9d2642) as well as -next (20140815).

Guenter

On 08/10/2014 05:54 AM, Guenter Roeck wrote:

The latest kernel fails to boot qemu arm images when using scsi
for disk access. Boot gets stuck after the following messages.

brd: module loaded
sym53c8xx :00:0c.0: enabling device (0100 -> 0103)
sym0: <895a> rev 0x0 at pci :00:0c.0 irq 93
sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: SCSI BUS has been reset.
scsi host0: sym-2.2.3

Bisect points to commit 71e75c97f97a ("scsi: convert device_busy to
atomic_t"). Code inspection shows the following suspicious change
in scsi_request_fn.

out_delay:
-   if (sdev->device_busy == 0 && !scsi_device_blocked(sdev))
+   if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
blk_delay_queue(q, SCSI_QUEUE_DELAY);
}

'sdev->device_busy == 0' was replaced with 'atomic_read(&sdev->device_busy)',
meaning the logic was reversed. Changing this expression to
'!atomic_read(&sdev->device_busy)' fixes the problem.

Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Webb Scales 
Cc: Jens Axboe 
Signed-off-by: Guenter Roeck 
---
  drivers/scsi/scsi_lib.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9c44392..ce62e87 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1774,7 +1774,7 @@ static void scsi_request_fn(struct request_queue *q)
blk_requeue_request(q, req);
atomic_dec(&sdev->device_busy);
  out_delay:
-   if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
+   if (!atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
blk_delay_queue(q, SCSI_QUEUE_DELAY);
  }




--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: Fix qemu boot hang problem

2014-08-15 Thread Guenter Roeck

On 08/15/2014 12:22 PM, Christoph Hellwig wrote:

On Fri, Aug 15, 2014 at 12:34:22PM -0600, Jens Axboe wrote:

On 2014-08-15 12:22, Guenter Roeck wrote:

ping ... the problem fixed by this patch still affects the upstream kernel
(v3.16-11383-gc9d2642) as well as -next (20140815).


James, could you upstream this one in time for -rc1?


It's in the core-for-3.17 updates I sent to James yesterday, just needs to
be forwarded..



Great, thanks a lot!

Guenter

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] scsi: two SG_CHAIN related fixes

2019-06-04 Thread Guenter Roeck
On Tue, Jun 04, 2019 at 04:23:06PM +0800, Ming Lei wrote:
> Hi,
> 
> Guenter reported scsi boot issue caused by commit c3288dd8c232
> ("scsi: core: avoid pre-allocating big SGL for data").
> 
> Turns out there are at least two issues.
> 
> The 1st patch fixes issue in case that NO_SG_CHAIN on some ARCHs,
> such as alpha, arm and parisc.
> 
> The 2nd patch makes esp scsi working with SG_CHAIN.
> 

Both patches applied on top of next-20190604.

Results on alpha:

Waiting for root device /dev/sda...
[ cut here ]
WARNING: CPU: 0 PID: 7 at mm/slab.h:359 kmem_cache_free+0x120/0x2a0
virt_to_cache: Object is not a Slab page!
Modules linked in:
CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 
5.2.0-rc3-next-20190604-2-gab1fef861ee2 #1
   fc000787fb38 fc0007d12658 fc333d50 fc4205b0
   fc333dd8 fc0007eb82a8 fc3c3234 fceedb20
   0080 0001 0167 fc4205b0
   fc4205b0 fc0007d908f8 fccbe231 fc000787fbf8
   0018  fc6d8e38 fc0007d908f8
    fcb8d610 0001 005a
Trace:
[] __warn+0x160/0x190
[] kmem_cache_free+0x120/0x2a0
[] warn_slowpath_fmt+0x58/0x70
[] mempool_free_slab+0x24/0x40
[] kmem_cache_free+0x120/0x2a0
[] kmem_cache_free+0x120/0x2a0
[] bio_free+0x98/0xc0
[] mempool_free_slab+0x24/0x40
[] mempool_free+0x48/0x100
[] sg_pool_free+0x9c/0xd0
[] sg_pool_free+0x0/0xd0
[] __sg_free_table+0xb0/0xf0
[] sg_free_table_chained+0x48/0x60
[] scsi_mq_uninit_cmd+0xc0/0xd0
[] scsi_end_request+0x94/0x270
[] scsi_io_completion+0xd0/0x740
[] scsi_finish_command+0x104/0x160
[] scsi_finish_command+0x40/0x160
[] scsi_softirq_done+0x18c/0x1c0
[] blk_done_softirq+0xbc/0xf0
[] run_ksoftirqd+0x48/0x80
[] smpboot_thread_fn+0x184/0x230
[] kthread+0x168/0x1d0
[] smpboot_thread_fn+0x0/0x230
[] ret_from_kernel_thread+0x18/0x20
[] kthread+0x0/0x1d0

---[ end trace 1fadf5f1c983cdae ]---
scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK2.5+ PQ: 0 ANSI: 5
sd 0:0:0:0: Power-on or device reset occurred
sd 0:0:0:0: [sda] 20480 512-byte logical blocks: (10.5 MB/10.0 MiB)
random: crng init done
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
sd 0:0:0:0: [sda] Attached SCSI disk
EXT4-fs (sda): mounted filesystem without journal. Opts: (null)
VFS: Mounted root (ext4 filesystem) readonly on device 8:0.
Unable to handle kernel paging request at virtual address 
kworker/0:1H(45): Oops 1
pc = []  ra = []  ps = Tainted: G   
 W
pc is at loop+0x0/0x10
ra is at scsi_queue_rq+0x6fc/0xaa0
v0 =   t0 = 0060  t1 = 0004
t2 = 000c  t3 = 98010028  t4 = 
t5 = 0060  t6 = fc0007eb83d4  t7 = fc0007ed
s0 =   s1 = fc0007eb82c0  s2 = fc0007d72998
s3 = fc0007dd2b28  s4 = fc0007eb83d0  s5 = fc000782a328
s6 = fc000782a328
a0 =   a1 =   a2 = 
a3 =   a4 =   a5 = fc0007edbeb0
t8 = fc0007eb83d3  t9 = 0002  t10= fc0007d12068
t11=   pv = fcb7f6e0  at = 
gp = fcee88d8  sp = f0901f3d
Disabling lock debugging due to kernel taint
Trace:
[] blk_mq_dispatch_rq_list+0x454/0x810
[] blk_mq_dispatch_rq_list+0x3dc/0x810
[] blk_mq_do_dispatch_sched+0xc4/0x170
[] blk_mq_do_dispatch_sched+0x60/0x170
[] blk_mq_do_dispatch_sched+0x90/0x170
[] blk_mq_sched_dispatch_requests+0x140/0x220
[] __blk_mq_run_hw_queue+0x6c/0x250
[] process_one_work+0x1f0/0x520
[] worker_thread+0x60/0x750
[] kthread+0x168/0x1d0
[] worker_thread+0x0/0x750
[] ret_from_kernel_thread+0x18/0x20
[] kthread+0x0/0x1d0

On the plus side, boot tests test are passing with m68k and sparc,
so "scsi: esp: make it working on SG_CHAIN" seems to work.

Guenter


Re: [PATCH 2/2] scsi: esp: make it working on SG_CHAIN

2019-06-04 Thread Guenter Roeck
On Tue, Jun 04, 2019 at 04:23:08PM +0800, Ming Lei wrote:
> The driver supporses that there isn't sg chain, and itereate the
> list one by one. This way is obviously wrong.
> 
> Fixes it by sgl helper.
> 
> Cc: Christoph Hellwig 
> Cc: Bart Van Assche 
> Cc: Ewan D. Milne 
> Cc: Hannes Reinecke 
> Cc: Guenter Roeck 
> Reported-by: Guenter Roeck 
> Signed-off-by: Ming Lei 

Tested-by: Guenter Roeck 

> ---
>  drivers/scsi/esp_scsi.c | 32 +---
>  1 file changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/scsi/esp_scsi.c b/drivers/scsi/esp_scsi.c
> index 76e7ca864d6a..58b4e059dcfb 100644
> --- a/drivers/scsi/esp_scsi.c
> +++ b/drivers/scsi/esp_scsi.c
> @@ -371,6 +371,7 @@ static void esp_map_dma(struct esp *esp, struct scsi_cmnd 
> *cmd)
>   struct esp_cmd_priv *spriv = ESP_CMD_PRIV(cmd);
>   struct scatterlist *sg = scsi_sglist(cmd);
>   int total = 0, i;
> + struct scatterlist *sgt;
>  
>   if (cmd->sc_data_direction == DMA_NONE)
>   return;
> @@ -381,14 +382,15 @@ static void esp_map_dma(struct esp *esp, struct 
> scsi_cmnd *cmd)
>* a dma address, so perform an identity mapping.
>*/
>   spriv->num_sg = scsi_sg_count(cmd);
> - for (i = 0; i < spriv->num_sg; i++) {
> - sg[i].dma_address = (uintptr_t)sg_virt(&sg[i]);
> - total += sg_dma_len(&sg[i]);
> +
> + scsi_for_each_sg(cmd, sgt, spriv->num_sg, i) {
> + sgt->dma_address = (uintptr_t)sg_virt(sgt);
> + total += sg_dma_len(sgt);
>   }
>   } else {
>   spriv->num_sg = scsi_dma_map(cmd);
> - for (i = 0; i < spriv->num_sg; i++)
> - total += sg_dma_len(&sg[i]);
> + scsi_for_each_sg(cmd, sgt, spriv->num_sg, i)
> + total += sg_dma_len(sgt);
>   }
>   spriv->cur_residue = sg_dma_len(sg);
>   spriv->cur_sg = sg;
> @@ -444,7 +446,7 @@ static void esp_advance_dma(struct esp *esp, struct 
> esp_cmd_entry *ent,
>   p->tot_residue = 0;
>   }
>   if (!p->cur_residue && p->tot_residue) {
> - p->cur_sg++;
> + p->cur_sg = sg_next(p->cur_sg);
>   p->cur_residue = sg_dma_len(p->cur_sg);
>   }
>  }
> @@ -1610,6 +1612,22 @@ static void esp_msgin_extended(struct esp *esp)
>   scsi_esp_cmd(esp, ESP_CMD_SATN);
>  }
>  
> +static struct scatterlist *esp_sg_prev(struct scsi_cmnd *cmd,
> + struct scatterlist *sg)
> +{
> + int i;
> + struct scatterlist *tmp;
> + struct scatterlist *prev = NULL;
> +
> + scsi_for_each_sg(cmd, tmp, scsi_sg_count(cmd), i) {
> + if (tmp == sg)
> + break;
> + prev = tmp;
> + }
> +
> + return prev;
> +}
> +
>  /* Analyze msgin bytes received from target so far.  Return non-zero
>   * if there are more bytes needed to complete the message.
>   */
> @@ -1647,7 +1665,7 @@ static int esp_msgin_process(struct esp *esp)
>   spriv = ESP_CMD_PRIV(ent->cmd);
>  
>   if (spriv->cur_residue == sg_dma_len(spriv->cur_sg)) {
> - spriv->cur_sg--;
> + spriv->cur_sg = esp_sg_prev(ent->cmd, spriv->cur_sg);
>   spriv->cur_residue = 1;
>   } else
>   spriv->cur_residue++;
> -- 
> 2.20.1
> 


Re: [PATCH 1/2] scsi: core: don't pre-allocate small SGL in case of NO_SG_CHAIN

2019-06-04 Thread Guenter Roeck
On Tue, Jun 04, 2019 at 04:23:07PM +0800, Ming Lei wrote:
> The pre-allocated small SGL depends on SG_CHAIN, so if the ARCH doesn't
> support SG_CHAIN, pre-allocation of small SGL can't work at all.
> 
> Fixes this issue by not using small pre-allocation in case of
> NO_SG_CHAIN.
> 
> Cc: Christoph Hellwig 
> Cc: Bart Van Assche 
> Cc: Ewan D. Milne 
> Cc: Hannes Reinecke 
> Cc: Guenter Roeck 
> Fixes: c3288dd8c232 ("scsi: core: avoid pre-allocating big SGL for data")
> Reported-by: Guenter Roeck 
> Signed-off-by: Ming Lei 

alpha images still crash with this patch applied (see response to
patch 0/2) for traceback).

Guenter


Re: [PATCH 0/2] scsi: two SG_CHAIN related fixes

2019-06-04 Thread Guenter Roeck
On Tue, Jun 04, 2019 at 11:18:44PM +0800, Ming Lei wrote:
> Hi Guenter,
> 
> Thanks for your test and report!
> 
> On Tue, Jun 04, 2019 at 06:38:49AM -0700, Guenter Roeck wrote:
> > On Tue, Jun 04, 2019 at 04:23:06PM +0800, Ming Lei wrote:
> > > Hi,
> > > 
> > > Guenter reported scsi boot issue caused by commit c3288dd8c232
> > > ("scsi: core: avoid pre-allocating big SGL for data").
> > > 
> > > Turns out there are at least two issues.
> > > 
> > > The 1st patch fixes issue in case that NO_SG_CHAIN on some ARCHs,
> > > such as alpha, arm and parisc.
> > > 
> > > The 2nd patch makes esp scsi working with SG_CHAIN.
> > > 
> > 
> > Both patches applied on top of next-20190604.
> > 
> > Results on alpha:
> > 
> > Waiting for root device /dev/sda...
> > [ cut here ]
> > WARNING: CPU: 0 PID: 7 at mm/slab.h:359 kmem_cache_free+0x120/0x2a0
> > virt_to_cache: Object is not a Slab page!
> 
> Please apply the following patch against the posted two:
> 
> diff --git a/lib/sg_pool.c b/lib/sg_pool.c
> index 47eecbe094d8..e042a1722615 100644
> --- a/lib/sg_pool.c
> +++ b/lib/sg_pool.c
> @@ -122,7 +122,7 @@ int sg_alloc_table_chained(struct sg_table *table, int 
> nents,
>   }
>  
>   /* User supposes that the 1st SGL includes real entry */
> - if (nents_first_chunk == 1) {
> + if (nents_first_chunk <= 1) {
>   first_chunk = NULL;
>   nents_first_chunk = 0;
>   }
> 

That fixes the problem.

Thanks,
Guenter


Re: [PATCH V2 0/3] scsi: three SG_CHAIN related fixes

2019-06-05 Thread Guenter Roeck
On Wed, Jun 05, 2019 at 09:06:20AM +0800, Ming Lei wrote:
> Hi,
> 
> Guenter reported scsi boot issue caused by commit c3288dd8c232
> ("scsi: core: avoid pre-allocating big SGL for data").
> 
> Turns out there are at least three issues.
> 
> The 1st patch fixes sg_alloc_table_chained() which may try to use
> the pre-allocation SGL even though user passes zero to 'nents_first_chunk'.
> 
> The 2nd patch fixes issue in case that NO_SG_CHAIN on some ARCHs,
> such as alpha, arm and parisc.
> 
> The 3rd patch makes esp scsi working with SG_CHAIN.
> 
> V2:
>   - add the patch1, which is verified by Guenter
>   - add .prv_sg to store the previous sg for esp_scsi
> 
> Ming Lei (3):
>   scsi: lib/sg_pool.c: clear 'first_chunk' in case of no pre-allocation
>   scsi: core: don't pre-allocate small SGL in case of NO_SG_CHAIN
>   scsi: esp: make it working on SG_CHAIN
> 

Running my tests on next-20190605, I get:

Qemu test results:
total: 349 pass: 296 fail: 53

The same tests on next-20190605 plus this series results in:

Qemu test results:
total: 349 pass: 347 fail: 2
Failed tests: 
sh:rts7751r2dplus_defconfig:usb:rootfs
sh:rts7751r2dplus_defconfig:usb-hub:rootfs

The remaining failures are consistent across re-runs. The failure is only
seen when booting from usb using the sm501-usb controller (see below).

usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 08 7c 00 00 f0 00
print_req_error: I/O error, dev sda, sector 2172 flags 80700
usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 01 da 00 00 f0 00
print_req_error: I/O error, dev sda, sector 474 flags 84700
usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 02 da 00 00 f0 00
print_req_error: I/O error, dev sda, sector 730 flags 84700
usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 0b 50 00 00 f0 00
print_req_error: I/O error, dev sda, sector 2896 flags 84700

Presumably that means that either the sm501-usb emulation has a subtle bug
associated with SG, or something is wrong with the sm501-usb driver.

The qemu command line in the failure case is:

qemu-system-sh4 -M r2d \
-kernel ./arch/sh/boot/zImage \
-snapshot \
-usb -device usb-storage,drive=d0 \
-drive file=rootfs.ext2,if=none,id=d0,format=raw \
-append 'panic=-1 slub_debug=FZPUA root=/dev/sda rootwait 
console=ttySC1,115200 earlycon=scif,mmio16,0xffe8 noiotrap' \
-serial null -serial stdio \
-net nic,model=rtl8139 -net user -nographic -monitor null

I'll be happy to provide kernel configuration and a pointer to the root file
system if needed.

Thanks,
Guenter


Re: [PATCH V2 0/3] scsi: three SG_CHAIN related fixes

2019-06-05 Thread Guenter Roeck
On Wed, Jun 05, 2019 at 09:25:37AM -0700, Guenter Roeck wrote:
> On Wed, Jun 05, 2019 at 09:06:20AM +0800, Ming Lei wrote:
> > Hi,
> > 
> > Guenter reported scsi boot issue caused by commit c3288dd8c232
> > ("scsi: core: avoid pre-allocating big SGL for data").
> > 
> > Turns out there are at least three issues.
> > 
> > The 1st patch fixes sg_alloc_table_chained() which may try to use
> > the pre-allocation SGL even though user passes zero to 'nents_first_chunk'.
> > 
> > The 2nd patch fixes issue in case that NO_SG_CHAIN on some ARCHs,
> > such as alpha, arm and parisc.
> > 
> > The 3rd patch makes esp scsi working with SG_CHAIN.
> > 
> > V2:
> > - add the patch1, which is verified by Guenter
> > - add .prv_sg to store the previous sg for esp_scsi
> > 
> > Ming Lei (3):
> >   scsi: lib/sg_pool.c: clear 'first_chunk' in case of no pre-allocation
> >   scsi: core: don't pre-allocate small SGL in case of NO_SG_CHAIN
> >   scsi: esp: make it working on SG_CHAIN
> > 
> 
> Running my tests on next-20190605, I get:
> 
> Qemu test results:
>   total: 349 pass: 296 fail: 53
> 
> The same tests on next-20190605 plus this series results in:
> 
> Qemu test results:
>   total: 349 pass: 347 fail: 2
> Failed tests: 
>   sh:rts7751r2dplus_defconfig:usb:rootfs
>   sh:rts7751r2dplus_defconfig:usb-hub:rootfs
> 
> The remaining failures are consistent across re-runs. The failure is only
> seen when booting from usb using the sm501-usb controller (see below).
> 
> usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
> sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
> sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 08 7c 00 00 f0 00
> print_req_error: I/O error, dev sda, sector 2172 flags 80700
> usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
> sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
> sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 01 da 00 00 f0 00
> print_req_error: I/O error, dev sda, sector 474 flags 84700
> usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
> sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
> sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 02 da 00 00 f0 00
> print_req_error: I/O error, dev sda, sector 730 flags 84700
> usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
> sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
> sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 0b 50 00 00 f0 00
> print_req_error: I/O error, dev sda, sector 2896 flags 84700
> 
> Presumably that means that either the sm501-usb emulation has a subtle bug
> associated with SG, or something is wrong with the sm501-usb driver.
> 

Turns out the above is caused by other (unrelated) patches in the sm501
USB driver. After reverting those, everything is fine. Sorry for the noise.
Please feel free to add

Tested-by: Guenter Roeck 

to all three patches.

Thanks,
Guenter

> The qemu command line in the failure case is:
> 
> qemu-system-sh4 -M r2d \
>   -kernel ./arch/sh/boot/zImage \
>   -snapshot \
>   -usb -device usb-storage,drive=d0 \
>   -drive file=rootfs.ext2,if=none,id=d0,format=raw \
>   -append 'panic=-1 slub_debug=FZPUA root=/dev/sda rootwait 
> console=ttySC1,115200 earlycon=scif,mmio16,0xffe8 noiotrap' \
>   -serial null -serial stdio \
>   -net nic,model=rtl8139 -net user -nographic -monitor null
> 
> I'll be happy to provide kernel configuration and a pointer to the root file
> system if needed.
> 
> Thanks,
> Guenter


Re: [PATCH 15/22] hwmon: applesmc: fix format string overflow

2017-07-14 Thread Guenter Roeck

On 07/14/2017 05:07 AM, Arnd Bergmann wrote:

gcc-7 warns that the key might exceed five bytes for lage index
values:

drivers/hwmon/applesmc.c: In function 'applesmc_show_fan_position':
drivers/hwmon/applesmc.c:906:18: error: '%d' directive writing between 1 and 5 
bytes into a region of size 4 [-Werror=format-overflow=]
sprintf(newkey, FAN_ID_FMT, to_index(attr));
 ^~~
drivers/hwmon/applesmc.c:906:18: note: directive argument in the range [0, 
65535]
drivers/hwmon/applesmc.c:906:2: note: 'sprintf' output between 5 and 9 bytes 
into a destination of size 5

As the key is required to be four characters plus trailing zero,
we know that the index has to be small here. I'm using snprintf()
to avoid the warning. This would truncate the string instead of
overflowing.

Signed-off-by: Arnd Bergmann 


I submitted a more comprehensive patch a couple of days ago. There are other 
similar
sprintf() calls in the driver which gcc doesn't report.

Guenter


---
  drivers/hwmon/applesmc.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
index 0af7fd311979..515163b9a89f 100644
--- a/drivers/hwmon/applesmc.c
+++ b/drivers/hwmon/applesmc.c
@@ -903,7 +903,7 @@ static ssize_t applesmc_show_fan_position(struct device 
*dev,
char newkey[5];
u8 buffer[17];
  
-	sprintf(newkey, FAN_ID_FMT, to_index(attr));

+   snprintf(newkey, sizeof(newkey), FAN_ID_FMT, to_index(attr));
  
  	ret = applesmc_read_key(newkey, buffer, 16);

buffer[16] = 0;





Re: [PATCH v7] scsi: Add hwmon support for SMART temperature sensors

2018-11-22 Thread Guenter Roeck

[-cc]

Hi Linus,

On 11/22/18 5:49 AM, Linus Walleij wrote:
[ ... ]

(It's
also worth noting that HDD temperature sensors are notoriously
unreliable).


I am sorry if you think that D-Link does bad engineering, what I
am trying to achieve is upstream support for this device, without
any out-of-tree patches. The D-Link DIR-685 uses the harddisk
sensor for this, whether we like it or not.



Following the above argument (not yours, the one about accuracy),
I guess we should drop support for all CPU temperature sensors
from the Linux kernel. After all, they are known to be much more
inaccurate than a HDD temperature sensor could ever be.

Seriously, any argument about (lack of) sensor accuracy should be
silently ignored. I may be overly pessimistic nowadays, but whenever
I see such an argument, I read it as "we'll never accept your patch,
so better stop wasting your time and forget about it". I hope I am
wrong, but it seems to me that this is where things are going.

Can you possibly extract this as pure hwmon driver outside scsi control ?
I'll be happy to accept it as standalone hwmon driver.

Thanks,
Guenter


Re: [PATCH v7] scsi: Add hwmon support for SMART temperature sensors

2018-11-23 Thread Guenter Roeck

On 11/23/18 12:18 AM, Linus Walleij wrote:

On Thu, Nov 22, 2018 at 9:00 PM Guenter Roeck  wrote:


Can you possibly extract this as pure hwmon driver outside scsi control ?
I'll be happy to accept it as standalone hwmon driver.


That should be last resort, what the SCSI people want is noble,
and they did a tremendous (impressive) work by hiding all the ATA drives
behind SCSI emulation with libata, so they want me to keep up
that tradition by also making the temperature reading behave
"as if it was a SCSI drive" too so I'm on board with trying that out
even if I think the bar is a bit high for causal contributors.



No problem, your call. I would just very much dislike your work
to get lost, that is all.

Thanks,
Guenter


kobject lifetime issues in blk-mq

2018-11-09 Thread Guenter Roeck
Hi,

Images with CONFIG_DEBUG_KOBJECT=y, CONFIG_DEBUG_KOBJECT_RELEASE=y
enabled report kobject lifetime issues when booting an image in qemu
using virtio-scsi-pci.

Bisect points to two different commits, depending on the kernel
configuration.

1st configuration: defconfig plus:

CONFIG_VIRTIO_BLK=y
CONFIG_SCSI_LOWLEVEL=y
CONFIG_SCSI_VIRTIO=y
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_MMIO=y
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
CONFIG_DEBUG_KOBJECT=y
CONFIG_DEBUG_KOBJECT_RELEASE=y

Bisecting this configuration points to commit b5b6e8c8d3b4 ("scsi:
virtio_scsi: fix IO hang caused by automatic irq vector affinity")
as the culprit. This commit enforces the use of blk-mq for virtio_scsi;
the bisect log is therefore a bit misleading.

2nd configuration: As above, plus:
CONFIG_SCSI_MQ_DEFAULT=y

With this configuration, bisect points to commit 7ea5fe31c12d
("blk-mq: make lifetime consitent between q/ctx and its kobject").

The qemu command line used to reproduce the problem is as follows.
The qemu version does not seem to matter (I tested with qemu versions
2.5, 2.11, and 3.0).

qemu-system-x86_64 \
-kernel arch/x86/boot/bzImage \
-device virtio-scsi-pci,id=scsi \
-device scsi-hd,bus=scsi.0,drive=d0 \
-drive file=./wheezy.img,format=raw,if=none,id=d0 \
-snapshot \
-m 2G -smp 4 -enable-kvm \
-net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic \
-nographic -monitor none \
-no-reboot \
-append "console=ttyS0 root=/dev/sda earlyprintk=serial panic_on_warn=1 
panic=-1"

Root file system:
https://storage.googleapis.com/syzkaller/wheezy.img
or:

https://github.com/groeck/linux-build-test/blob/master/rootfs/x86_64/rootfs.ext2.gz

It seems that any root file system can be used to reproduce the problem.
Also, the problem is not limited to virtio-scsi-pci. I also tried
am53c974, lsi53c810, and lsi53c895a (after enabling the respective
drivers), with the same result.

Overall, this suggests that the problem is related to blk-mq and was
indeed introduced with commit 7ea5fe31c12d.

For reference, I attached an actual crash log as well as a set of bisect
logs below.

Unfortunately, my understanding of blk-mq is not good enough to find the
underlying problem and suggest a fix. Please let me know if there is
anything else I can do to help fixing the problem.

Note that the problem was originally reported by syzbot running a test
on chromeos-4.14. It may well be that the problem has already been
reported and is being worked on. If so, please apologize the noise.

Thanks,
Guenter

---
[8.700038] [ cut here ]
[8.700376] ODEBUG: free active (active state 0) object type: timer_list 
hint: delayed_work_timer_fn+0x0/0x10
[8.701032] WARNING: CPU: 0 PID: 0 at lib/debugobjects.c:329 
debug_print_object+0x6a/0x80
[8.701032] Kernel panic - not syncing: panic_on_warn set ...
[8.701032] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.20.0-rc1 #30
[8.701032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Ubuntu-1.8.2-1ubuntu1 04/01/2014
[8.701032] Call Trace:
[8.701032]  
[8.701032]  dump_stack+0x46/0x5b
[8.701032]  panic+0xf3/0x246
[8.701032]  ? debug_print_object+0x6a/0x80
[8.701032]  __warn+0xeb/0xf0
[8.701032]  ? debug_print_object+0x6a/0x80
[8.701032]  ? debug_print_object+0x6a/0x80
[8.701032]  report_bug+0xb1/0x120
[8.701032]  fixup_bug.part.10+0x13/0x30
[8.701032]  do_error_trap+0x8f/0xb0
[8.701032]  do_invalid_op+0x31/0x40
[8.701032]  ? debug_print_object+0x6a/0x80
[8.701032]  invalid_op+0x14/0x20
[8.701032] RIP: 0010:debug_print_object+0x6a/0x80
[8.701032] Code: 8b 43 10 83 c2 01 8b 4b 14 89 15 c1 e2 78 01 4c 8b 45 00 4c
89 e6 48 c7 c7 f0 ca 41 a6 48 8b 14 c5 a0 f7 24 a6 e8 06 8e cc ff <0f> 0b 5b 83
05 70 47 1a 01 01 5d 41 5c c3 83 05 65 47 1a 01 01 c3
[8.701032] RSP: 0018:89b9fda03e48 EFLAGS: 00010082
[8.701032] RAX:  RBX: 89b9fd49d0c8 RCX: a6646e18
[8.701032] RDX: 0001 RSI: 0082 RDI: a6cc596c
[8.701032] RBP: a664a5c0 R08: 79616c6564203a74 R09: 5f6b726f775f6465
[8.701032] R10: 89b9fda03f10 R11: 3178302f3078302b R12: a6403e22
[8.701032] R13: 0001 R14: a6d4d458 R15: 89b9fc4c8780
[8.701032]  ? debug_print_object+0x6a/0x80
[8.701032]  debug_check_no_obj_freed+0x1b8/0x1ea
[8.701032]  ? rcu_process_callbacks+0x2a7/0x750
[8.701032]  kmem_cache_free+0x6e/0x1a0
[8.701032]  ? __blk_release_queue+0x150/0x150
[8.701032]  rcu_process_callbacks+0x2a7/0x750
[8.701032]  __do_softirq+0xf2/0x2c7
[8.701032]  irq_exit+0xb7/0xc0
[8.701032]  smp_apic_timer_interrupt+0x67/0x130
[8.701032]  apic_timer_interrupt+0xf/0x20
[8.701032]  
[8.701032] RIP: 0010:default_idle+0x12/0x140
[8.701032] Code: fe ff ff c6 43 08 00 fb eb b0 

Re: kobject lifetime issues in blk-mq

2018-11-12 Thread Guenter Roeck
On Mon, Nov 12, 2018 at 05:20:52PM +0800, Ming Lei wrote:
> On Fri, Nov 09, 2018 at 12:35:18PM -0800, Guenter Roeck wrote:
> > Hi,
> > 
> > Images with CONFIG_DEBUG_KOBJECT=y, CONFIG_DEBUG_KOBJECT_RELEASE=y
> > enabled report kobject lifetime issues when booting an image in qemu
> > using virtio-scsi-pci.
> > 
> > Bisect points to two different commits, depending on the kernel
> > configuration.
> > 
> > 1st configuration: defconfig plus:
> > 
> > CONFIG_VIRTIO_BLK=y
> > CONFIG_SCSI_LOWLEVEL=y
> > CONFIG_SCSI_VIRTIO=y
> > CONFIG_VIRTIO_PCI=y
> > CONFIG_VIRTIO_MMIO=y
> > CONFIG_DEBUG_OBJECTS=y
> > CONFIG_DEBUG_OBJECTS_FREE=y
> > CONFIG_DEBUG_OBJECTS_TIMERS=y
> > CONFIG_DEBUG_KOBJECT=y
> > CONFIG_DEBUG_KOBJECT_RELEASE=y
> > 
> > Bisecting this configuration points to commit b5b6e8c8d3b4 ("scsi:
> > virtio_scsi: fix IO hang caused by automatic irq vector affinity")
> > as the culprit. This commit enforces the use of blk-mq for virtio_scsi;
> > the bisect log is therefore a bit misleading.
> > 
> > 2nd configuration: As above, plus:
> > CONFIG_SCSI_MQ_DEFAULT=y
> > 
> > With this configuration, bisect points to commit 7ea5fe31c12d
> > ("blk-mq: make lifetime consitent between q/ctx and its kobject").
> > 
> > The qemu command line used to reproduce the problem is as follows.
> > The qemu version does not seem to matter (I tested with qemu versions
> > 2.5, 2.11, and 3.0).
> > 
> > qemu-system-x86_64 \
> > -kernel arch/x86/boot/bzImage \
> > -device virtio-scsi-pci,id=scsi \
> > -device scsi-hd,bus=scsi.0,drive=d0 \
> > -drive file=./wheezy.img,format=raw,if=none,id=d0 \
> > -snapshot \
> > -m 2G -smp 4 -enable-kvm \
> > -net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic \
> > -nographic -monitor none \
> > -no-reboot \
> > -append "console=ttyS0 root=/dev/sda earlyprintk=serial panic_on_warn=1 
> > panic=-1"
> > 
> > Root file system:
> > https://storage.googleapis.com/syzkaller/wheezy.img
> > or:
> > 
> > https://github.com/groeck/linux-build-test/blob/master/rootfs/x86_64/rootfs.ext2.gz
> > 
> > It seems that any root file system can be used to reproduce the problem.
> > Also, the problem is not limited to virtio-scsi-pci. I also tried
> > am53c974, lsi53c810, and lsi53c895a (after enabling the respective
> > drivers), with the same result.
> > 
> > Overall, this suggests that the problem is related to blk-mq and was
> > indeed introduced with commit 7ea5fe31c12d.
> > 
> > For reference, I attached an actual crash log as well as a set of bisect
> > logs below.
> > 
> > Unfortunately, my understanding of blk-mq is not good enough to find the
> > underlying problem and suggest a fix. Please let me know if there is
> > anything else I can do to help fixing the problem.
> > 
> > Note that the problem was originally reported by syzbot running a test
> > on chromeos-4.14. It may well be that the problem has already been
> > reported and is being worked on. If so, please apologize the noise.
> > 
> > Thanks,
> > Guenter
> > 
> > ---
> > [8.700038] [ cut here ]
> > [8.700376] ODEBUG: free active (active state 0) object type: timer_list 
> > hint: delayed_work_timer_fn+0x0/0x10
> > [8.701032] WARNING: CPU: 0 PID: 0 at lib/debugobjects.c:329 
> > debug_print_object+0x6a/0x80
> > [8.701032] Kernel panic - not syncing: panic_on_warn set ...
> > [8.701032] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.20.0-rc1 #30
> > [8.701032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > [8.701032] Call Trace:
> > [8.701032]  
> > [8.701032]  dump_stack+0x46/0x5b
> > [8.701032]  panic+0xf3/0x246
> > [8.701032]  ? debug_print_object+0x6a/0x80
> > [8.701032]  __warn+0xeb/0xf0
> > [8.701032]  ? debug_print_object+0x6a/0x80
> > [8.701032]  ? debug_print_object+0x6a/0x80
> > [8.701032]  report_bug+0xb1/0x120
> > [8.701032]  fixup_bug.part.10+0x13/0x30
> > [8.701032]  do_error_trap+0x8f/0xb0
> > [8.701032]  do_invalid_op+0x31/0x40
> > [8.701032]  ? debug_print_object+0x6a/0x80
> > [8.701032]  invalid_op+0x14/0x20
> > [8.701032] RIP: 0010:debug_print_object+0x6a/0x80
> > [8.701032] Code: 8b 43 10 83 c2 01 8b 4b 14 89 15 c1 e2 78 01 4c 8b 45 
> > 00 4c
> > 89 e

Re: kobject lifetime issues in blk-mq

2018-11-12 Thread Guenter Roeck
On Mon, Nov 12, 2018 at 05:44:07PM +0800, Ming Lei wrote:
> On Mon, Nov 12, 2018 at 05:20:51PM +0800, Ming Lei wrote:
> > On Fri, Nov 09, 2018 at 12:35:18PM -0800, Guenter Roeck wrote:
> > > Hi,
> > > 
> > > Images with CONFIG_DEBUG_KOBJECT=y, CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > enabled report kobject lifetime issues when booting an image in qemu
> > > using virtio-scsi-pci.
> > > 
> > > Bisect points to two different commits, depending on the kernel
> > > configuration.
> > > 
> > > 1st configuration: defconfig plus:
> > > 
> > > CONFIG_VIRTIO_BLK=y
> > > CONFIG_SCSI_LOWLEVEL=y
> > > CONFIG_SCSI_VIRTIO=y
> > > CONFIG_VIRTIO_PCI=y
> > > CONFIG_VIRTIO_MMIO=y
> > > CONFIG_DEBUG_OBJECTS=y
> > > CONFIG_DEBUG_OBJECTS_FREE=y
> > > CONFIG_DEBUG_OBJECTS_TIMERS=y
> > > CONFIG_DEBUG_KOBJECT=y
> > > CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > 
> > > Bisecting this configuration points to commit b5b6e8c8d3b4 ("scsi:
> > > virtio_scsi: fix IO hang caused by automatic irq vector affinity")
> > > as the culprit. This commit enforces the use of blk-mq for virtio_scsi;
> > > the bisect log is therefore a bit misleading.
> > > 
> > > 2nd configuration: As above, plus:
> > > CONFIG_SCSI_MQ_DEFAULT=y
> > > 
> > > With this configuration, bisect points to commit 7ea5fe31c12d
> > > ("blk-mq: make lifetime consitent between q/ctx and its kobject").
> > > 
> > > The qemu command line used to reproduce the problem is as follows.
> > > The qemu version does not seem to matter (I tested with qemu versions
> > > 2.5, 2.11, and 3.0).
> > > 
> > > qemu-system-x86_64 \
> > >   -kernel arch/x86/boot/bzImage \
> > >   -device virtio-scsi-pci,id=scsi \
> > >   -device scsi-hd,bus=scsi.0,drive=d0 \
> > >   -drive file=./wheezy.img,format=raw,if=none,id=d0 \
> > >   -snapshot \
> > >   -m 2G -smp 4 -enable-kvm \
> > >   -net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic \
> > >   -nographic -monitor none \
> > >   -no-reboot \
> > >   -append "console=ttyS0 root=/dev/sda earlyprintk=serial panic_on_warn=1 
> > > panic=-1"
> > > 
> > > Root file system:
> > >   https://storage.googleapis.com/syzkaller/wheezy.img
> > > or:
> > >   
> > > https://github.com/groeck/linux-build-test/blob/master/rootfs/x86_64/rootfs.ext2.gz
> > > 
> > > It seems that any root file system can be used to reproduce the problem.
> > > Also, the problem is not limited to virtio-scsi-pci. I also tried
> > > am53c974, lsi53c810, and lsi53c895a (after enabling the respective
> > > drivers), with the same result.
> > > 
> > > Overall, this suggests that the problem is related to blk-mq and was
> > > indeed introduced with commit 7ea5fe31c12d.
> > > 
> > > For reference, I attached an actual crash log as well as a set of bisect
> > > logs below.
> > > 
> > > Unfortunately, my understanding of blk-mq is not good enough to find the
> > > underlying problem and suggest a fix. Please let me know if there is
> > > anything else I can do to help fixing the problem.
> > > 
> > > Note that the problem was originally reported by syzbot running a test
> > > on chromeos-4.14. It may well be that the problem has already been
> > > reported and is being worked on. If so, please apologize the noise.
> > > 
> > > Thanks,
> > > Guenter
> > > 
> > > ---
> > > [8.700038] [ cut here ]
> > > [8.700376] ODEBUG: free active (active state 0) object type: 
> > > timer_list hint: delayed_work_timer_fn+0x0/0x10
> > > [8.701032] WARNING: CPU: 0 PID: 0 at lib/debugobjects.c:329 
> > > debug_print_object+0x6a/0x80
> > > [8.701032] Kernel panic - not syncing: panic_on_warn set ...
> > > [8.701032] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.20.0-rc1 #30
> > > [8.701032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > > BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > [8.701032] Call Trace:
> > > [8.701032]  
> > > [8.701032]  dump_stack+0x46/0x5b
> > > [8.701032]  panic+0xf3/0x246
> > > [8.701032]  ? debug_print_object+0x6a/0x80
> > > [8.701032]  __warn+0xeb/0xf0
> > > [8.701032]  ? debug_print_object+0x6a/0x80
> > > [8.701

Re: BUG in scsi_lib.c due to a bad commit

2014-11-11 Thread Guenter Roeck

On 11/11/2014 04:17 PM, Bjorn Helgaas wrote:

[+cc Guenter, linux-scsi]

On Tue, Nov 11, 2014 at 4:33 PM, Barto  wrote:

Hello everyone,

I notice a bug since kernel 3.17 ( and also with 3.18 branch ), a random
hang at boot on some PC configurations, I did a "git bisect" and I found
that the culprit is  :

[045065d8a300a37218c548e9aa7becd581c6a0e8] [SCSI] fix qemu boot hang problem

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=045065d8a300a37218c548e9aa7becd581c6a0e8

the author of this commit has choosen to inverse the logic of the if
statement in the file  drivers/scsi/scsi_lib.c in order to solve an
issue with qemu :

--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1774,7 +1774,7 @@ static void scsi_request_fn(struct request_queue *q)
blk_requeue_request(q, req);
atomic_dec(&sdev->device_busy);
out_delay:
- if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
+ if (!atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
blk_delay_queue(q, SCSI_QUEUE_DELAY);
}



The above commit was a follow-up to commit 71e75c97f97a, which did the 
following:

-   if (sdev->device_busy == 0 && !scsi_device_blocked(sdev))
+   if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))

meaning it reversed the polarity of the check in the original code.
Since that commit created problems as observed in qemu, I thought it
would be obvious that restoring the original polarity would make sense.
This is explained in my patch, so I am somewhat at loss why ...


this change triggers a bug on my PC ( I don't have SCSI devices, but
only 3 SATA harddisks and 2 IDE harddisks, SATA disks are on an ICH7
sata controler on the motherboard gigabyte GA-P31-DS3L, and IDE disk on
a JMicron JMB363/368 Sata/IDE PCIe card ), every 5~10 boots the boot
stops suddenly because of this commit,

If I revert this commit then the bug is gone, more details can be found
here, where I created a patch who reverts commit 045065d8 :

https://bugzilla.kernel.org/show_bug.cgi?id=87581

my question: why Guenter Roeck ( the author of the bad commit ) has
choosen to inverse the logic in the if statement ?


this question is asked.

Having said that, I don't really care one way or another. I am fine
with reverting my commit; if that is done, I'll simply remove the
related scsi tests from my qemu test runs.

Guenter


before his commit the if statement was like this :

if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
blk_delay_queue(q, SCSI_QUEUE_DELAY);

if his decision to inverse the logic of
"atomic_read(&sdev->device_busy)" is acceptable then the real bug is
probably located elsewhere in the scsi source code, and we must solve
this mistery because there is obviously a bug regression in SCSI code
because with older kernels ( 3.16.7 and lower ) I don't have the random
hang boot bug with my configuration,

another user in archlinux forums has also this bug and he has a more
modern PC ( intel i7 core cpu, SSD device ), my fear is when linux
distros will move to kernel 3.17 then more users will have this weird
random bug who can occur only on boot and only with a specific PC
configuration, if the boot step is passed despite the random bug then
the bug will not occur, it occurs only during the boot process, which
probably means that the faulty source code is only called during the
boot process,

thanks for anyone who wants to dig this problem with me
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] cxlflash: Select IRQ_POLL

2017-05-05 Thread Guenter Roeck
The driver now uses IRQ_POLL and needs to select it to avoid the following
build error.

ERROR: ".irq_poll_complete" [drivers/scsi/cxlflash/cxlflash.ko] undefined!
ERROR: ".irq_poll_sched" [drivers/scsi/cxlflash/cxlflash.ko] undefined!
ERROR: ".irq_poll_disable" [drivers/scsi/cxlflash/cxlflash.ko] undefined!
ERROR: ".irq_poll_init" [drivers/scsi/cxlflash/cxlflash.ko] undefined!

Fixes: cba06e6de403 ("scsi: cxlflash: Implement IRQ polling for RRQ processing")
Signed-off-by: Guenter Roeck 
---
 drivers/scsi/cxlflash/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/cxlflash/Kconfig b/drivers/scsi/cxlflash/Kconfig
index c052104e523e..a011c5dbf214 100644
--- a/drivers/scsi/cxlflash/Kconfig
+++ b/drivers/scsi/cxlflash/Kconfig
@@ -5,6 +5,7 @@
 config CXLFLASH
tristate "Support for IBM CAPI Flash"
depends on PCI && SCSI && CXL && EEH
+   select IRQ_POLL
default m
help
  Allows CAPI Accelerated IO to Flash
-- 
2.7.4



Re: [PATCH 09/13] timer: Remove users of expire and data arguments to DEFINE_TIMER

2017-10-04 Thread Guenter Roeck

On 10/04/2017 04:27 PM, Kees Cook wrote:

The expire and data arguments of DEFINE_TIMER are only used in two places
and are ignored by the code (malta-display.c only uses mod_timer(),
never add_timer(), so the preset expires value is ignored). Set both
sets of arguments to zero.

Cc: Ralf Baechle 
Cc: Wim Van Sebroeck 
Cc: Guenter Roeck 
Cc: Geert Uytterhoeven 
Cc: linux-m...@linux-mips.org
Cc: linux-watch...@vger.kernel.org
Signed-off-by: Kees Cook 


For watchdog:

Acked-by: Guenter Roeck 


---
  arch/mips/mti-malta/malta-display.c | 6 +++---
  drivers/watchdog/alim7101_wdt.c | 4 ++--
  2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/mips/mti-malta/malta-display.c 
b/arch/mips/mti-malta/malta-display.c
index d4f807191ecd..ac813158b9b8 100644
--- a/arch/mips/mti-malta/malta-display.c
+++ b/arch/mips/mti-malta/malta-display.c
@@ -36,10 +36,10 @@ void mips_display_message(const char *str)
}
  }
  
-static void scroll_display_message(unsigned long data);

-static DEFINE_TIMER(mips_scroll_timer, scroll_display_message, HZ, 0);
+static void scroll_display_message(unsigned long unused);
+static DEFINE_TIMER(mips_scroll_timer, scroll_display_message, 0, 0);
  
-static void scroll_display_message(unsigned long data)

+static void scroll_display_message(unsigned long unused)
  {
mips_display_message(&display_string[display_count++]);
if (display_count == max_display_count)
diff --git a/drivers/watchdog/alim7101_wdt.c b/drivers/watchdog/alim7101_wdt.c
index 665e0e7dfe1e..3c1f6ac68ea9 100644
--- a/drivers/watchdog/alim7101_wdt.c
+++ b/drivers/watchdog/alim7101_wdt.c
@@ -71,7 +71,7 @@ MODULE_PARM_DESC(use_gpio,
"Use the gpio watchdog (required by old cobalt boards).");
  
  static void wdt_timer_ping(unsigned long);

-static DEFINE_TIMER(timer, wdt_timer_ping, 0, 1);
+static DEFINE_TIMER(timer, wdt_timer_ping, 0, 0);
  static unsigned long next_heartbeat;
  static unsigned long wdt_is_open;
  static char wdt_expect_close;
@@ -87,7 +87,7 @@ MODULE_PARM_DESC(nowayout,
   *Whack the dog
   */
  
-static void wdt_timer_ping(unsigned long data)

+static void wdt_timer_ping(unsigned long unused)
  {
/* If we got a heartbeat pulse within the WDT_US_INTERVAL
 * we agree to ping the WDT





Re: [PATCH 10/13] timer: Remove expires and data arguments from DEFINE_TIMER

2017-10-04 Thread Guenter Roeck

On 10/04/2017 04:27 PM, Kees Cook wrote:

Drop the arguments from the macro and adjust all callers with the
following script:

   perl -pi -e 's/DEFINE_TIMER\((.*), 0, 0\);/DEFINE_TIMER($1);/g;' \
 $(git grep DEFINE_TIMER | cut -d: -f1 | sort -u | grep -v timer.h)

Signed-off-by: Kees Cook 
Acked-by: Geert Uytterhoeven  # for m68k parts


For watchdog:

Acked-by: Guenter Roeck 


---
  arch/arm/mach-ixp4xx/dsmg600-setup.c  | 2 +-
  arch/arm/mach-ixp4xx/nas100d-setup.c  | 2 +-
  arch/m68k/amiga/amisound.c| 2 +-
  arch/m68k/mac/macboing.c  | 2 +-
  arch/mips/mti-malta/malta-display.c   | 2 +-
  arch/parisc/kernel/pdc_cons.c | 2 +-
  arch/s390/mm/cmm.c| 2 +-
  drivers/atm/idt77105.c| 4 ++--
  drivers/atm/iphase.c  | 2 +-
  drivers/block/ataflop.c   | 8 
  drivers/char/dtlk.c   | 2 +-
  drivers/char/hangcheck-timer.c| 2 +-
  drivers/char/nwbutton.c   | 2 +-
  drivers/char/rtc.c| 2 +-
  drivers/input/touchscreen/s3c2410_ts.c| 2 +-
  drivers/net/cris/eth_v10.c| 6 +++---
  drivers/net/hamradio/yam.c| 2 +-
  drivers/net/wireless/atmel/at76c50x-usb.c | 2 +-
  drivers/staging/speakup/main.c| 2 +-
  drivers/staging/speakup/synth.c   | 2 +-
  drivers/tty/cyclades.c| 2 +-
  drivers/tty/isicom.c  | 2 +-
  drivers/tty/moxa.c| 2 +-
  drivers/tty/rocket.c  | 2 +-
  drivers/tty/vt/keyboard.c | 2 +-
  drivers/tty/vt/vt.c   | 2 +-
  drivers/watchdog/alim7101_wdt.c   | 2 +-
  drivers/watchdog/machzwd.c| 2 +-
  drivers/watchdog/mixcomwd.c   | 2 +-
  drivers/watchdog/sbc60xxwdt.c | 2 +-
  drivers/watchdog/sc520_wdt.c  | 2 +-
  drivers/watchdog/via_wdt.c| 2 +-
  drivers/watchdog/w83877f_wdt.c| 2 +-
  drivers/xen/grant-table.c | 2 +-
  fs/pstore/platform.c  | 2 +-
  include/linux/timer.h | 4 ++--
  kernel/irq/spurious.c | 2 +-
  lib/random32.c| 2 +-
  net/atm/mpc.c | 2 +-
  net/decnet/dn_route.c | 2 +-
  net/ipv6/ip6_flowlabel.c  | 2 +-
  net/netrom/nr_loopback.c  | 2 +-
  security/keys/gc.c| 2 +-
  sound/oss/midibuf.c   | 2 +-
  sound/oss/soundcard.c | 2 +-
  sound/oss/sys_timer.c | 2 +-
  sound/oss/uart6850.c  | 2 +-
  47 files changed, 54 insertions(+), 54 deletions(-)

diff --git a/arch/arm/mach-ixp4xx/dsmg600-setup.c 
b/arch/arm/mach-ixp4xx/dsmg600-setup.c
index b3bd0e137f6d..b3689a141ec6 100644
--- a/arch/arm/mach-ixp4xx/dsmg600-setup.c
+++ b/arch/arm/mach-ixp4xx/dsmg600-setup.c
@@ -174,7 +174,7 @@ static int power_button_countdown;
  #define PBUTTON_HOLDDOWN_COUNT 4 /* 2 secs */
  
  static void dsmg600_power_handler(unsigned long data);

-static DEFINE_TIMER(dsmg600_power_timer, dsmg600_power_handler, 0, 0);
+static DEFINE_TIMER(dsmg600_power_timer, dsmg600_power_handler);
  
  static void dsmg600_power_handler(unsigned long data)

  {
diff --git a/arch/arm/mach-ixp4xx/nas100d-setup.c 
b/arch/arm/mach-ixp4xx/nas100d-setup.c
index 4e0f762bc651..562d05f9888e 100644
--- a/arch/arm/mach-ixp4xx/nas100d-setup.c
+++ b/arch/arm/mach-ixp4xx/nas100d-setup.c
@@ -197,7 +197,7 @@ static int power_button_countdown;
  #define PBUTTON_HOLDDOWN_COUNT 4 /* 2 secs */
  
  static void nas100d_power_handler(unsigned long data);

-static DEFINE_TIMER(nas100d_power_timer, nas100d_power_handler, 0, 0);
+static DEFINE_TIMER(nas100d_power_timer, nas100d_power_handler);
  
  static void nas100d_power_handler(unsigned long data)

  {
diff --git a/arch/m68k/amiga/amisound.c b/arch/m68k/amiga/amisound.c
index 90a60d758f8b..a23f48181fd6 100644
--- a/arch/m68k/amiga/amisound.c
+++ b/arch/m68k/amiga/amisound.c
@@ -66,7 +66,7 @@ void __init amiga_init_sound(void)
  }
  
  static void nosound( unsigned long ignored );

-static DEFINE_TIMER(sound_timer, nosound, 0, 0);
+static DEFINE_TIMER(sound_timer, nosound);
  
  void amiga_mksound( unsigned int hz, unsigned int ticks )

  {
diff --git a/arch/m68k/mac/macboing.c b/arch/m68k/mac/macboing.c
index ffaa1f6439ae..9a52aff183d0 100644
--- a/arch/m68k/mac/macboing.c
+++ b/arch/m68k/mac/macboing.c
@@ -56,7 +56,7 @@ static void ( *mac_special_bell )( unsigned int, unsigned 
int, unsigned int );
  /*
   * our timer to start/continue/stop the bell
   */
-static DEFINE_TIMER(mac_sound_timer, mac_nosound, 0, 0);
+static DEFINE_TIMER(mac_sound_timer, mac_nosound);
  
  /*

   * Sort of initialize the sound chip (called from mac_mksound on the first