Re: Oops when completing request on the wrong queue

2016-08-29 Thread Jens Axboe
On 08/29/2016 12:06 PM, Gabriel Krisman Bertazi wrote: Jens Axboe writes: Can you try this patch? It's not perfect, but I'll be interested if it makes a difference for you. Hi Jens, Sorry for the delay. I just got back to this and have been running your patch on top of 4.8 without a crash

Re: Oops when completing request on the wrong queue

2016-08-29 Thread Gabriel Krisman Bertazi
Jens Axboe writes:
>> Can you try this patch? It's not perfect, but I'll be interested if it
>> makes a difference for you.
>
Hi Jens, Sorry for the delay. I just got back to this and have been running your patch on top of 4.8 without a crash for over 1 hour. I wanna give it more time to make

Re: Oops when completing request on the wrong queue

2016-08-24 Thread Jens Axboe
On 08/24/2016 12:34 PM, Jens Axboe wrote: On 08/23/2016 03:14 PM, Jens Axboe wrote: On 08/23/2016 03:11 PM, Jens Axboe wrote: On 08/23/2016 02:54 PM, Gabriel Krisman Bertazi wrote: Gabriel Krisman Bertazi writes: Can you share what you ran to online/offline CPUs? I can't reproduce this here

Re: Oops when completing request on the wrong queue

2016-08-24 Thread Jens Axboe
On 08/23/2016 03:14 PM, Jens Axboe wrote: On 08/23/2016 03:11 PM, Jens Axboe wrote: On 08/23/2016 02:54 PM, Gabriel Krisman Bertazi wrote: Gabriel Krisman Bertazi writes: Can you share what you ran to online/offline CPUs? I can't reproduce this here. I was using the ppc64_cpu tool, which s

Re: Oops when completing request on the wrong queue

2016-08-23 Thread Keith Busch
On Tue, Aug 23, 2016 at 03:14:23PM -0600, Jens Axboe wrote:
> On 08/23/2016 03:11 PM, Jens Axboe wrote:
> >My workload looks similar to yours, in that it's high depth and with a
> >lot of jobs to keep most CPUs loaded. My bash script is different than
> >yours, I'll try that and see if it helps her

Re: Oops when completing request on the wrong queue

2016-08-23 Thread Jens Axboe
On 08/23/2016 03:11 PM, Jens Axboe wrote: On 08/23/2016 02:54 PM, Gabriel Krisman Bertazi wrote: Gabriel Krisman Bertazi writes: Can you share what you ran to online/offline CPUs? I can't reproduce this here. I was using the ppc64_cpu tool, which shouldn't do nothing more than write to sysf

Re: Oops when completing request on the wrong queue

2016-08-23 Thread Jens Axboe
On 08/23/2016 02:54 PM, Gabriel Krisman Bertazi wrote: Gabriel Krisman Bertazi writes: Can you share what you ran to online/offline CPUs? I can't reproduce this here. I was using the ppc64_cpu tool, which shouldn't do nothing more than write to sysfs. but I just reproduced it with the scrip

Re: Oops when completing request on the wrong queue

2016-08-23 Thread Gabriel Krisman Bertazi
Gabriel Krisman Bertazi writes:
>> Can you share what you ran to online/offline CPUs? I can't reproduce
>> this here.
>
> I was using the ppc64_cpu tool, which shouldn't do nothing more than
> write to sysfs. but I just reproduced it with the script below.
>
> Note that this is ppc64le. I don't

Re: Oops when completing request on the wrong queue

2016-08-19 Thread Gabriel Krisman Bertazi
Jens Axboe writes:
>> Some good detective work so far! I agree, this looks like a blk-mq core
>> bug. Do you have a trace of a BUG() triggering in nvme_queue_rq(), when
>> req->tag != nvmeq->tags? I don't immediately see how this could happen,
>> the freezing should protect us from this, unless i

Re: Oops when completing request on the wrong queue

2016-08-19 Thread Jens Axboe
On 08/19/2016 08:13 AM, Jens Axboe wrote: On 08/19/2016 07:28 AM, Gabriel Krisman Bertazi wrote: Gabriel Krisman Bertazi writes: We, IBM, have been experiencing eventual Oops when stressing IO at the same time we add/remove processors. The Oops happens in the IRQ path, when we try to complet

Re: Oops when completing request on the wrong queue

2016-08-19 Thread Jens Axboe
On 08/19/2016 07:28 AM, Gabriel Krisman Bertazi wrote: Gabriel Krisman Bertazi writes: We, IBM, have been experiencing eventual Oops when stressing IO at the same time we add/remove processors. The Oops happens in the IRQ path, when we try to complete a request that was apparently meant for a

Re: Oops when completing request on the wrong queue

2016-08-19 Thread Gabriel Krisman Bertazi
Gabriel Krisman Bertazi writes:
> We, IBM, have been experiencing eventual Oops when stressing IO at the
> same time we add/remove processors. The Oops happens in the IRQ path,
> when we try to complete a request that was apparently meant for another
> queue.
>
> In __nvme_process_cq, the driver