>Thanks for the clarification.
>
>The problem with what Ming is proposing, in my mind (and it's an existing
>problem today), is that nvme takes precedence over anything
>else until it absolutely cannot hog the cpu in hardirq.
>
>In the thread Ming referenced a case where today if the
Sagi,
Sorry it took a while to bring my system back online.
With the patch, IOPS shows about the same drop as with the 1st patch. I think
the excessive context switches are causing the drop in IOPS.
The following were captured by "perf sched record" for 30 seconds during the
tests.
"perf sched
> >> Long, does this patch make any difference?
> >
> > Sagi,
> >
> > Sorry it took a while to bring my system back online.
> >
> > With the patch, IOPS shows about the same drop as with the 1st patch. I think
> > the excessive context switches are causing the drop in IOPS.
> >
> > The following are ca
Hey Ming,
Ok, so the real problem is per-cpu bounded tasks.
I share Thomas' opinion about a NAPI-like approach.
We already have that, it's irq_poll, but it seems that for this
use-case, we get lower performance for some reason. I'm not entirely
sure why that is, maybe it's because we need to
It seems like we're attempting to stay in irq context for as long as we
can instead of scheduling to softirq/thread context if we have more than
a minimal amount of work to do. Without at least understanding why
softirq/thread degrades us so much, this code seems like the wrong
approach to me. I
On Mon, Sep 09, 2019 at 08:10:07PM -0700, Sagi Grimberg wrote:
> Hey Ming,
>
> > > > Ok, so the real problem is per-cpu bounded tasks.
> > > >
> > > > I share Thomas opinion about a NAPI like approach.
> > >
> > > We already have that, its irq_poll, but it seems that for this
> > > use-case, we
>Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
>
>Hey Ming,
>
>>>> Ok, so the real problem is per-cpu bounded tasks.
>>>>
>>>> I share Thomas opinion about a NAPI like approach.
>>>
>>> We already have t
Hey Ming,
Ok, so the real problem is per-cpu bounded tasks.
I share Thomas' opinion about a NAPI-like approach.
We already have that, it's irq_poll, but it seems that for this
use-case, we get lower performance for some reason. I'm not
entirely sure why that is, maybe it's because we need to mask interrupts
On Sat, Sep 07, 2019 at 06:19:20AM +0800, Ming Lei wrote:
> On Fri, Sep 06, 2019 at 05:50:49PM +, Long Li wrote:
> > >Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
> > >
> > >On Fri, Sep 06, 2019 at 09:48:21AM +0800, Ming Lei wrote:
>
On Fri, Sep 06, 2019 at 11:30:57AM -0700, Sagi Grimberg wrote:
>
> >
> > Ok, so the real problem is per-cpu bounded tasks.
> >
> > I share Thomas opinion about a NAPI like approach.
>
> We already have that, its irq_poll, but it seems that for this
> use-case, we get lower performance for some
On Fri, Sep 06, 2019 at 04:25:55PM -0600, Keith Busch wrote:
> On Sat, Sep 07, 2019 at 06:19:21AM +0800, Ming Lei wrote:
> > On Fri, Sep 06, 2019 at 05:50:49PM +, Long Li wrote:
> > > >Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
> > >
On Sat, Sep 07, 2019 at 06:19:21AM +0800, Ming Lei wrote:
> On Fri, Sep 06, 2019 at 05:50:49PM +, Long Li wrote:
> > >Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
> > >
> > >Why are all 8 nvmes sharing the same CPU for inte
On Fri, Sep 06, 2019 at 05:50:49PM +, Long Li wrote:
> >Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
> >
> >On Fri, Sep 06, 2019 at 09:48:21AM +0800, Ming Lei wrote:
> >> When one IRQ flood happens on one CPU:
> >>
> >>
Ok, so the real problem is per-cpu bounded tasks.
I share Thomas' opinion about a NAPI-like approach.
We already have that, it's irq_poll, but it seems that for this
use-case, we get lower performance for some reason. I'm not
entirely sure why that is, maybe it's because we need to mask interrupts
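For reference, a minimal sketch of how a completion path is usually wired up
with irq_poll: the hardirq handler masks the device interrupt and defers the
work to a softirq-driven poll callback. The irq_poll_* calls are the existing
<linux/irq_poll.h> API; every my_* name and the budget value are placeholders
assumed for illustration, not code from nvme or from the patches in this
thread.

#include <linux/kernel.h>
#include <linux/interrupt.h>
#include <linux/irq_poll.h>

#define MY_POLL_BUDGET	64			/* assumed per-iteration budget */

struct my_queue {
	struct irq_poll iop;
	/* device-specific completion-queue state ... */
};

/* Placeholders standing in for the device-specific bits: */
static int my_process_completions(struct my_queue *q, int budget);
static void my_mask_device_irq(struct my_queue *q);
static void my_unmask_device_irq(struct my_queue *q);

/* Runs in softirq context; handles at most @budget completions per call. */
static int my_irq_poll(struct irq_poll *iop, int budget)
{
	struct my_queue *q = container_of(iop, struct my_queue, iop);
	int done = my_process_completions(q, budget);

	if (done < budget) {
		/* Queue drained: stop polling, let the device interrupt again. */
		irq_poll_complete(iop);
		my_unmask_device_irq(q);
	}
	return done;
}

/* Hardirq handler: do almost nothing here, punt the work to softirq. */
static irqreturn_t my_irq_handler(int irq, void *data)
{
	struct my_queue *q = data;

	my_mask_device_irq(q);
	irq_poll_sched(&q->iop);
	return IRQ_HANDLED;
}

static void my_queue_init(struct my_queue *q)
{
	irq_poll_init(&q->iop, MY_POLL_BUDGET, my_irq_poll);
}

The extra mask/unmask on every interrupt is one plausible source of the
overhead mentioned above compared with completing everything in hardirq.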
>Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
>
>On Fri, Sep 06, 2019 at 09:48:21AM +0800, Ming Lei wrote:
>> When one IRQ flood happens on one CPU:
>>
>> 1) softirq handling on this CPU can't make progress
>>
>> 2) kernel
On Fri, Sep 06, 2019 at 09:48:21AM +0800, Ming Lei wrote:
> When one IRQ flood happens on one CPU:
>
> 1) softirq handling on this CPU can't make progress
>
> 2) kernel thread bound to this CPU can't make progress
>
> For example, network may require softirq to xmit packets, or another irq
> thr
Hi,
On 06/09/2019 03:48, Ming Lei wrote:
[ ... ]
>> You did not share yet the analysis of the problem (the kernel warnings
>> give the symptoms) and gave the reasoning for the solution. It is hard
>> to understand what you are looking for exactly and how to connect the dots.
>
> Let me explai
>Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
>
>
>On 06/09/2019 03:22, Long Li wrote:
>[ ... ]
>>
>
>> Tracing shows that the CPU was in either hardirq or softirq all the
>> time before warnings. During tests, the system was un
On 06/09/2019 03:22, Long Li wrote:
[ ... ]
>
> Tracing shows that the CPU was in either hardirq or softirq all the
> time before warnings. During tests, the system was unresponsive at
> times.
>
> Ming's patch fixed this problem. The system was responsive throughout
> tests.
>
> As for perfo
Hi Daniel,
On Thu, Sep 05, 2019 at 12:37:13PM +0200, Daniel Lezcano wrote:
>
> Hi Ming,
>
> On 05/09/2019 11:06, Ming Lei wrote:
> > On Wed, Sep 04, 2019 at 07:31:48PM +0200, Daniel Lezcano wrote:
> >> Hi,
> >>
> >> On 04/09/2019 19:07, Bart Van Assche wrote:
> >>> On 9/3/19 12:50 AM, Daniel Lez
>Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
>
>
>Hi Ming,
>
>On 05/09/2019 11:06, Ming Lei wrote:
>> On Wed, Sep 04, 2019 at 07:31:48PM +0200, Daniel Lezcano wrote:
>>> Hi,
>>>
>>> On 04/09/2019 19:07, Bart Van As
Hi Ming,
On 05/09/2019 11:06, Ming Lei wrote:
> On Wed, Sep 04, 2019 at 07:31:48PM +0200, Daniel Lezcano wrote:
>> Hi,
>>
>> On 04/09/2019 19:07, Bart Van Assche wrote:
>>> On 9/3/19 12:50 AM, Daniel Lezcano wrote:
On 03/09/2019 09:28, Ming Lei wrote:
> On Tue, Sep 03, 2019 at 08:40:35A
On Wed, Sep 04, 2019 at 12:47:13PM -0700, Bart Van Assche wrote:
> On 9/4/19 11:02 AM, Peter Zijlstra wrote:
> > On Wed, Sep 04, 2019 at 10:38:59AM -0700, Bart Van Assche wrote:
> > > I think it is widely known that rdtsc is a relatively slow x86
> > > instruction.
> > > So I expect that using tha
On Wed, Sep 04, 2019 at 07:31:48PM +0200, Daniel Lezcano wrote:
> Hi,
>
> On 04/09/2019 19:07, Bart Van Assche wrote:
> > On 9/3/19 12:50 AM, Daniel Lezcano wrote:
> >> On 03/09/2019 09:28, Ming Lei wrote:
> >>> On Tue, Sep 03, 2019 at 08:40:35AM +0200, Daniel Lezcano wrote:
> It is a schedul
On 9/4/19 11:02 AM, Peter Zijlstra wrote:
On Wed, Sep 04, 2019 at 10:38:59AM -0700, Bart Van Assche wrote:
I think it is widely known that rdtsc is a relatively slow x86 instruction.
So I expect that using that instruction will cause a measurable overhead if
it is called frequently enough. I'm n
On Wed, Sep 04, 2019 at 10:38:59AM -0700, Bart Van Assche wrote:
> On 9/4/19 10:31 AM, Daniel Lezcano wrote:
> > On 04/09/2019 19:07, Bart Van Assche wrote:
> > > Only if CONFIG_IRQ_TIME_ACCOUNTING has been enabled. However, I don't
> > > know any Linux distro that enables that option. That's proba
On 9/4/19 10:31 AM, Daniel Lezcano wrote:
On 04/09/2019 19:07, Bart Van Assche wrote:
Only if CONFIG_IRQ_TIME_ACCOUNTING has been enabled. However, I don't
know any Linux distro that enables that option. That's probably because
that option introduces two rdtsc() calls in each interrupt. Given th
Hi,
On 04/09/2019 19:07, Bart Van Assche wrote:
> On 9/3/19 12:50 AM, Daniel Lezcano wrote:
>> On 03/09/2019 09:28, Ming Lei wrote:
>>> On Tue, Sep 03, 2019 at 08:40:35AM +0200, Daniel Lezcano wrote:
It is a scheduler problem then?
>>>
>>> Scheduler can do nothing if the CPU is taken complet
On 9/3/19 12:50 AM, Daniel Lezcano wrote:
On 03/09/2019 09:28, Ming Lei wrote:
On Tue, Sep 03, 2019 at 08:40:35AM +0200, Daniel Lezcano wrote:
It is a scheduler problem then?
Scheduler can do nothing if the CPU is taken completely by handling
interrupt & softirq, so it seems not to be a scheduler problem, IMO.
On Tue, Sep 03, 2019 at 09:50:06AM +0200, Daniel Lezcano wrote:
> On 03/09/2019 09:28, Ming Lei wrote:
> > On Tue, Sep 03, 2019 at 08:40:35AM +0200, Daniel Lezcano wrote:
> >> On 03/09/2019 08:31, Ming Lei wrote:
> >>> Hi Daniel,
> >>>
> >>> On Tue, Sep 03, 2019 at 07:59:39AM +0200, Daniel Lezcano
On Tue, Sep 03, 2019 at 10:09:57AM +0200, Thomas Gleixner wrote:
> On Tue, 3 Sep 2019, Ming Lei wrote:
> > Scheduler can do nothing if the CPU is taken completely by handling
> > interrupt & softirq, so seems not a scheduler problem, IMO.
>
> Well, but thinking more about it, the solution you are
On Tue, 3 Sep 2019, Ming Lei wrote:
> Scheduler can do nothing if the CPU is taken completely by handling
> interrupt & softirq, so seems not a scheduler problem, IMO.
Well, but thinking more about it, the solution you are proposing is more a
bandaid than anything else.
If you look at the network
On 03/09/2019 09:28, Ming Lei wrote:
> On Tue, Sep 03, 2019 at 08:40:35AM +0200, Daniel Lezcano wrote:
>> On 03/09/2019 08:31, Ming Lei wrote:
>>> Hi Daniel,
>>>
>>> On Tue, Sep 03, 2019 at 07:59:39AM +0200, Daniel Lezcano wrote:
Hi Ming Lei,
On 03/09/2019 05:30, Ming Lei wrote:
On Tue, Sep 03, 2019 at 08:40:35AM +0200, Daniel Lezcano wrote:
> On 03/09/2019 08:31, Ming Lei wrote:
> > Hi Daniel,
> >
> > On Tue, Sep 03, 2019 at 07:59:39AM +0200, Daniel Lezcano wrote:
> >>
> >> Hi Ming Lei,
> >>
> >> On 03/09/2019 05:30, Ming Lei wrote:
> >>
> >> [ ... ]
> >>
> >>
> > 2)
On 03/09/2019 08:31, Ming Lei wrote:
> Hi Daniel,
>
> On Tue, Sep 03, 2019 at 07:59:39AM +0200, Daniel Lezcano wrote:
>>
>> Hi Ming Lei,
>>
>> On 03/09/2019 05:30, Ming Lei wrote:
>>
>> [ ... ]
>>
>>
> 2) irq/timing doesn't cover softirq
That's solvable, right?
>>>
>>> Yeah, we can e
Hi Daniel,
On Tue, Sep 03, 2019 at 07:59:39AM +0200, Daniel Lezcano wrote:
>
> Hi Ming Lei,
>
> On 03/09/2019 05:30, Ming Lei wrote:
>
> [ ... ]
>
>
> >>> 2) irq/timing doesn't cover softirq
> >>
> >> That's solvable, right?
> >
> > Yeah, we can extend irq/timing, but ugly for irq/timing, si
Hi Ming Lei,
On 03/09/2019 05:30, Ming Lei wrote:
[ ... ]
>>> 2) irq/timing doesn't cover softirq
>>
>> That's solvable, right?
>
> Yeah, we can extend irq/timing, but it would be ugly for irq/timing, since irq/timing
> focuses on hardirq prediction, and softirq isn't involved in that
> purpose.
>
>>
On Wed, Aug 28, 2019 at 04:07:19PM +0200, Thomas Gleixner wrote:
> On Wed, 28 Aug 2019, Ming Lei wrote:
> > On Wed, Aug 28, 2019 at 01:23:06PM +0200, Thomas Gleixner wrote:
> > > On Wed, 28 Aug 2019, Ming Lei wrote:
> > > > On Wed, Aug 28, 2019 at 01:09:44AM +0200, Thomas Gleixner wrote:
> > > > >
On Thu, Aug 29, 2019 at 06:15:00AM +, Long Li wrote:
> >>>For some high performance IO devices, interrupt may come very frequently,
> >>>meantime IO request completion may take a bit time. Especially on some
> >>>devices(SCSI or NVMe), IO requests can be submitted concurrently from
> >>>multipl
>>>For some high performance IO devices, interrupt may come very frequently,
>>>meantime IO request completion may take a bit time. Especially on some
>>>devices(SCSI or NVMe), IO requests can be submitted concurrently from
>>>multiple CPU cores, however IO completion is only done on one of these
>
On Wed, 28 Aug 2019, Ming Lei wrote:
> On Wed, Aug 28, 2019 at 01:23:06PM +0200, Thomas Gleixner wrote:
> > On Wed, 28 Aug 2019, Ming Lei wrote:
> > > On Wed, Aug 28, 2019 at 01:09:44AM +0200, Thomas Gleixner wrote:
> > > > > > Also how is that supposed to work when sched_clock is jiffies based?
>
On Wed, Aug 28, 2019 at 01:23:06PM +0200, Thomas Gleixner wrote:
> On Wed, 28 Aug 2019, Ming Lei wrote:
> > On Wed, Aug 28, 2019 at 01:09:44AM +0200, Thomas Gleixner wrote:
> > > > > Also how is that supposed to work when sched_clock is jiffies based?
> > > >
> > > > Good catch, looks ktime_get_ns
On Wed, 28 Aug 2019, Ming Lei wrote:
> On Wed, Aug 28, 2019 at 01:09:44AM +0200, Thomas Gleixner wrote:
> > > > Also how is that supposed to work when sched_clock is jiffies based?
> > >
> > > Good catch, looks like ktime_get_ns() is needed.
> >
> > And what is ktime_get_ns() returning when the only a
On Wed, Aug 28, 2019 at 01:09:44AM +0200, Thomas Gleixner wrote:
> On Wed, 28 Aug 2019, Ming Lei wrote:
> > On Tue, Aug 27, 2019 at 04:42:02PM +0200, Thomas Gleixner wrote:
> > > On Tue, 27 Aug 2019, Ming Lei wrote:
> > > > +
> > > > + int cpu = raw_smp_processor_id();
> > > > + struct
On Wed, 28 Aug 2019, Ming Lei wrote:
> On Tue, Aug 27, 2019 at 06:19:00PM +0200, Thomas Gleixner wrote:
> > > We definitely are not going to have a 64bit multiplication and division on
> > > every interrupt. Aside from that, this breaks 32bit builds all over the
> > > place.
> >
> > That said, we a
On Wed, 28 Aug 2019, Ming Lei wrote:
> On Tue, Aug 27, 2019 at 04:42:02PM +0200, Thomas Gleixner wrote:
> > On Tue, 27 Aug 2019, Ming Lei wrote:
> > > +
> > > + int cpu = raw_smp_processor_id();
> > > + struct irq_interval *inter = per_cpu_ptr(&avg_irq_interval, cpu);
> > > + u64 delta = sched_cloc
On Tue, Aug 27, 2019 at 06:19:00PM +0200, Thomas Gleixner wrote:
> On Tue, 27 Aug 2019, Thomas Gleixner wrote:
> > On Tue, 27 Aug 2019, Ming Lei wrote:
> > > +/*
> > > + * Update average irq interval with the Exponential Weighted Moving
> > > + * Average(EWMA)
> > > + */
> > > +static void irq_upda
On Tue, Aug 27, 2019 at 04:42:02PM +0200, Thomas Gleixner wrote:
> On Tue, 27 Aug 2019, Ming Lei wrote:
> > +/*
> > + * Update average irq interval with the Exponential Weighted Moving
> > + * Average(EWMA)
> > + */
> > +static void irq_update_interval(void)
> > +{
> > +#define IRQ_INTERVAL_EWMA_WE
On Tue, 27 Aug 2019, Thomas Gleixner wrote:
> On Tue, 27 Aug 2019, Ming Lei wrote:
> > +/*
> > + * Update average irq interval with the Exponential Weighted Moving
> > + * Average(EWMA)
> > + */
> > +static void irq_update_interval(void)
> > +{
> > +#define IRQ_INTERVAL_EWMA_WEIGHT 128
> > +#defi
On Tue, 27 Aug 2019, Ming Lei wrote:
> +/*
> + * Update average irq interval with the Exponential Weighted Moving
> + * Average(EWMA)
> + */
> +static void irq_update_interval(void)
> +{
> +#define IRQ_INTERVAL_EWMA_WEIGHT 128
> +#define IRQ_INTERVAL_EWMA_PREV_FACTOR 127
> +#define IRQ_I
For some high-performance IO devices, interrupts may come very frequently,
while IO request completion may take a bit of time. Especially on some
devices (SCSI or NVMe), IO requests can be submitted concurrently from
multiple CPU cores, however IO completion is only done on one of
these submission CPUs.
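For context, a rough sketch of the EWMA-based interval tracking debated
earlier in the thread. The function and macro names follow the quoted patch
fragments (irq_update_interval(), IRQ_INTERVAL_EWMA_WEIGHT, avg_irq_interval);
the struct layout, the flood threshold and the shift-only arithmetic are
assumptions for illustration rather than the posted patch. With a power-of-two
weight, avg = (avg * 127 + delta) / 128 can be computed as
((avg << 7) - avg + delta) >> 7, i.e. without the 64-bit multiply/divide
objected to above.

#include <linux/kernel.h>
#include <linux/percpu.h>
#include <linux/sched/clock.h>
#include <linux/smp.h>
#include <linux/types.h>

#define IRQ_INTERVAL_EWMA_WEIGHT	128	/* from the quoted fragment */
#define IRQ_INTERVAL_EWMA_SHIFT		7	/* log2(WEIGHT), assumed */
#define IRQ_FLOOD_THRESHOLD_NS		1000	/* assumed: avg gap < 1us counts as a flood */

struct irq_interval {
	u64	last_ns;	/* timestamp of the previous interrupt */
	u64	avg_ns;		/* EWMA of the inter-interrupt gap */
};

static DEFINE_PER_CPU(struct irq_interval, avg_irq_interval);

/* Called once per hardirq on the local CPU. */
static void irq_update_interval(void)
{
	int cpu = raw_smp_processor_id();
	struct irq_interval *inter = per_cpu_ptr(&avg_irq_interval, cpu);
	u64 now = sched_clock();
	u64 delta = now - inter->last_ns;

	inter->last_ns = now;
	/* avg = (avg * (WEIGHT - 1) + delta) / WEIGHT, using shifts only */
	inter->avg_ns = ((inter->avg_ns << IRQ_INTERVAL_EWMA_SHIFT) -
			 inter->avg_ns + delta) >> IRQ_INTERVAL_EWMA_SHIFT;
}

/* Example consumer: does the local CPU currently look IRQ-flooded? */
static bool irq_flood_detected(void)
{
	return this_cpu_ptr(&avg_irq_interval)->avg_ns < IRQ_FLOOD_THRESHOLD_NS;
}

Whether averaging sched_clock() deltas is meaningful at all when sched_clock()
is jiffies-based is exactly the open question raised earlier in the thread.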