On 2020-03-06 07:05, Venky Venkatesh wrote: > Hi Mattias, > Have a question on this fix. I understand you wanting a certain number > of events before making a decision to migrate (in the above fix). > However, suppose there are fewer events over a few flows (even if not > many) and yet your core is heavily loaded -- indicating may be they > are one or more very CPU intensive flows. Often in DPI situations > depending on the complexity of the policy you can get delayed longer. > It might still be worthwhile to migrate if the other cores are really > lightly loaded. I think that case will be missed out in this approach. >
The reason for requiring at least 128 events recorded was primarily since it simplified the code a little, in combination with the fact that most packet processing happens at much higher speeds than 128 events/1 ms. If you have a lower rate, the only thing that's going to happen is that the migration will happen a little later, when you've hit 128. If you run at 2,6 GHz with high utilization, and fail to process more than 128 events in 1 ms, then the average cost of processing an event is in the range of 20.000 clock cycles, which seemed a lot to me, maybe even for DPI? > Fundamentally, the number of packets being a proxy-metric for the load > of that flow on the cpu is simplistic at times. Very CPU intensive > medium/lower bandwidth flows can be picked up in this heuristic. If > there is a way that at the time of DSW init we can have a way of > tuning it depending on the application scenario it might be more flexible. > Yes, indeed. If you have large asymmetries in the average per-event processing latency for different flow types, this can become a problem, with what's effectively a very heavy flow, "masking" as a small one, and causing trouble when it's being migrated. To individually keep track of which flow generates what load is just too expensive, if DSW should remain a good choice for high packet-rate applications. Just the rdtsc required to figure how much a single event costed to process that likely would be required will substantially increase the scheduler overhead. Add to this the need to load, process and store this information. The parameter would need to control an approach taken, not just tweaking a single algorithm, I think. Maybe if you'd reorganized the code a little, you could do this in a clean way. First though you would like to know it's a real problem, in a real application - not just a theory. An alternative approach that might be possible is to break the processing of these very heavy flows into more processing stages. Instead of, say, one 20k cycles stage, you have four stages of roughly 1/4 the size. That would also significant improve single-flow performance, if that's relevant. > Thanks > -Venky > > > On Thu, Mar 5, 2020 at 2:47 AM Mattias Rönnblom > <mattias.ronnb...@ericsson.com <mailto:mattias.ronnb...@ericsson.com>> > wrote: > > Avoid reusing recorded events when performing a migration, since this > may make the migration selection logic pick an already-moved flow. > > Fixes: f6257b22e767 ("event/dsw: add load balancing") > Cc: sta...@dpdk.org <mailto:sta...@dpdk.org> > > Reported-by: Venky Venkatesh <vvenkat...@paloaltonetworks.com > <mailto:vvenkat...@paloaltonetworks.com>> > Signed-off-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com > <mailto:mattias.ronnb...@ericsson.com>> > --- > drivers/event/dsw/dsw_event.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/event/dsw/dsw_event.c > b/drivers/event/dsw/dsw_event.c > index d68b71b98..296adea18 100644 > --- a/drivers/event/dsw/dsw_event.c > +++ b/drivers/event/dsw/dsw_event.c > @@ -646,6 +646,9 @@ dsw_port_consider_migration(struct dsw_evdev *dsw, > if (dsw->num_ports == 1) > return; > > + if (seen_events_len < DSW_MAX_EVENTS_RECORDED) > + return; > + > DSW_LOG_DP_PORT(DEBUG, source_port->id, "Considering > migration.\n"); > > /* Randomize interval to avoid having all threads considering > -- > 2.17.1 >