Hi,

I’ve applied your patch on kernel 4.17.0, and dropped packets and rx_missed_errors are still present, though they are increasing at a lower rate.
root@shaper:~# ./test
rx_missed_errors: 2135
RX errors 0 dropped 2155 overruns 0 frame 0
sleeping 60 seconds
rx_missed_errors: 2433
RX errors 0 dropped 2459 overruns 0 frame 0
sleeping 60 seconds
rx_missed_errors: 2433
RX errors 0 dropped 2465 overruns 0 frame 0
sleeping 60 seconds
rx_missed_errors: 2526
RX errors 0 dropped 2564 overruns 0 frame 0
sleeping 60 seconds

> On 3 Dec 2020, at 21:43, Andrei Popa <andreipo...@gmail.com> wrote:
>
> Hi,
>
> On what kernel version should I try the patch? I tried on 5.9 and it doesn't build.
>
>> On 18 Nov 2020, at 20:47, Rafael J. Wysocki <r...@rjwysocki.net> wrote:
>>
>> On Tuesday, November 17, 2020 7:31:29 PM CET Rafael J. Wysocki wrote:
>>> On 11/16/2020 8:11 AM, Andrei Popa wrote:
>>>> Hello,
>>>>
>>>> After an update from vmlinuz-4.15.0-106-generic to vmlinuz-5.4.0-37-generic
>>>> we experience, on a number of servers, a very high number of rx_missed_errors
>>>> and dropped packets, but only on the uplink 10G interface. We have another
>>>> 10G downlink interface with no problems.
>>>>
>>>> The affected servers have the following mainboards:
>>>> S5520HC ver E26045-455
>>>> S5520UR ver E22554-751
>>>> S5520UR ver E22554-753
>>>> S5000VSA
>>>>
>>>> On 30 other servers with similar mainboards and/or configs there are no
>>>> dropped packets with vmlinuz-5.4.0-37-generic.
>>>>
>>>> We’ve installed vanilla 4.16 and there were no dropped packets.
>>>> Vanilla 4.17 had a very high number of dropped packets, like the following:
>>>>
>>>> root@shaper:~# cat test
>>>> #!/bin/bash
>>>> while true
>>>> do
>>>>     ethtool -S ens6f1 | grep "missed_errors"
>>>>     ifconfig ens6f1 | grep RX | grep dropped
>>>>     sleep 1
>>>> done
>>>>
>>>> root@shaper:~# ./test
>>>> rx_missed_errors: 2418845
>>>> RX errors 0 dropped 2418888 overruns 0 frame 0
>>>> rx_missed_errors: 2426175
>>>> RX errors 0 dropped 2426218 overruns 0 frame 0
>>>> rx_missed_errors: 2431910
>>>> RX errors 0 dropped 2431953 overruns 0 frame 0
>>>> rx_missed_errors: 2437266
>>>> RX errors 0 dropped 2437309 overruns 0 frame 0
>>>> rx_missed_errors: 2443305
>>>> RX errors 0 dropped 2443348 overruns 0 frame 0
>>>> rx_missed_errors: 2448357
>>>> RX errors 0 dropped 2448400 overruns 0 frame 0
>>>> rx_missed_errors: 2452539
>>>> RX errors 0 dropped 2452582 overruns 0 frame 0
>>>>
>>>> We did a git bisect and found that the following commit introduces the high
>>>> number of dropped packets:
>>>>
>>>> Author: Rafael J. Wysocki <rafael.j.wyso...@intel.com>
>>>> Date:   Thu Apr 5 19:12:43 2018 +0200
>>>>
>>>>     cpuidle: menu: Avoid selecting shallow states with stopped tick
>>>>
>>>>     If the scheduler tick has been stopped already and the governor
>>>>     selects a shallow idle state, the CPU can spend a long time in that
>>>>     state if the selection is based on an inaccurate prediction of idle
>>>>     time. That effect turns out to be relevant, so it needs to be
>>>>     mitigated.
>>>>
>>>>     To that end, modify the menu governor to discard the result of the
>>>>     idle time prediction if the tick is stopped and the predicted idle
>>>>     time is less than the tick period length, unless the tick timer is
>>>>     going to expire soon.
>>>>
>>>>     Signed-off-by: Rafael J. Wysocki <rafael.j.wyso...@intel.com>
>>>>     Acked-by: Peter Zijlstra (Intel) <pet...@infradead.org>
>>>>
>>>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>>>> index 267982e471e0..1bfe03ceb236 100644
>>>> --- a/drivers/cpuidle/governors/menu.c
>>>> +++ b/drivers/cpuidle/governors/menu.c
>>>> @@ -352,13 +352,28 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>>>>  	 */
>>>>  	data->predicted_us = min(data->predicted_us, expected_interval);
>>>>
>>>> -	/*
>>>> -	 * Use the performance multiplier and the user-configurable
>>>> -	 * latency_req to determine the maximum exit latency.
>>>> -	 */
>>>> -	interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
>>>> -	if (latency_req > interactivity_req)
>>>> -		latency_req = interactivity_req;
>>>
>>> The tick_nohz_tick_stopped() check may be done after the above and it may be
>>> reworked a bit.
>>>
>>> I'll send a test patch to you shortly.
>>
>> The patch is appended, but please note that it has been rebased by hand and
>> not tested.
>>
>> Please let me know if it makes any difference.
>>
>> And in the future, please avoid pasting the entire kernel config into your
>> reports; that's problematic.
>>
>> ---
>>  drivers/cpuidle/governors/menu.c | 23 ++++++++++++-----------
>>  1 file changed, 12 insertions(+), 11 deletions(-)
>>
>> Index: linux-pm/drivers/cpuidle/governors/menu.c
>> ===================================================================
>> --- linux-pm.orig/drivers/cpuidle/governors/menu.c
>> +++ linux-pm/drivers/cpuidle/governors/menu.c
>> @@ -308,18 +308,18 @@ static int menu_select(struct cpuidle_dr
>>  		get_typical_interval(data, predicted_us)) * NSEC_PER_USEC;
>>
>> -	if (tick_nohz_tick_stopped()) {
>> -		/*
>> -		 * If the tick is already stopped, the cost of possible short
>> -		 * idle duration misprediction is much higher, because the CPU
>> -		 * may be stuck in a shallow idle state for a long time as a
>> -		 * result of it. In that case say we might mispredict and use
>> -		 * the known time till the closest timer event for the idle
>> -		 * state selection.
>> -		 */
>> -		if (data->predicted_us < TICK_USEC)
>> -			data->predicted_us = min_t(unsigned int, TICK_USEC,
>> -						   ktime_to_us(delta_next));
>> +	/*
>> +	 * If the tick is already stopped, the cost of possible short idle
>> +	 * duration misprediction is much higher, because the CPU may be stuck
>> +	 * in a shallow idle state for a long time as a result of it. In that
>> +	 * case, say we might mispredict and use the known time till the closest
>> +	 * timer event for the idle state selection, unless that event is going
>> +	 * to occur within the tick time frame (in which case the CPU will be
>> +	 * woken up from whatever idle state it gets into soon enough anyway).
>> +	 */
>> +	if (tick_nohz_tick_stopped() && data->predicted_us < TICK_USEC &&
>> +	    delta_next >= TICK_NSEC) {
>> +		data->predicted_us = ktime_to_us(delta_next);
>>  	} else {
>>  		/*
>>  		 * Use the performance multiplier and the user-configurable
>