On Thu, Dec 17, 2020 at 3:06 PM Anatoly Burakov
<anatoly.bura...@intel.com> wrote:
>
> This patchset proposes a simple API for Ethernet drivers to cause the
> CPU to enter a power-optimized state while waiting for packets to
> arrive. This is achieved through cooperation with the NIC driver that
> will allow us to know address of wake up event, and wait for writes on
> it.
>
> On IA, this is achieved through using UMONITOR/UMWAIT instructions. They
> are used in their raw opcode form because there is no widespread
> compiler support for them yet. Still, the API is made generic enough to
> hopefully support other architectures, if they happen to implement
> similar instructions.
>
> To achieve power savings, there is a very simple mechanism used: we're
> counting empty polls, and if a certain threshold is reached, we get the
> address of next RX ring descriptor from the NIC driver, arm the
> monitoring hardware, and enter a power-optimized state. We will then
> wake up when either a timeout happens, or a write happens (or generally
> whenever CPU feels like waking up - this is platform-specific), and
> proceed as normal. The empty poll counter is reset whenever we actually
> get packets, so we only go to sleep when we know nothing is going on.
> The mechanism is generic which can be used for any write back
> descriptor.
>
> This patchset also introduces a few changes into existing power
> management-related intrinsics, namely to provide a native way of waking
> up a sleeping core without application being responsible for it, as well
> as general robustness improvements. There's quite a bit of locking going
> on, but these locks are per-thread and very little (if any) contention
> is expected, so the performance impact shouldn't be that bad (and in any
> case the locking happens when we're about to sleep anyway, not on a
> hotpath).
>
> Why are we putting it into ethdev as opposed to leaving this up to the
> application? Our customers specifically requested a way to do it with
> minimal changes to the application code. The current approach allows to
> just flip a switch and automatically have power savings.
>
> - Only 1:1 core to queue mapping is supported, meaning that each lcore
>   must at most handle RX on a single queue
> - Support 3 type policies. Monitor/Pause/Frequency Scaling
> - Power management is enabled per-queue
> - The API doesn't extend to other device types

Fyi, ovsrobot Travis being KO, you probably missed that GHA CI caught this:
https://github.com/ovsrobot/dpdk/runs/1571056574?check_suite_focus=true#step:13:16082

We will have to put an exception on driver only ABI.

-- 
David Marchand