On Tue, Jan 12, 2021 at 6:37 PM Anatoly Burakov <anatoly.bura...@intel.com> wrote: > > This patchset proposes a simple API for Ethernet drivers to cause the > CPU to enter a power-optimized state while waiting for packets to > arrive. There are multiple proposed mechanisms to achieve said power > savings: simple frequency scaling, idle loop, and monitoring the Rx > queue for incoming packages. The latter is achieved through cooperation > with the NIC driver that will allow us to know address of wake up event, > and wait for writes on that address. > > On IA, this is achieved through using UMONITOR/UMWAIT instructions. They > are used in their raw opcode form because there is no widespread > compiler support for them yet. Still, the API is made generic enough to > hopefully support other architectures, if they happen to implement > similar instructions. > > To achieve power savings, there is a very simple mechanism used: we're > counting empty polls, and if a certain threshold is reached, we employ > one of the suggested power management schemes automatically, from within > a Rx callback inside the PMD. Once there's traffic again, the empty poll > counter is reset. > > This patchset also introduces a few changes into existing power > management-related intrinsics, namely to provide a native way of waking > up a sleeping core without application being responsible for it, as well > as general robustness improvements. There's quite a bit of locking going > on, but these locks are per-thread and very little (if any) contention > is expected, so the performance impact shouldn't be that bad (and in any > case the locking happens when we're about to sleep anyway). > > Why are we putting it into ethdev as opposed to leaving this up to the > application? Our customers specifically requested a way to do it with > minimal changes to the application code. The current approach allows to > just flip a switch and automatically have power savings. > > Things of note: > > - Only 1:1 core to queue mapping is supported, meaning that each lcore > must at most handle RX on a single queue
If we want to save power, it is likely we would poll more rxqs on a thread. > - Support 3 type policies. Monitor/Pause/Frequency Scaling > - Power management is enabled per-queue > - The API doesn't extend to other device types > > v16: > - Implemented Konstantin's suggestions and comments > - Added return values to the API - This revision breaks SPDK build (reported by UNH): http://mails.dpdk.org/archives/test-report/2021-January/174069.html - Build is broken for ARM and PPC at patch: 86491d5bd4 - (HEAD) eal: add monitor wakeup function (25 minutes ago) <Anatoly Burakov> Only pasting the ARM failure: ninja: Entering directory `/home/dmarchan/builds/build-arm64-host-clang' [1/297] Compiling C object 'lib/76b5a35@@rte_eal@sta/librte_eal_arm_rte_power_intrinsics.c.o'. FAILED: lib/76b5a35@@rte_eal@sta/librte_eal_arm_rte_power_intrinsics.c.o aarch64-linux-gnu-gcc -Ilib/76b5a35@@rte_eal@sta -Ilib -I../../dpdk/lib -I. -I../../dpdk/ -Iconfig -I../../dpdk/config -Ilib/librte_eal/include -I../../dpdk/lib/librte_eal/include -Ilib/librte_eal/linux/include -I../../dpdk/lib/librte_eal/linux/include -Ilib/librte_eal/arm/include -I../../dpdk/lib/librte_eal/arm/include -Ilib/librte_eal/common -I../../dpdk/lib/librte_eal/common -Ilib/librte_eal -I../../dpdk/lib/librte_eal -Ilib/librte_kvargs -I../../dpdk/lib/librte_kvargs -Ilib/librte_telemetry/../librte_metrics -I../../dpdk/lib/librte_telemetry/../librte_metrics -Ilib/librte_telemetry -I../../dpdk/lib/librte_telemetry -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2 -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated -Wformat -Wformat-nonliteral -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef -Wwrite-strings -Wno-packed-not-aligned -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=armv8-a+crc -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -Wno-format-truncation '-DABI_VERSION="21.1"' -DRTE_LIBEAL_USE_GETENTROPY -MD -MQ 'lib/76b5a35@@rte_eal@sta/librte_eal_arm_rte_power_intrinsics.c.o' -MF 'lib/76b5a35@@rte_eal@sta/librte_eal_arm_rte_power_intrinsics.c.o.d' -o 'lib/76b5a35@@rte_eal@sta/librte_eal_arm_rte_power_intrinsics.c.o' -c ../../dpdk/lib/librte_eal/arm/rte_power_intrinsics.c ../../dpdk/lib/librte_eal/arm/rte_power_intrinsics.c:35:1: error: conflicting types for ‘rte_power_monitor_wakeup’ rte_power_monitor_wakeup(const unsigned int lcore_id) ^~~~~~~~~~~~~~~~~~~~~~~~ In file included from ../../dpdk/lib/librte_eal/arm/include/rte_power_intrinsics.h:14, from ../../dpdk/lib/librte_eal/arm/rte_power_intrinsics.c:5: ../../dpdk/lib/librte_eal/include/generic/rte_power_intrinsics.h:79:5: note: previous declaration of ‘rte_power_monitor_wakeup’ was here int rte_power_monitor_wakeup(const unsigned int lcore_id); ^~~~~~~~~~~~~~~~~~~~~~~~ ninja: build stopped: subcommand failed. - The ABI check is still not happy as I reported earlier. Reproduced on v16 (GHA had a hiccup on this revision, but previous ones had the failure too): 1 Changed variable: [C] 'rte_eth_dev rte_eth_devices[]' was changed at rte_ethdev_core.h:196:1: type of variable changed: array element type 'struct rte_eth_dev' changed: type size hasn't changed 1 data member change: type of 'const eth_dev_ops* rte_eth_dev::dev_ops' changed: in pointed to type 'const eth_dev_ops': in unqualified underlying type 'struct eth_dev_ops' at rte_ethdev_driver.h:789:1: type size changed from 6208 to 6272 (in bits) 1 data member insertion: 'eth_get_monitor_addr_t eth_dev_ops::get_monitor_addr', at offset 6208 (in bits) at rte_ethdev_driver.h:940:1 no data member changes (94 filtered); type size hasn't changed Error: ABI issue reported for 'abidiff --suppr /home/dmarchan/dpdk/devtools/../devtools/libabigail.abignore --no-added-syms --headers-dir1 /home/dmarchan/abi/v20.11/build-gcc-static/usr/local/include --headers-dir2 /home/dmarchan/builds/build-gcc-static/install/usr/local/include /home/dmarchan/abi/v20.11/build-gcc-static/dump/librte_ethdev.dump /home/dmarchan/builds/build-gcc-static/install/dump/librte_ethdev.dump' ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged this as a potential issue). One solution is to add an exception on the eth_dev_ops structure. --- a/devtools/libabigail.abignore +++ b/devtools/libabigail.abignore @@ -7,3 +7,7 @@ symbol_version = INTERNAL [suppress_variable] symbol_version = INTERNAL + +; Explicit ignore for driver-only ABI +[suppress_type] + name = eth_dev_ops -- David marchand