update the document for empty poll API.

Signed-off-by: Liang Ma <liang.j...@intel.com>
---
 doc/guides/prog_guide/power_man.rst | 87 +++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)

diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index eba1cc6..d8a4ef7 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -106,6 +106,93 @@ User Cases
 
 The power management mechanism is used to save power when performing L3 forwarding.
 
+
+Empty Poll API
+--------------
+
+Abstract
+~~~~~~~~
+
+For packet processing workloads such as DPDK, polling is continuous.
+This means CPU cores always show 100% busy, independent of how much work
+those cores are doing. It is important to determine accurately how busy
+a core is, because without that information:
+
+  * There is no indication of overload conditions.
+  * Users do not know how much real load is on a system, which
+    results in wasted energy because no power management is utilized.
+
+Compared to the original l3fwd-power design, instead of going to sleep
+after detecting an empty poll, the new mechanism just lowers the core frequency.
+As a result, the application does not stop polling the device, which leads
+to improved handling of bursts of traffic.
+
+When the system becomes busy, the empty poll mechanism can also increase the core
+frequency (including turbo) to do best effort for intensive traffic. This gives
+more flexible and balanced traffic awareness than the standard l3fwd-power
+application.
+
+
+Proposed Solution
+~~~~~~~~~~~~~~~~~
+The proposed solution focuses on how many times empty polls are executed.
+A low number of empty polls means the current core is busy processing its
+workload, so a higher frequency is needed. A high number of empty polls
+indicates the current core is not doing any real work, so we can lower the
+frequency to save power.
+
+In the current implementation, each core has 1 empty-poll counter, which assumes
+1 core is dedicated to 1 queue. This will need to be expanded in the future to
+support multiple queues per core.
+
+Power state definition:
+^^^^^^^^^^^^^^^^^^^^^^^
+
+* LOW: Not currently used, reserved for future use.
+
+* MED: The frequency used to process a modest traffic workload.
+
+* HIGH: The frequency used to process a busy traffic workload.
+
+There are two phases to establish the power management system:
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+* Initialization/Training phase. The training phase is necessary
+  in order to figure out the system polling baseline numbers from
+  idle to busy. The highest poll count will be during idle, where all
+  polls are empty. These poll counts will be different between
+  systems due to the many possible processor micro-arch, cache
+  and device configurations, hence the training phase.
+  In the training phase, traffic is blocked so the training algorithm
+  can average the empty-poll numbers for the LOW, MED and
+  HIGH power states in order to create a baseline.
+  The core's counters are collected every 10ms, and the training
+  phase takes 2 seconds.
+
+* Normal phase. When the training phase is complete, traffic is
+  started. The run-time poll counts are compared with the
+  baseline and a decision is taken to move to the MED power
+  state or the HIGH power state. The counters are calculated every
+  10ms.
+
+
+API Overview for Empty Poll Power Management
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+* **State Init**: initialize the power management system.
+
+* **State Free**: free the resources held by the power management system.
+
+* **Update Empty Poll Counter**: update the empty poll counter.
+
+* **Update Valid Poll Counter**: update the valid poll counter.
+
+* **Set the Frequency Index**: update the power state/frequency mapping.
+
+* **Detect empty poll state change**: empty poll state change detection algorithm.
+
+User Cases
+----------
+The mechanism can be applied to any device that is based on polling, e.g. NIC, FPGA.
+
 References
 ----------
 
-- 
2.7.5
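
The training and normal phases described in the patch above can be sketched in C roughly as follows. This is only an illustration, not the library implementation: every name here (`ep_state`, `ep_train_baseline`, `ep_next_state`, the `EP_*` macros) is hypothetical, and the percentage threshold used to pick between MED and HIGH is an assumption, since the documentation does not state the actual decision rule. The 10 ms interval and 2 second training window are taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Power states named in the documentation; LOW is reserved for future use. */
enum ep_state { EP_LOW, EP_MED, EP_HIGH };

/* From the text: counters are collected every 10 ms, training lasts 2 s. */
#define EP_INTERVAL_MS   10
#define EP_TRAINING_MS   2000
#define EP_TRAIN_SAMPLES (EP_TRAINING_MS / EP_INTERVAL_MS)

/*
 * Training phase (traffic blocked): average the per-interval empty-poll
 * counts to obtain the idle baseline for one power state.
 */
uint64_t ep_train_baseline(const uint64_t *samples, size_t n)
{
    uint64_t sum = 0;

    assert(n == 0 || samples != NULL);
    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / n;
}

/*
 * Normal phase: compare one interval's empty-poll count with the baseline.
 * ASSUMPTION: if fewer than busy_pct percent of the baseline polls were
 * empty, the core is treated as busy and moved to HIGH; otherwise MED.
 */
enum ep_state ep_next_state(uint64_t empty_polls, uint64_t baseline,
                            unsigned int busy_pct)
{
    if (baseline == 0)
        return EP_HIGH; /* no baseline yet: favour throughput */
    return (empty_polls * 100 < baseline * busy_pct) ? EP_HIGH : EP_MED;
}
```

With an idle baseline of 1000 empty polls per interval and a hypothetical `busy_pct` of 50, an interval with 900 empty polls stays at MED, while an interval with only 100 empty polls moves the core to HIGH. A real implementation would likely add hysteresis so that a single noisy interval does not flip the frequency.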