Hi Morten

> -----Original Message-----
> From: Morten Brørup <m...@smartsharesystems.com>
>
> > From: David Coyle [mailto:david.co...@intel.com]
> > Sent: Wednesday, 3 May 2023 13.39
> >
> > This is NOT for upstreaming. This is being submitted to allow early
> > comparison testing with the preferred solution, which will add TPAUSE
> > power management support to the ring library through the addition of
> > callbacks. Initial stages of the preferred solution are available at
> > http://dpdk.org/patch/125454.
> >
> > This patch adds functionality directly to rte_ring_dequeue functions
> > to monitor the empty reads of the ring. When a configurable number of
> > empty reads is reached, a TPAUSE instruction is triggered by using
> > rte_power_pause() on supported architectures. rte_pause() is used on
> > other architectures. The functionality can be included or excluded at
> > compilation time using the RTE_RING_PMGMT flag. If included, the new
> > API can be used to enable/disable the feature on a per-ring basis.
> > Other related settings can also be configured using the API.
>
> I don't understand why DPDK developers keep spending time on trying to
> invent methods to determine application busyness based on entry/exit
> points in a variety of libraries, when the application is in a much better
> position to determine busyness. All of these "busyness measuring" library
> extensions have their own specific assumptions and weird limitations.
>
> I do understand that the goal is power saving, which certainly is relevant! I
> only criticize the measuring methods.
>
> For reference, we implemented something very simple in our application
> framework:
> 1. When each pipeline stage has completed a burst, it reports if it was busy
>    or not.
> 2. If the pipeline busyness is low, we take a nap to save some power.
>
> And here is the magic twist to this simple algorithm:
> 3. A pipeline stage is not considered busy unless it processed a full burst,
>    and is ready to process more packets immediately. This interpretation of
>    busyness has a significant impact on the percentage of time spent napping
>    during the low-traffic hours.
>
> This algorithm was very quickly implemented. It might not be perfect, and we
> do intend to improve it (also to determine CPU Utilization on a scale that the
> end user can translate to a linear interpretation of how busy the system is).
> But I seriously doubt that any of the proposed "busyness measuring" library
> extensions are any better.
>
> So: The application knows better, please spend your precious time on
> something useful instead.
>
> @David, my outburst is not directed at you specifically. Generally, I do
> appreciate experimenting as a good way of obtaining knowledge. So thank
> you for sharing your experiments with this audience!
>
> PS: If cruft can be disabled at build time, I generally don't oppose it.
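
For illustration, the three steps quoted above could be sketched in C roughly as follows. This is not the code from Morten's framework, which is not shown in this thread. Every name here (`busyness_tracker`, `report_stage`, `should_nap`, `BURST_SIZE`) is hypothetical, and the percentage threshold used to decide that busyness is "low" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: none of these names are DPDK API. */
#define BURST_SIZE 32u /* assumed size of a full burst */

struct busyness_tracker {
	unsigned int busy_reports;  /* reports that counted as busy */
	unsigned int total_reports; /* all reports since the last decision */
};

/* Step 3: a stage is busy only if it processed a full burst
 * and has more packets ready to process immediately. */
static bool stage_is_busy(unsigned int pkts_processed, bool more_pending)
{
	return pkts_processed == BURST_SIZE && more_pending;
}

/* Step 1: each pipeline stage reports after completing a burst. */
static void report_stage(struct busyness_tracker *t,
			 unsigned int pkts_processed, bool more_pending)
{
	assert(t != NULL);
	t->total_reports++;
	if (stage_is_busy(pkts_processed, more_pending))
		t->busy_reports++;
}

/* Step 2: decide whether to nap. The percentage threshold is an
 * assumption; the mail only says "if the pipeline busyness is low".
 * The counters are reset so the next decision starts fresh. */
static bool should_nap(struct busyness_tracker *t,
		       unsigned int busy_pct_threshold)
{
	bool nap;

	assert(t != NULL);
	nap = t->total_reports > 0 &&
	      t->busy_reports * 100u < busy_pct_threshold * t->total_reports;
	t->busy_reports = 0;
	t->total_reports = 0;
	return nap;
}
```

An application main loop would call `report_stage()` once per stage per burst and, when `should_nap()` returns true, sleep or pause by whatever means it prefers. How long to nap is left out here because the mail does not say.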
[DC] Appreciate that feedback, and it is certainly another way of looking at and tackling the problem that we are ultimately trying to solve (i.e. power saving).

The problem, however, is that we work with a large number of ISVs and operators, each with their own workload architecture and implementation. That means we would have to work individually with each of these to integrate this type of pipeline-stage-busyness algorithm into their applications. And as these applications are usually commercial, non-open-source applications, that could prove to be very difficult.

Also, most ISVs and operators don't want to have to worry about changing their application, especially their fast-path dataplane, in order to get power savings. They prefer for it to just happen without them caring about the finer details.

For these reasons, consolidating the busyness algorithms down into the DPDK libraries and PMDs is currently the preferred solution. As you say though, the libraries and PMDs may not be in the best position to determine the busyness of the pipeline, but this approach provides a good balance between achieving power savings and ease of adoption.

It's also worth calling out again that this patch is only to allow early testing by some customers of the benefit of adding TPAUSE support to the ring library. We don't intend on this patch being upstreamed. The preferred longer term solution is to use callbacks from the ring library to initiate the pause (either via the DPDK power management API or through functions that an ISV may write themselves). This is mentioned in the commit message.

Also, the pipeline stage busyness algorithm that you have added to your pipeline - have you ever considered implementing this into DPDK as a generic type library? This could certainly be of benefit to other DPDK application developers, and having this mechanism in DPDK could again ease the adoption and realisation of power savings for others.
I understand though if this is your own secret sauce and you want to keep it like that :)

David
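
As a rough illustration of the mechanism discussed in this thread (count empty dequeues and, once a configurable number is reached, hand off to a pause callback), a minimal stand-alone C sketch might look like the following. It is not the patch itself and not rte_ring API: `empty_read_monitor`, `monitor_dequeue_result` and `count_pause` are hypothetical names. Counting only consecutive empty reads and restarting the count after a pause are assumptions. In a DPDK build the callback would be expected to wrap rte_power_pause() or rte_pause(); here it only counts invocations so that the sketch compiles without DPDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: these types are not part of rte_ring. */
typedef void (*pause_cb_t)(void *arg);

struct empty_read_monitor {
	bool enabled;             /* per-ring enable/disable switch */
	unsigned int empty_reads; /* consecutive empty dequeues seen */
	unsigned int threshold;   /* empty dequeues before pausing */
	pause_cb_t pause_cb;      /* application- or library-supplied pause */
	void *cb_arg;             /* opaque argument passed to pause_cb */
};

/* Stand-in callback that only counts how often a pause was requested.
 * A real callback would issue the actual pause instead. */
static void count_pause(void *arg)
{
	(*(unsigned int *)arg)++;
}

/* Call after each dequeue with the number of objects returned.
 * Returns true if the pause callback was invoked on this call. */
static bool monitor_dequeue_result(struct empty_read_monitor *m,
				   unsigned int n_dequeued)
{
	assert(m != NULL);
	if (!m->enabled || m->pause_cb == NULL)
		return false;
	if (n_dequeued > 0) {
		/* Assumption: any successful dequeue resets the count. */
		m->empty_reads = 0;
		return false;
	}
	if (++m->empty_reads < m->threshold)
		return false;
	/* Assumption: start counting again after each pause. */
	m->empty_reads = 0;
	m->pause_cb(m->cb_arg);
	return true;
}
```

Keeping the pause behind a function pointer mirrors the callback idea described above: the ring-side code only decides when to pause, while how to pause stays with the power management API or with a function the ISV writes.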