> From: Robin Jarry [mailto:rja...@redhat.com]
> Sent: Wednesday, 23 November 2022 11.19
> To: dev@dpdk.org
> Cc: Bruce Richardson; Jerin Jacob; Kevin Laatz; Konstantin Ananyev;
> Mattias Rönnblom; Morten Brørup; Robin Jarry
> Subject: [RFC PATCH 2/4] eal: allow applications to report their cpu
> utilization
>
> Allow applications to register a callback that will be invoked in
> rte_lcore_dump() and when requesting lcore info in the telemetry API.
>
> The callback is expected to return a number between 0 and 100
> representing the percentage of busy cycles spent over a fixed period of
> time. The period of time is configured when registering the callback.
>
> Cc: Bruce Richardson <bruce.richard...@intel.com>
> Cc: Jerin Jacob <jer...@marvell.com>
> Cc: Kevin Laatz <kevin.la...@intel.com>
> Cc: Konstantin Ananyev <konstantin.v.anan...@yandex.ru>
> Cc: Mattias Rönnblom <hof...@lysator.liu.se>
> Cc: Morten Brørup <m...@smartsharesystems.com>
> Signed-off-by: Robin Jarry <rja...@redhat.com>
> ---

This patch simply provides a function for the application to register a constant X and a callback, which returns Y. X happens to be a duration in seconds. Y can be a number between 0 and 100, and happens to be the lcore busyness (to be calculated by the application). So I agree that it contains no controversial calculations. :-)

However, if the lcore busyness is supposed to be used for power management or similar, it must have much higher resolution than one second.

Also, CPU usage is often reported over multiple time intervals, e.g. /proc/loadavg provides 1, 5 and 15 minute load averages.

Perhaps a deeper issue is that the output could also be considered statistics, which are handled differently in different applications. E.g. the statistics module in the SmartShare StraightShaper application includes histories in multiple time resolutions, from 5 minutes in 1-second intervals up to 1 year in 1-day intervals.

On the other hand, if the application must expose 1/5/15 minute statistics, it could register a callback with a 1 minute interval, and aggregate the numbers in its own statistics module.

Here's a completely different angle, considering how statistics are often collected and processed by SNMP based tools:

This patch is based on a "gauge" (i.e. the busyness percentage) and an "interval" (i.e. the duration the gauge covers). I have to sample this gauge exactly once every interval to collect data for a busyness chart. If the application's reporting interval is 1 second, I must sample the gauge every second, or statistical information will be lost.

Instead, I would prefer the callback to return two counters: units_passed (e.g. number of cycles since application start) and units_busy (e.g. number of busy cycles since application start). I can sample these at any interval, and calculate the busyness of that interval as the ratio of the two differences:

  (units_busy - units_busy_before) / (units_passed - units_passed_before)

If needed, I can also sample them at multiple intervals, e.g. every 1, 5 and 15 minutes, and expose the results like "loadavg" does. I can also sample them every millisecond if I need to react quickly to a sudden increase/drop in busyness.