Jon?  Are yuo going to review this one?

Thanks,
Corinna


On Jan 20 00:18, Mark Geisert wrote:
> Commentary wording now refers to tasks (i.e., threads) rather than
> processes.  This makes it somewhat easier to justify adding two kinds of
> counters together.  After researching what "load average" has meant over
> time, we have what seems like a reasonable implementation, modulo
> Windows differences to Linux.  The best resource I found is:
> https://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html
> 
> At end of load_init(), obtain and discard the first measure of the
> counters to deal with the first call always returning error, no data.
> Follow this with a specific short delay so the next measure actually has
> data to report.
> 
> At least one older version of Windows, i.e. Win10 Pro 21H1, has a different
> name/location for the '% Processor Time' counter and is missing the
> 'Processor Queue Length' counter entirely.  Code is changed to support
> both possible locations of the former and treat the missing latter as always
> reporting 0.0.
> 
> A release note is added for 3.5.6.
> 
> Reported-by: Mark Liam Brown <brownmarkl...@gmail.com>
> Addresses: https://cygwin.com/pipermail/cygwin/2024-August/256361.html
> Signed-off-by: Mark Geisert <m...@maxrnd.com>
> Fixes: 4dc982ddf60b (Cygwin: loadavg: improve debugging of load_init)
> 
> ---
>  winsup/cygwin/loadavg.cc    | 50 ++++++++++++++++++++++++++-----------
>  winsup/cygwin/release/3.5.6 |  3 +++
>  2 files changed, 39 insertions(+), 14 deletions(-)
> 
> diff --git a/winsup/cygwin/loadavg.cc b/winsup/cygwin/loadavg.cc
> index 127591a2e..1c5e553a9 100644
> --- a/winsup/cygwin/loadavg.cc
> +++ b/winsup/cygwin/loadavg.cc
> @@ -15,17 +15,23 @@
>  
>    A global load average estimate is maintained in shared memory.  Access to 
> that
>    is guarded by a mutex.  This estimate is only updated at most every 5 
> seconds.
> +  The updates are done by any/all callers of the getloadavg() syscall.
>  
> -  We attempt to count running and runnable processes, but unlike linux we 
> don't
> -  count processes in uninterruptible sleep (blocked on I/O).
> +  We attempt to count running and runnable tasks (i.e., threads), but unlike
> +  Linux we don't count tasks in uninterruptible sleep (blocked on I/O).  
> There
> +  doesn't seem to be a kernel counter for the latter on Windows.
>  
> -  The number of running processes is estimated as (NumberOfProcessors) * (%
> -  Processor Time).  The number of runnable processes is estimated as
> -  ProcessorQueueLength.
> +  In the following text and code, "PDH" refers to Process Data Helper, a
> +  Windows component that arranges access to kernel counters.
> +
> +  The number of running tasks is estimated as
> +    (the NumberOfProcessors counter) * (the % Processor Time counter).
> +  The number of runnable tasks is taken to be the ProcessorQueueLength 
> counter.
>  
>    Note that PDH will only return data for '% Processor Time' afer the second
>    call to PdhCollectQueryData(), as it's computed over an interval, so the 
> first
> -  attempt to estimate load will fail and 0.0 will be returned.
> +  attempt to estimate load will fail and 0.0 will be returned.  (This 
> nuisance
> +  is now worked-around near the end of load_init() below.)
>  
>    We also assume that '% Processor Time' averaged over the interval since the
>    last time getloadavg() was called is a good approximation of the 
> instantaneous
> @@ -68,8 +74,19 @@ static bool load_init (void)
>      if (status != STATUS_SUCCESS)
>        {
>       debug_printf ("PdhAddEnglishCounterW(time), status %y", status);
> -     return false;
> +
> +     /* Windows 10 Pro 21H1, and maybe others, use an alternative name */
> +        status = PdhAddEnglishCounterW (query,
> +                        L"\\Processor Information(_Total)\\% Processor Time",
> +                        0, &counter1);
> +        if (status != STATUS_SUCCESS)
> +          {
> +            debug_printf ("PdhAddEnglishCounterW(alt time), status %y", 
> status);
> +            return false;
> +          }
>        }
> +
> +    /* Windows 10 Pro 21H1, and maybe others, are missing this counter */
>      status = PdhAddEnglishCounterW (query,
>                                   L"\\System\\Processor Queue Length",
>                                   0, &counter2);
> @@ -77,7 +94,7 @@ static bool load_init (void)
>      if (status != STATUS_SUCCESS)
>        {
>       debug_printf ("PdhAddEnglishCounterW(queue length), status %y", status);
> -     return false;
> +     ; /* Ignore missing counter, just use zero in later calculations */
>        }
>  
>      mutex = CreateMutex(&sec_all_nih, FALSE, "cyg.loadavg.mutex");
> @@ -87,6 +104,12 @@ static bool load_init (void)
>      }
>  
>      initialized = true;
> +
> +    /* Do the first data collection (which always fails) here, rather than in
> +       get_load(). We wait at least one tick afterward so the collection done
> +       in get_load() is guaranteed to have data to work with. */
> +    (void) PdhCollectQueryData (query); /* ignore errors */
> +    Sleep (15/*ms*/); /* wait for at least one kernel tick to have occurred 
> */
>    }
>  
>    return initialized;
> @@ -101,8 +124,8 @@ static bool get_load (double *load)
>    if (ret != ERROR_SUCCESS)
>      return false;
>  
> -  /* Estimate the number of running processes as (NumberOfProcessors) * (%
> -     Processor Time) */
> +  /* Estimate number of running tasks as
> +     (NumberOfProcessors) * (% Processor Time) */
>    PDH_FMT_COUNTERVALUE fmtvalue1;
>    ret = PdhGetFormattedCounterValue (counter1, PDH_FMT_DOUBLE, NULL, 
> &fmtvalue1);
>    if (ret != ERROR_SUCCESS)
> @@ -110,11 +133,10 @@ static bool get_load (double *load)
>  
>    double running = fmtvalue1.doubleValue * wincap.cpu_count () / 100;
>  
> -  /* Estimate the number of runnable processes using ProcessorQueueLength */
> -  PDH_FMT_COUNTERVALUE fmtvalue2;
> +  /* Estimate the number of runnable tasks as ProcessorQueueLength */
> +  PDH_FMT_COUNTERVALUE fmtvalue2 = { 0 };
>    ret = PdhGetFormattedCounterValue (counter2, PDH_FMT_LONG, NULL, 
> &fmtvalue2);
> -  if (ret != ERROR_SUCCESS)
> -    return false;
> +  /* Ignore any error accessing this counter, just treat as if zero was read 
> */
>  
>    LONG rql = fmtvalue2.longValue;
>  
> diff --git a/winsup/cygwin/release/3.5.6 b/winsup/cygwin/release/3.5.6
> index 4ccf94e38..5322be6d7 100644
> --- a/winsup/cygwin/release/3.5.6
> +++ b/winsup/cygwin/release/3.5.6
> @@ -13,3 +13,6 @@ Fixes:
>  
>  - Add fd validation where needed in mq_* functions.
>    Addresses: https://cygwin.com/pipermail/cygwin/2025-January/257090.html
> +
> +- Minor updates to load average calculations.
> +  Addresses: https://cygwin.com/pipermail/cygwin/2024-August/256361.html
> -- 
> 2.45.1

Reply via email to