Hi,

During verification tests of DPDK on Windows we noticed that DPDK polling
threads, which run in REALTIME_PRIORITY_CLASS, starve OS threads that run at
a lower priority and need to change their affinity.

After an application has been running for a while, we see an OS thread wait
for 2:30 minutes and then raise a bugcheck. Below is an example of such a flow:

1) A DPDK thread runs on core 0 in polling mode at real-time priority (24).
2) This thread prevents another thread, which is executing the system
   function NtSetSystemInformation (ExpUpdateTimerConfiguration), from
   switching to core 0 via KeSetSystemGroupAffinityThread, because the
   calling thread has priority 15.
3) NtSetSystemInformation holds the system-wide lock ExpTimeRefreshLock
   exclusively, so it in turn blocks other threads (e.g. those calling
   NtQuerySystemInformation).
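
The flow above can be sketched as a toy model (this is not Windows code): a
single core under strict priority scheduling, assuming no priority boosting
and a polling thread that never blocks. The names toy_thread, pick_next and
simulate are made up for illustration; only the priorities 24 and 15 come
from the flow above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one core with strict priority scheduling. */
struct toy_thread {
    const char *name;
    int priority;       /* higher value wins */
    int ready;          /* 1 if the thread wants the core on every tick */
    unsigned long ran;  /* number of ticks this thread was scheduled */
};

/* Return the ready thread with the highest priority, or NULL if none. */
static struct toy_thread *pick_next(struct toy_thread *t, size_t n)
{
    struct toy_thread *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (!t[i].ready)
            continue;
        if (best == NULL || t[i].priority > best->priority)
            best = &t[i];
    }
    return best;
}

/* Run the model for `ticks` ticks; return how many ticks the OS thread got. */
unsigned long simulate(int poll_prio, int os_prio, unsigned long ticks)
{
    struct toy_thread t[2] = {
        { "dpdk-poll", poll_prio, 1, 0 },          /* busy polling, never blocks */
        { "os-affinity-change", os_prio, 1, 0 },   /* wants to run on this core */
    };
    for (unsigned long i = 0; i < ticks; i++) {
        struct toy_thread *next = pick_next(t, 2);
        assert(next != NULL);  /* both threads are always ready in this model */
        next->ran++;
    }
    return t[1].ran;
}
```

Under these assumptions simulate(24, 15, n) returns 0 for any n: the
priority-15 thread never gets a tick on that core. A real scheduler is more
involved, so this only illustrates the principle.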

We have seen this behavior only on Windows Server 2019 VMs. Perhaps the OS
schedules such a flow differently on native machines?

Below is the description of this class from the SetPriorityClass documentation [1]:

- REALTIME_PRIORITY_CLASS
Process that has the highest possible priority. The threads of the process 
preempt the threads of all other processes, including operating system 
processes performing important tasks. For example, a real-time process that 
executes for more than a very brief interval can cause disk caches not to flush 
or cause the mouse to be unresponsive. 
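
For context, the value 24 in the flow above is the base priority Windows
assigns to a thread in REALTIME_PRIORITY_CLASS at the normal thread priority
level. A small sketch of that mapping, based on the Windows scheduling
priority tables as commonly documented; the function name is hypothetical and
levels are passed as plain integers rather than the Win32 constants:

```c
#include <assert.h>

/* Base priority of a thread in REALTIME_PRIORITY_CLASS for a given thread
 * priority level. Levels use the Win32 numeric values:
 *   -15 = THREAD_PRIORITY_IDLE
 *     0 = THREAD_PRIORITY_NORMAL
 *    15 = THREAD_PRIORITY_TIME_CRITICAL
 * Returns -1 for a level that is not valid in this class.
 * Hypothetical helper, not part of any Windows or DPDK API. */
int realtime_base_priority(int level)
{
    if (level == -15)
        return 16;              /* bottom of the real-time range */
    if (level == 15)
        return 31;              /* top of the real-time range */
    if (level >= -7 && level <= 6)
        return 24 + level;      /* 17..30, with 24 at the normal level */
    return -1;
}
```

Threads outside the real-time class have base priorities of at most 15, so a
priority-15 thread always loses to a priority-24 thread that never blocks.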

So I assume that running this kind of thread for long periods, as we do, can
cause unstable behavior.

How do you think we can resolve this? Do similar cases occur on Linux?

[1] https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-setpriorityclass

Thanks,

Tal.
