At $WORK we use a modification of DEVICE_POLLING instead of running our NICs in interrupt mode. With the ULE scheduler we are seeing that CPU utilization (e.g. in top -SH) is completely wrong: the polling threads always end up being reported at a utilization of 0%.
I see problems both with the CPU utilization algorithm introduced in r232917 and with the original one.

The problem with the original algorithm is pretty easy to explain: ULE was sampling for CPU usage in hardclock(), which also kicks off the polling threads, so samples are never taken while a polling thread is running.

It appears that r232917 attempts to do time-based CPU accounting instead of sampling-based accounting. sched_pctcpu_update() is called at various places to update the CPU usage of each thread:

    static void
    sched_pctcpu_update(struct td_sched *ts, int run)
    {
            int t = ticks;

            if (t - ts->ts_ltick >= SCHED_TICK_TARG) {
                    ts->ts_ticks = 0;
                    ts->ts_ftick = t - SCHED_TICK_TARG;
            } else if (t - ts->ts_ftick >= SCHED_TICK_MAX) {
                    ts->ts_ticks = (ts->ts_ticks / (ts->ts_ltick - ts->ts_ftick)) *
                        (ts->ts_ltick - (t - SCHED_TICK_TARG));
                    ts->ts_ftick = t - SCHED_TICK_TARG;
            }
            if (run)
                    ts->ts_ticks += (t - ts->ts_ltick) << SCHED_TICK_SHIFT;
            ts->ts_ltick = t;
    }

The problem with it is that it only seems to work at a granularity of 1 tick. My polling threads get woken up at each hardclock() invocation and stop running before the next hardclock() invocation, so ticks is (almost) never incremented while a polling thread is running. This means that when sched_pctcpu_update() is called as the polling thread goes to sleep, run=1 but ts->ts_ltick == ticks, so ts_ticks is incremented by 0. When the polling thread is woken up again, ticks has been incremented in the meantime and sched_pctcpu_update() is called with run=0, so ts_ticks is not incremented but ts_ltick is set to ticks. The effect is that ts_ticks is never incremented, so CPU usage is always reported as 0.

I think that you'll see the same effect with the softclock threads, too.

I've experimented with reverting r232917 and instead moving the sampling code from sched_tick() to sched_clock(), and that seems to give me reasonably accurate results (for my workload, anyway).
The other option would be to use a timer with a higher granularity than ticks in sched_pctcpu_update().