https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029
--- Comment #87 from Don Lewis <truck...@freebsd.org> ---

I set affinity back to its default value of 1 and got another clean 1700 port poudriere run.

It's curious that the only issues I've had when steal_idle=0 and balance=0 happened when I set affinity=1000. This is the opposite of what I would expect.

I would expect migrations controlled by the steal_idle and balance knobs to have similar issues. In either case, the thread that is getting migrated is one that was preempted by an interrupt, and before being resumed, the scheduler noticed that the thread had exhausted its run time quantum and moved the thread to the back of the run queue for that CPU before resuming the thread that is at the front of the run queue. The only difference between steal_idle and balance is the event that actually causes the thread to migrate. When these threads restart, they basically just execute the kernel code to restore their state before dropping back into user mode where they were preempted from. For some reason, threads that have exhausted their time quantum seem to resume properly on the same CPU that they were previously running on, but sometimes go wonky if they resume on some other CPU.

The migrations controlled by the affinity knob are different. In those cases, the thread has voluntarily put itself to sleep, either because it blocked in a syscall, or perhaps because it trapped on a page fault and then went to sleep in the kernel while the missing page was brought in. When these threads get a wakeup event, they then execute the remaining part of the syscall or the page fault handler before returning to user mode. It doesn't seem to matter which CPU these threads restart on.

As a test, I set balance=1 and reduced balance_interval from its default 127 to 10 so that balance events would happen a lot more frequently to try to make up for steal_idle being disabled. I had three port build failures. The first was a guile segfault when building finance/gnucash.
The second was a unit test failure in editors/openoffice-devel. The third was a build runaway in devel/doxygen.

The steal_idle code in sched_ule is topology-aware, so it looks like it should be easy to hack the code to only allow migrations between SMT threads sharing the same core, or cores in the same CCX.
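
To make the idea of restricting migrations by topology concrete, here is a rough user-space model in Python. It is not sched_ule code and does not reflect how the kernel represents CPU groups. The Topology class, migration_allowed(), steal_candidates(), and the "core"/"ccx"/"any" scope names are all made up for illustration. It also assumes a Ryzen-like layout of 2 SMT threads per core and 4 cores per CCX, with logical CPU ids numbered so that SMT siblings are adjacent, which may not match a given machine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Hypothetical CPU layout; the defaults are an assumed Ryzen-like shape."""
    threads_per_core: int = 2
    cores_per_ccx: int = 4

    def core_of(self, cpu: int) -> int:
        # Assumes SMT siblings have adjacent logical CPU ids.
        return cpu // self.threads_per_core

    def ccx_of(self, cpu: int) -> int:
        return self.core_of(cpu) // self.cores_per_ccx


def migration_allowed(topo: Topology, src: int, dst: int, scope: str) -> bool:
    """Return True if a thread may move from CPU src to CPU dst.

    scope is a made-up policy name:
      "core" - only between SMT threads of the same core
      "ccx"  - only between CPUs in the same CCX
      "any"  - no restriction
    """
    if scope == "core":
        return topo.core_of(src) == topo.core_of(dst)
    if scope == "ccx":
        return topo.ccx_of(src) == topo.ccx_of(dst)
    if scope == "any":
        return True
    raise ValueError(f"unknown scope: {scope!r}")


def steal_candidates(topo: Topology, idle_cpu: int, busy_cpus: list[int],
                     scope: str) -> list[int]:
    """Busy CPUs that the idle CPU would be permitted to steal from."""
    return [cpu for cpu in busy_cpus
            if cpu != idle_cpu and migration_allowed(topo, cpu, idle_cpu, scope)]
```

With this model, an idle CPU 0 under the "core" scope would only consider CPU 1, and under the "ccx" scope would only consider CPUs 1 through 7. Comparing failure rates between the two scopes might help narrow down whether crossing a core or a CCX boundary is what matters.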