On 09/26/2016 02:10 PM, Peter Zijlstra wrote:
> On Mon, Sep 26, 2016 at 02:01:43PM +0200, Christian Borntraeger wrote:
>> They applied ok on next from 9/13. Things get even worse.
>> With this host configuration:
>>
>> CPU NODE BOOK SOCKET CORE L1d:L1i:L2d:L2i ONLINE CONFIGURED ADDRESS
>> 0   0    0    0      0    0:0:0:0         yes    yes        0
>> 1   0    0    0      0    1:1:1:1         yes    yes        1
>> 2   0    0    0      1    2:2:2:2         yes    yes        2
>> 3   0    0    0      1    3:3:3:3         yes    yes        3
>> 4   0    0    1      2    4:4:4:4         yes    yes        4
>> 5   0    0    1      2    5:5:5:5         yes    yes        5
>> 6   0    0    1      3    6:6:6:6         yes    yes        6
>> 7   0    0    1      3    7:7:7:7         yes    yes        7
>> 8   0    0    1      4    8:8:8:8         yes    yes        8
>> 9   0    0    1      4    9:9:9:9         yes    yes        9
>> 10  0    0    1      5    10:10:10:10     yes    yes        10
>> 11  0    0    1      5    11:11:11:11     yes    yes        11
>> 12  0    0    1      6    12:12:12:12     yes    yes        12
>> 13  0    0    1      6    13:13:13:13     yes    yes        13
>> 14  0    0    1      7    14:14:14:14     yes    yes        14
>> 15  0    0    1      7    15:15:15:15     yes    yes        15
>>
>> the guest was running either on 0-3 or on 4-15, but never
>> used the full system. With group scheduling disabled everything was
>> good again. So it looks like this bug also has some dependency on the
>> host topology.
>
> OK, so CPU affinities that unevenly straddle topology boundaries like
> that are hard (and generally not recommended), but it's not
> immediately obvious why it would be so much worse with cgroups enabled.
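For a quick cross-check of that socket split, a minimal sketch that prints each online CPU's socket/core placement (assuming the standard /sys/devices/system/cpu/cpuN/topology files):

#!/usr/bin/env python3
# Illustrative sketch only: show which socket/core each online CPU sits on,
# using the standard sysfs topology files, to visualize the 0-3 vs 4-15 split.
import glob
import os

cpu_dirs = sorted(glob.glob("/sys/devices/system/cpu/cpu[0-9]*"),
                  key=lambda p: int(p.rsplit("cpu", 1)[1]))
for cpu_dir in cpu_dirs:
    topo = os.path.join(cpu_dir, "topology")
    if not os.path.isdir(topo):  # offline CPUs have no topology directory
        continue
    with open(os.path.join(topo, "physical_package_id")) as f:
        socket = f.read().strip()
    with open(os.path.join(topo, "core_id")) as f:
        core = f.read().strip()
    print("{}: socket {}, core {}".format(os.path.basename(cpu_dir),
                                          socket, core))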
Well, that's what I get from LPAR... With CPUs 0-3 disabled things are
better, but there is still a 10% difference between group/nogroup. Will
test Vincent's v4 soon.

In any case, would a 5-second sequence of /proc/sched_debug for the
good/bad case with all 16 host CPUs (or the reduced 12-CPU set) be
useful? A sketch of how such a capture could be scripted is appended
below the signature.

Christian
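Appended for reference: a minimal capture sketch (assuming CONFIG_SCHED_DEBUG and the /proc/sched_debug interface; the output file names are placeholders). Running it once with group scheduling enabled and once without gives two comparable sets of snapshots.

#!/usr/bin/env python3
# Illustrative sketch only: snapshot /proc/sched_debug once per second for
# 5 seconds so the group/nogroup runs can be compared offline.
# Needs CONFIG_SCHED_DEBUG; may need root depending on kernel settings.
import time

SAMPLES = 5                  # one snapshot per second, as suggested above
PREFIX = "sched_debug"       # placeholder prefix: sched_debug.0.txt .. .4.txt

for i in range(SAMPLES):
    with open("/proc/sched_debug") as src:
        snapshot = src.read()
    with open("{}.{}.txt".format(PREFIX, i), "w") as dst:
        dst.write(snapshot)
    if i < SAMPLES - 1:
        time.sleep(1)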