Control: severity -1 important
Control: retitle -1 perl: FTBFS on buildds with steal time issues

On Sun, May 24, 2015 at 07:38:19PM +0300, Apollon Oikonomopoulos wrote:
> On 16:38 Sun 24 May , Ben Hutchings wrote:
> > On Sun, 2015-05-24 at 14:09 +0300, Niko Tyni wrote:
> > > On Sun, May 24, 2015 at 02:55:00PM +0800, Paul Wise wrote:
> > > > On Sat, 2015-05-23 at 19:10 +0200, Dominic Hargreaves wrote:
> > > >
> > > > > This is rather strange; any ideas from DSA?
> > > >
> > > > The underlying hosts do not have the same issue.
> > > >
> > > > All of the guests use the same virtual CPU version/flags.
> > > >
> > > > All of the guests use the same Linux kernel version.
> > >
> > > Thanks for the update.
> > >
> > > > I guess diving into the Linux implementation of times(2) for clues would
> > > > be the next step for figuring out what the issue is here.
> > >
> > > I'm taking the kernel maintainers in the loop. The status here is that
> > > times(2) seems to be misbehaving on some i386 and amd64 debian.org virtual
> > > hosts running jessie (under ganeti/qemu, with jessie on the underlying
> > > hosts too). These hosts include at least barriere and x86-grnet-01.
> > >
> > > The misbehaviour is that user time stays at zero all the time, as seen
> > > for example with 'time yes'. This is making perl fail to build from
> > > source due to test failures, and I'd expect it to affect other things too.
> > >
> > > Any help is appreciated.
> >
> > I can't reproduce this, but wonder if it's related to #784960?
>
> There seems to be something fundamentally broken in
> barriere.debian.org's CPU time accounting, not related to times(2) per
> se. Just issuing
>
>   yes >/dev/null
>
> and firing up top -d1 gives the following interesting results:
>
>  - `yes' shows up taking 100% CPU time as expected, but
>  - pressing `1' shows that all CPUs are idle (!)
>
> htop OTOH displays all CPUs as constantly 100% busy, which is
> inconsistent with the system's load average (~0.8 at that point).
>
> Also watching the output of `cat /proc/$(pidof yes)/stat | awk '{ print
> $14, $15 }'' ($14 is utime, $15 is stime per proc(5)) indeed shows 100%
> system time and 0 user time.
>
> If you look at the `top' stats for all CPUs of barriere.debian.org, it
> looks as if the only thing that's correctly being accounted for is
> iowait time.

Great to hear that you found the underlying cause[1] of this! I note
the workaround:

  -cpu qemu64,-kvm_steal_time

which may be applicable to the Debian hosts?

Cheers,
Dominic.

[1] <https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg01295.html>
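
For anyone wanting to check another guest for the same symptom, the /proc-based check quoted above could be sketched in Python roughly as follows. This is only an illustration, not something taken from the thread: the function names are made up for this example, and the "stuck" heuristic (system ticks advance while user ticks do not) is an assumption about how the symptom shows up. Unlike the awk one-liner, it splits after the last ')' so that a command name containing spaces does not shift the field numbers.

```python
from typing import Tuple


def parse_utime_stime(stat_line: str) -> Tuple[int, int]:
    """Return (utime, stime) in clock ticks from one /proc/<pid>/stat line.

    Hypothetical helper for illustration; field numbers follow proc(5).
    """
    # Field 2 (comm) is wrapped in parentheses and may contain spaces or ')',
    # so split after the LAST ')' before counting fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 14 (utime) is rest[11]
    # and field 15 (stime) is rest[12].
    return int(rest[11]), int(rest[12])


def user_time_looks_stuck(before: Tuple[int, int], after: Tuple[int, int]) -> bool:
    """Assumed heuristic: CPU ticks were charged, but none of them as user time."""
    user_delta = after[0] - before[0]
    system_delta = after[1] - before[1]
    return user_delta == 0 and system_delta > 0


def read_stat(pid: int) -> str:
    """Read the raw stat line for a PID (Linux only)."""
    with open("/proc/%d/stat" % pid) as handle:
        return handle.read()
```

Sampling read_stat() twice, a second or so apart, for the PID of a running `yes >/dev/null` and passing both parsed results to user_time_looks_stuck() would be one way to use it.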