Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)
[The historical notes are removed and replaced by partial trace information from example hang-ups, not that I've figured out what contributes yet.] I ran into the following while trying to get evidence about the hang-up for an amd64->armv7 cross-build of multimedia/gstreamer1-qt@qt5 . The following from trying to get evidence for the hang-up via a manual run of "make multimedia/gstreamer1-qt FLAVOR=qt5” in a poudriere bulk -i’s interactive mode for the context that has the hang-up in normal poudriere-devel runs. From top after the hang-up (to identify some context): 14528 root 2 520 100M24M0 kqread 11 0:00 0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/. 14527 root 2 52088M13M0 select 22 0:00 0.00% /usr/local/bin/qemu-arm-static ninja -j1 -v all from ps -auxd as well (to identify more context): root 101140.0 0.0 10328 1756 1 I+J 13:47 0:00.01 | `-- make FLAVOR=qt5 root 145260.0 0.0 10204 1792 1 I+J 13:50 0:00.00 | `-- /bin/sh -e -c (cd /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! /usr/bin/env QT_SELE root 145270.0 0.0 90304 13084 1 I+J 13:50 0:00.09 | `-- /usr/local/bin/qemu-arm-static ninja -j1 -v all root 145280.0 0.0 102876 25060 1 IJ 13:50 0:00.12 | `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/g I had made a qemu-user-static that enabled do_strace when it is used to run cmake or ninja. The only do_strace lines from qemu-arm-static running cmake or ninja mentioning process 14528 are included in the sequence: (Before the below was a long list of "14527 fstatat” lines. I’ll note that "'Unknown syscall 545” is from ppoll use.) 82400 sigprocmask(1,-1610620016,-191968524,-186261416,0,24) = 0 82400 sigaction(2,-1610620040,-191968596,-186261584,210460,0) = 0 82400 sigaction(15,-1610620040,-191968572,-186261584,210460,0) = 0 82400 sigaction(1,-1610620040,-191968548,-186261584,210460,0) = 0 82400 gettimeofday(-1610619984,0,4,-186261584,-1610619440,-1610619528) = 0 82400 gettimeofday(-1610619984,0,4,359949,1545969996,0) = 0 82400 gettimeofday(-1610620120,0,4,2,-184666112,-1610619520) = 0 82400 fstatat(-100,"elements/gstqtvideosink/CMakeFiles", 0x9fffe200, 0) = 0 82400 fstatat(-100,"elements/gstqtvideosink/gstqt5videosink_autogen", 0x9fffe200, 0) = 0 82400 pipe2(-1610620176,0,-1610620108,0,-1610620120,167084) = 0 82400 fcntl(5,1,-1610620108,-185863932,-192200556,-1610620228) = 0 82400 fcntl(5,2,1,-185863932,-192200556,-1610620228) = 0 82400 vfork(0,66450,-186876196,-1610620184,-1610620240,0) = 82401 82400 close(6) = 0 = 0 82400 Unknown syscall 545 82401 setpgid(0,0,-186876196,-1610620184,-1610620240,0) = 0 82401 sigprocmask(3,-191586912,0,-1610620184,-1610620240,0) = 0 82401 close(5) = 0 82401 open("/dev/null",0,0) = 5 82401 dup2(5,0,0,-1610620184,-1610620240,0) = 0 82401 close(5) = 0 82401 fcntl(0,2,0,-1610620184,-1610620240,0) = 0 82401 dup2(6,1,0,-1610620184,-1610620240,0) = 1 82401 fcntl(1,2,0,-1610620184,-1610620240,0) = 0 82401 dup2(6,2,0,-1610620184,-1610620240,0)82400 sigpending(-1610620072,1,0,-191968524,0,0) = 0 The vfork then close(6) sequence for 82400 vs. the later use of 6 in dup2 in 82401 may be rather odd. But it looks like qemu-*-static uses do_freebsd_fork to implement do_freebsd_vfork, despite reporting vfork before calling do_freebsd_vfork. (Does the close(6) appear to indicate a race for native operation of ninja for the period when the address space is shared?) Ninja has Subprocess::Start code that has: #ifdef POSIX_SPAWN_USEVFORK flags |= POSIX_SPAWN_USEVFORK; #endif if (posix_spawnattr_setflags(&attr, flags) != 0) Fatal("posix_spawnattr_setflags: %s", strerror(errno)); const char* spawned_args[] = { "/bin/sh", "-c", command.c_str(), NULL }; if (posix_spawn(&pid_, "/bin/sh", &action, &attr, const_cast(spawned_args), environ) != 0) Fatal("posix_spawn: %s", strerror(errno)); that is in use here. I think that this explains the vfork use. It turns out that putting the hung-up build in the background and then killing 82401 with the likes of kill -6 leads to more output that had apparently been buffered. It shows the use of the (amd64 native) /bin/sh that in turn leads to /usr/local/bin/cmake via qemu-arm-static. /bin/sh, being native, gets no do_strace output from qemu-arm-static. 82400 sigpending(-1610620072,1,0,-191968524,0,0) = 0 82400 read(5,0x9fffd368,4096) = 58 82400 Unknown syscall 545 82400 sigpending(-1610620072,1,0,-191968524,0,0) = 0 82400 read(5,0x9fffd368,4096) = 0 82400 close(5) = 0 82400 wait4(82401,-1610620004,0,0,-191968640,0) = 82401 82400 mmap(0,86016,3,201330690,-1,-1610620169) = 0xf4777000 82400 gettimeofday(-1610620224,0,4,-161061994
FreeBSD ports you maintain which are out of date
Dear port maintainer, The portscout new distfile checker has detected that one or more of your ports appears to be out of date. Please take the opportunity to check each of the ports listed below, and if possible and appropriate, submit/commit an update. If any ports have already been updated, you can safely ignore the entry. You will not be e-mailed again for any of the port/version combinations below. Full details can be found at the following URL: http://portscout.freebsd.org/po...@freebsd.org.html Port| Current version | New version +-+ x11-toolkits/gtkdatabox | 0.9.3.0 | 0.9.3.1 +-+ If any of the above results are invalid, please check the following page for details on how to improve portscout's detection and selection of distfiles on a per-port basis: http://portscout.freebsd.org/info/portscout-portconfig.txt Thanks. ___ freebsd-ports@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-ports To unsubscribe, send any mail to "freebsd-ports-unsubscr...@freebsd.org"
Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)
On 2018-Dec-28, at 05:13, Michal Meloun wrote: > Mark, > this is known problem with qemu-user-static. > Emulation of every single interruptible syscall is broken by design (it > have signal related races). Theses races cannot be solved without major > rewrite of syscall emulation code. > Unfortunately, nobody actively works on this, I think. > Thanks for the note setting some expectations. On the evidence that I have I expect that more is going on than that: A) The hang-up always happens and always in the same place. So it would appear that no race is involved. B) (A) is true even for varying the number of builders in parallel (so other builds also happening) and the number of jobs allowed per builder. It also fails for only one builder allowed only one process. (I get traces from that last kind of context.) C) The problem started on the package-building servers for armv7 and armv6 without qemu-user-static having an update (FreeBSD and cmake had updates, for example). D) The problem is only observed for targeting armv7 and armv6 as far as I can tell. I've never seen it for aarch64, neither my own builds nor when I looked at the package-building server history. At least that is what got me started. (I've since learned that qemu-user-static uses fork in place of a requested vfork.) My ktrace/kdump experiment yesterday showed something odd for the kevent that hangs in cmake: 93172 qemu-arm-static CALL kevent(0x3,0x7ffe7d40,0x2,0x7ffd7d40,0x400,0) 93172 qemu-arm-static STRU struct kevent[] = { { ident=6, filter=EVFILT_READ, flags=0x1, fflags=0, data=0, udata=0x0 } { ident=0x0, filter=, flags=0, fflags=0x8, data=0x1, udata=0x0 } } Note the 0x2 argument to kevent and the apparently-odd 2nd entry in the struct kevent[]. The kevent use is from cmake. So far I've not identified a signal being delivered at a time that would seem to me to be likely to contribute. (But this is not familiar code so my judgment is likely not the best.) Note: I normally run FreeBSD using a non-debug kernel, even when using head. (The kernel does have symbols.) === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) ___ freebsd-ports@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-ports To unsubscribe, send any mail to "freebsd-ports-unsubscr...@freebsd.org"
Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)
[Using ktrace/kdump shows an apperent oddity in the kevent use that hang-up in cmake, not that I know it causes the hang-up.] On 2018-Dec-28, at 00:16, Mark Millard wrote: > [The historical notes are removed and replaced by partial trace > information from example hang-ups, not that I've figured out > what contributes yet.] > > I ran into the following while trying to get evidence > about the hang-up for an amd64->armv7 cross-build of > multimedia/gstreamer1-qt@qt5 . > > The following from trying to get evidence for the hang-up > via a manual run of "make multimedia/gstreamer1-qt FLAVOR=qt5” > in a poudriere bulk -i’s interactive mode for the context > that has the hang-up in normal poudriere-devel runs. > > > From top after the hang-up (to identify some context): > > 14528 root 2 520 100M24M0 kqread 11 0:00 0.00% > /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen > /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/. > 14527 root 2 52088M13M0 select 22 0:00 0.00% > /usr/local/bin/qemu-arm-static ninja -j1 -v all > > from ps -auxd as well (to identify more context): > > root 101140.0 0.0 10328 1756 1 I+J 13:47 0:00.01 | >`-- make FLAVOR=qt5 > root 145260.0 0.0 10204 1792 1 I+J 13:50 0:00.00 | > `-- /bin/sh -e -c (cd > /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! > /usr/bin/env QT_SELE > root 145270.0 0.0 90304 13084 1 I+J 13:50 0:00.09 | >`-- /usr/local/bin/qemu-arm-static ninja -j1 -v all > root 145280.0 0.0 102876 25060 1 IJ 13:50 0:00.12 | > `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E > cmake_autogen /wrkdirs/usr/ports/multimedia/g > > I had made a qemu-user-static that enabled do_strace when > it is used to run cmake or ninja. > > The only do_strace lines from qemu-arm-static running cmake > or ninja mentioning process 14528 are included in the sequence: > > (Before the below was a long list of "14527 fstatat” lines. > I’ll note that "'Unknown syscall 545” is from ppoll use.) > > 82400 sigprocmask(1,-1610620016,-191968524,-186261416,0,24) = 0 > 82400 sigaction(2,-1610620040,-191968596,-186261584,210460,0) = 0 > 82400 sigaction(15,-1610620040,-191968572,-186261584,210460,0) = 0 > 82400 sigaction(1,-1610620040,-191968548,-186261584,210460,0) = 0 > 82400 gettimeofday(-1610619984,0,4,-186261584,-1610619440,-1610619528) = 0 > 82400 gettimeofday(-1610619984,0,4,359949,1545969996,0) = 0 > 82400 gettimeofday(-1610620120,0,4,2,-184666112,-1610619520) = 0 > 82400 fstatat(-100,"elements/gstqtvideosink/CMakeFiles", 0x9fffe200, 0) = 0 > 82400 fstatat(-100,"elements/gstqtvideosink/gstqt5videosink_autogen", > 0x9fffe200, 0) = 0 > 82400 pipe2(-1610620176,0,-1610620108,0,-1610620120,167084) = 0 > 82400 fcntl(5,1,-1610620108,-185863932,-192200556,-1610620228) = 0 > 82400 fcntl(5,2,1,-185863932,-192200556,-1610620228) = 0 > 82400 vfork(0,66450,-186876196,-1610620184,-1610620240,0) = 82401 > 82400 close(6) = 0 > = 0 > 82400 Unknown syscall 545 > 82401 setpgid(0,0,-186876196,-1610620184,-1610620240,0) = 0 > 82401 sigprocmask(3,-191586912,0,-1610620184,-1610620240,0) = 0 > 82401 close(5) = 0 > 82401 open("/dev/null",0,0) = 5 > 82401 dup2(5,0,0,-1610620184,-1610620240,0) = 0 > 82401 close(5) = 0 > 82401 fcntl(0,2,0,-1610620184,-1610620240,0) = 0 > 82401 dup2(6,1,0,-1610620184,-1610620240,0) = 1 > 82401 fcntl(1,2,0,-1610620184,-1610620240,0) = 0 > 82401 dup2(6,2,0,-1610620184,-1610620240,0)82400 > sigpending(-1610620072,1,0,-191968524,0,0) = 0 > > The vfork then close(6) sequence for 82400 vs. the later > use of 6 in dup2 in 82401 may be rather odd. But it looks > like qemu-*-static uses do_freebsd_fork to implement > do_freebsd_vfork, despite reporting vfork before > calling do_freebsd_vfork. (Does the close(6) appear to > indicate a race for native operation of ninja for the > period when the address space is shared?) > > Ninja has Subprocess::Start code that has: > > #ifdef POSIX_SPAWN_USEVFORK > flags |= POSIX_SPAWN_USEVFORK; > #endif > > > if (posix_spawnattr_setflags(&attr, flags) != 0) >Fatal("posix_spawnattr_setflags: %s", strerror(errno)); > > const char* spawned_args[] = { "/bin/sh", "-c", command.c_str(), NULL }; > if (posix_spawn(&pid_, "/bin/sh", &action, &attr, > const_cast(spawned_args), environ) != 0) >Fatal("posix_spawn: %s", strerror(errno)); > > that is in use here. I think that this explains the vfork use. > > > It turns out that putting the hung-up build in the background > and then killing 82401 with the likes of kill -6 leads to more > output that had apparently been buffered. It shows the use of > the (amd64 native) /bin/sh that in turn leads to > /usr/local/bin/cmake via qemu-arm-static. /bin/sh, being > native, gets no do_strace output from qemu-arm-static. >