Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

2018-12-28 Thread Mark Millard via freebsd-ports
[The historical notes are removed and replaced by partial trace
information from example hang-ups, not that I've figured out
what contributes yet.]

I ran into the following while trying to get evidence
about the hang-up for an amd64->armv7 cross-build of
multimedia/gstreamer1-qt@qt5 .

The following from trying to get evidence for the hang-up
via a manual run of "make multimedia/gstreamer1-qt FLAVOR=qt5”
in a poudriere bulk -i’s interactive mode for the context
that has the hang-up in normal poudriere-devel runs.


From top after the hang-up (to identify some context):

14528 root  2  520   100M24M0 kqread  11   0:00   0.00% 
/usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen 
/wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.
14527 root  2  52088M13M0 select  22   0:00   0.00% 
/usr/local/bin/qemu-arm-static ninja -j1 -v all

from ps -auxd as well (to identify more context):

root   101140.0  0.0  10328  1756  1  I+J  13:47   0:00.01 |
 `-- make FLAVOR=qt5
root   145260.0  0.0  10204  1792  1  I+J  13:50   0:00.00 |
   `-- /bin/sh -e -c (cd 
/wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! /usr/bin/env 
QT_SELE
root   145270.0  0.0  90304 13084  1  I+J  13:50   0:00.09 |
 `-- /usr/local/bin/qemu-arm-static ninja -j1 -v all
root   145280.0  0.0 102876 25060  1  IJ   13:50   0:00.12 |
   `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E 
cmake_autogen /wrkdirs/usr/ports/multimedia/g

I had made a qemu-user-static that enabled do_strace when
it is used to run cmake or ninja.

The only do_strace lines from qemu-arm-static running cmake
or ninja mentioning process 14528 are included in the sequence:

(Before the below was a long list of "14527 fstatat” lines.
I’ll note that "'Unknown syscall 545” is from ppoll use.)

82400 sigprocmask(1,-1610620016,-191968524,-186261416,0,24) = 0
82400 sigaction(2,-1610620040,-191968596,-186261584,210460,0) = 0
82400 sigaction(15,-1610620040,-191968572,-186261584,210460,0) = 0
82400 sigaction(1,-1610620040,-191968548,-186261584,210460,0) = 0
82400 gettimeofday(-1610619984,0,4,-186261584,-1610619440,-1610619528) = 0
82400 gettimeofday(-1610619984,0,4,359949,1545969996,0) = 0
82400 gettimeofday(-1610620120,0,4,2,-184666112,-1610619520) = 0
82400 fstatat(-100,"elements/gstqtvideosink/CMakeFiles", 0x9fffe200, 0) = 0
82400 fstatat(-100,"elements/gstqtvideosink/gstqt5videosink_autogen", 
0x9fffe200, 0) = 0
82400 pipe2(-1610620176,0,-1610620108,0,-1610620120,167084) = 0
82400 fcntl(5,1,-1610620108,-185863932,-192200556,-1610620228) = 0
82400 fcntl(5,2,1,-185863932,-192200556,-1610620228) = 0
82400 vfork(0,66450,-186876196,-1610620184,-1610620240,0) = 82401
82400 close(6) = 0
 = 0
82400 Unknown syscall 545
82401 setpgid(0,0,-186876196,-1610620184,-1610620240,0) = 0
82401 sigprocmask(3,-191586912,0,-1610620184,-1610620240,0) = 0
82401 close(5) = 0
82401 open("/dev/null",0,0) = 5
82401 dup2(5,0,0,-1610620184,-1610620240,0) = 0
82401 close(5) = 0
82401 fcntl(0,2,0,-1610620184,-1610620240,0) = 0
82401 dup2(6,1,0,-1610620184,-1610620240,0) = 1
82401 fcntl(1,2,0,-1610620184,-1610620240,0) = 0
82401 dup2(6,2,0,-1610620184,-1610620240,0)82400 
sigpending(-1610620072,1,0,-191968524,0,0) = 0

The vfork then close(6) sequence for 82400 vs. the later
use of 6 in dup2 in 82401 may be rather odd. But it looks
like qemu-*-static uses do_freebsd_fork to implement
do_freebsd_vfork, despite reporting vfork before
calling do_freebsd_vfork. (Does the close(6) appear to
indicate a race for native operation of ninja for the
period when the address space is shared?)

Ninja has Subprocess::Start code that has:

#ifdef POSIX_SPAWN_USEVFORK
  flags |= POSIX_SPAWN_USEVFORK;
#endif


  if (posix_spawnattr_setflags(&attr, flags) != 0)
Fatal("posix_spawnattr_setflags: %s", strerror(errno));

  const char* spawned_args[] = { "/bin/sh", "-c", command.c_str(), NULL };
  if (posix_spawn(&pid_, "/bin/sh", &action, &attr,
  const_cast(spawned_args), environ) != 0)
Fatal("posix_spawn: %s", strerror(errno));

that is in use here. I think that this explains the vfork use.


It turns out that putting the hung-up build in the background
and then killing 82401 with the likes of kill -6 leads to more
output that had apparently been buffered. It shows the use of
the (amd64 native) /bin/sh that in turn leads to
/usr/local/bin/cmake via qemu-arm-static. /bin/sh, being
native, gets no do_strace output from qemu-arm-static.

82400 sigpending(-1610620072,1,0,-191968524,0,0) = 0
82400 read(5,0x9fffd368,4096) = 58
82400 Unknown syscall 545
82400 sigpending(-1610620072,1,0,-191968524,0,0) = 0
82400 read(5,0x9fffd368,4096) = 0
82400 close(5) = 0
82400 wait4(82401,-1610620004,0,0,-191968640,0) = 82401
82400 mmap(0,86016,3,201330690,-1,-1610620169) = 0xf4777000
82400 gettimeofday(-1610620224,0,4,-161061994

FreeBSD ports you maintain which are out of date

2018-12-28 Thread portscout
Dear port maintainer,

The portscout new distfile checker has detected that one or more of your
ports appears to be out of date. Please take the opportunity to check
each of the ports listed below, and if possible and appropriate,
submit/commit an update. If any ports have already been updated, you can
safely ignore the entry.

You will not be e-mailed again for any of the port/version combinations
below.

Full details can be found at the following URL:
http://portscout.freebsd.org/po...@freebsd.org.html


Port| Current version | New version
+-+
x11-toolkits/gtkdatabox | 0.9.3.0 | 0.9.3.1
+-+


If any of the above results are invalid, please check the following page
for details on how to improve portscout's detection and selection of
distfiles on a per-port basis:

http://portscout.freebsd.org/info/portscout-portconfig.txt

Thanks.
___
freebsd-ports@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-ports
To unsubscribe, send any mail to "freebsd-ports-unsubscr...@freebsd.org"


Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

2018-12-28 Thread Mark Millard via freebsd-ports



On 2018-Dec-28, at 05:13, Michal Meloun  wrote:

> Mark,
> this is known problem with qemu-user-static.
> Emulation of every single interruptible syscall is broken by design (it
> have signal related races). Theses races cannot be solved without major
> rewrite of syscall emulation code.
> Unfortunately, nobody actively works on this, I think.
> 

Thanks for the note setting some expectations.

On the evidence that I have I expect that more is going on than that:

A) The hang-up always happens and always in the same place. So
it would appear that no race is involved.

B) (A) is true even for varying the number of builders in parallel
(so other builds also happening) and the number of jobs allowed per
builder. It also fails for only one builder allowed only one process.
(I get traces from that last kind of context.)

C) The problem started on the package-building servers for armv7
and armv6 without qemu-user-static having an update (FreeBSD and
cmake had updates, for example).

D) The problem is only observed for targeting armv7 and armv6 as
far as I can tell. I've never seen it for aarch64, neither my
own builds nor when I looked at the package-building server
history.

At least that is what got me started. (I've since learned that
qemu-user-static uses fork in place of a requested vfork.)

My ktrace/kdump experiment yesterday showed something odd for the
kevent that hangs in cmake:

93172 qemu-arm-static CALL  
kevent(0x3,0x7ffe7d40,0x2,0x7ffd7d40,0x400,0)
93172 qemu-arm-static STRU  struct kevent[] = { { ident=6, filter=EVFILT_READ, 
flags=0x1, fflags=0, data=0, udata=0x0 }
 { ident=0x0, filter=, flags=0, fflags=0x8, 
data=0x1, udata=0x0 } }

Note the 0x2 argument to kevent and the apparently-odd 2nd entry in the struct
kevent[]. The kevent use is from cmake.

So far I've not identified a signal being delivered at a time that would seem
to me to be likely to contribute. (But this is not familiar code so my judgment
is likely not the best.)

Note: I normally run FreeBSD using a non-debug kernel, even when using
head. (The kernel does have symbols.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

___
freebsd-ports@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-ports
To unsubscribe, send any mail to "freebsd-ports-unsubscr...@freebsd.org"


Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

2018-12-28 Thread Mark Millard via freebsd-ports
[Using ktrace/kdump shows an apperent oddity in the kevent use that
hang-up in cmake, not that I know it causes the hang-up.]

On 2018-Dec-28, at 00:16, Mark Millard  wrote:

> [The historical notes are removed and replaced by partial trace
> information from example hang-ups, not that I've figured out
> what contributes yet.]
> 
> I ran into the following while trying to get evidence
> about the hang-up for an amd64->armv7 cross-build of
> multimedia/gstreamer1-qt@qt5 .
> 
> The following from trying to get evidence for the hang-up
> via a manual run of "make multimedia/gstreamer1-qt FLAVOR=qt5”
> in a poudriere bulk -i’s interactive mode for the context
> that has the hang-up in normal poudriere-devel runs.
> 
> 
> From top after the hang-up (to identify some context):
> 
> 14528 root  2  520   100M24M0 kqread  11   0:00   0.00% 
> /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen 
> /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.
> 14527 root  2  52088M13M0 select  22   0:00   0.00% 
> /usr/local/bin/qemu-arm-static ninja -j1 -v all
> 
> from ps -auxd as well (to identify more context):
> 
> root   101140.0  0.0  10328  1756  1  I+J  13:47   0:00.01 |  
>`-- make FLAVOR=qt5
> root   145260.0  0.0  10204  1792  1  I+J  13:50   0:00.00 |  
>  `-- /bin/sh -e -c (cd 
> /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! 
> /usr/bin/env QT_SELE
> root   145270.0  0.0  90304 13084  1  I+J  13:50   0:00.09 |  
>`-- /usr/local/bin/qemu-arm-static ninja -j1 -v all
> root   145280.0  0.0 102876 25060  1  IJ   13:50   0:00.12 |  
>  `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E 
> cmake_autogen /wrkdirs/usr/ports/multimedia/g
> 
> I had made a qemu-user-static that enabled do_strace when
> it is used to run cmake or ninja.
> 
> The only do_strace lines from qemu-arm-static running cmake
> or ninja mentioning process 14528 are included in the sequence:
> 
> (Before the below was a long list of "14527 fstatat” lines.
> I’ll note that "'Unknown syscall 545” is from ppoll use.)
> 
> 82400 sigprocmask(1,-1610620016,-191968524,-186261416,0,24) = 0
> 82400 sigaction(2,-1610620040,-191968596,-186261584,210460,0) = 0
> 82400 sigaction(15,-1610620040,-191968572,-186261584,210460,0) = 0
> 82400 sigaction(1,-1610620040,-191968548,-186261584,210460,0) = 0
> 82400 gettimeofday(-1610619984,0,4,-186261584,-1610619440,-1610619528) = 0
> 82400 gettimeofday(-1610619984,0,4,359949,1545969996,0) = 0
> 82400 gettimeofday(-1610620120,0,4,2,-184666112,-1610619520) = 0
> 82400 fstatat(-100,"elements/gstqtvideosink/CMakeFiles", 0x9fffe200, 0) = 0
> 82400 fstatat(-100,"elements/gstqtvideosink/gstqt5videosink_autogen", 
> 0x9fffe200, 0) = 0
> 82400 pipe2(-1610620176,0,-1610620108,0,-1610620120,167084) = 0
> 82400 fcntl(5,1,-1610620108,-185863932,-192200556,-1610620228) = 0
> 82400 fcntl(5,2,1,-185863932,-192200556,-1610620228) = 0
> 82400 vfork(0,66450,-186876196,-1610620184,-1610620240,0) = 82401
> 82400 close(6) = 0
> = 0
> 82400 Unknown syscall 545
> 82401 setpgid(0,0,-186876196,-1610620184,-1610620240,0) = 0
> 82401 sigprocmask(3,-191586912,0,-1610620184,-1610620240,0) = 0
> 82401 close(5) = 0
> 82401 open("/dev/null",0,0) = 5
> 82401 dup2(5,0,0,-1610620184,-1610620240,0) = 0
> 82401 close(5) = 0
> 82401 fcntl(0,2,0,-1610620184,-1610620240,0) = 0
> 82401 dup2(6,1,0,-1610620184,-1610620240,0) = 1
> 82401 fcntl(1,2,0,-1610620184,-1610620240,0) = 0
> 82401 dup2(6,2,0,-1610620184,-1610620240,0)82400 
> sigpending(-1610620072,1,0,-191968524,0,0) = 0
> 
> The vfork then close(6) sequence for 82400 vs. the later
> use of 6 in dup2 in 82401 may be rather odd. But it looks
> like qemu-*-static uses do_freebsd_fork to implement
> do_freebsd_vfork, despite reporting vfork before
> calling do_freebsd_vfork. (Does the close(6) appear to
> indicate a race for native operation of ninja for the
> period when the address space is shared?)
> 
> Ninja has Subprocess::Start code that has:
> 
> #ifdef POSIX_SPAWN_USEVFORK
>  flags |= POSIX_SPAWN_USEVFORK;
> #endif
> 
> 
>  if (posix_spawnattr_setflags(&attr, flags) != 0)
>Fatal("posix_spawnattr_setflags: %s", strerror(errno));
> 
>  const char* spawned_args[] = { "/bin/sh", "-c", command.c_str(), NULL };
>  if (posix_spawn(&pid_, "/bin/sh", &action, &attr,
>  const_cast(spawned_args), environ) != 0)
>Fatal("posix_spawn: %s", strerror(errno));
> 
> that is in use here. I think that this explains the vfork use.
> 
> 
> It turns out that putting the hung-up build in the background
> and then killing 82401 with the likes of kill -6 leads to more
> output that had apparently been buffered. It shows the use of
> the (amd64 native) /bin/sh that in turn leads to
> /usr/local/bin/cmake via qemu-arm-static. /bin/sh, being
> native, gets no do_strace output from qemu-arm-static.
>