On 29/01/24 02:10, Warner Losh wrote:
On Sun, Jan 28, 2024 at 4:45 PM Nathan Reilly-list <li...@nreilly.com> wrote:
On 29 Jan 2024, at 8:43 am, Guido Falsi <m...@madpilot.net> wrote:
On 28/01/24 22:34, Guido Falsi wrote:
On 28/01/24 22:23, Warner Losh wrote:
On Sun, Jan 28, 2024, 12:38 PM Guido Falsi <m...@madpilot.net> wrote:
On 28/01/24 15:15, Guido Falsi wrote:
[snip]
> Creating repository in /tmp/packages: 0%
>
BTW, I forgot to mention: the last time this worked without issue
was around 20th December.
I think this is a bsd-user issue. There is a race somewhere in
that code that causes the hangs. I'd love a reproducible test
case that is somewhat smaller than python... there are bigger
races with the newer stuff and I've not had the time to chase it
there either. 😞
First of all, thanks for your feedback. It is encouraging to have
someone with better knowledge of this confirm that a race
condition is actually a possible cause!
Strange that this was not happening until mid-December.
My main and fully reproducible use case is actually mostly with pkg.
At the end of the run, poudriere runs `pkg repo` to create the
meta files and sign the repo. It forks itself (ncpus + 2 processes, I
guess; even when forcing it to 1 worker I see three processes) and then
locks up, with all the processes no longer using any CPU (ps output is
in my message).
I guess this can be reproduced with any poudriere repo with
more than ncpus packages in it. It can also be reproduced
using `poudriere pkgclean -u <etc>`.
If that does not work I'm not sure how to reproduce it in other
ways, but I can try writing some code mocking what pkg seems to
be doing, though I'm not an expert at such things.
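
As a starting point for such a mock, here is a minimal sketch in C. It
is NOT taken from pkg: it only assumes the general pattern described
above (a parent forking several workers and waiting for their results),
and it assumes pipes plus poll() as the transport, which may differ
from what pkg really does. The name run_workers() and its parameters are
hypothetical. Whether this triggers the hang under emulation is untested.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical reproducer sketch, not pkg's code.
 * Forks `nworkers` children; each writes `nitems` one-byte records to its
 * own pipe and exits. The parent poll()s all pipes until every worker has
 * closed its end, then reaps the workers.
 * Returns the number of records received, or -1 on error.
 * (Error paths do not clean up; this is only a test sketch.)
 */
long
run_workers(int nworkers, int nitems)
{
	struct pollfd *pfd = calloc(nworkers, sizeof(*pfd));
	pid_t *pids = calloc(nworkers, sizeof(*pids));
	long received = 0;
	int open_fds = 0, failed = 0;

	if (pfd == NULL || pids == NULL)
		return (-1);

	for (int i = 0; i < nworkers; i++) {
		int fds[2];

		if (pipe(fds) == -1)
			return (-1);
		pids[i] = fork();
		if (pids[i] == -1)
			return (-1);
		if (pids[i] == 0) {
			/* Worker: drop the read ends inherited from the parent. */
			close(fds[0]);
			for (int j = 0; j < i; j++)
				close(pfd[j].fd);
			for (int k = 0; k < nitems; k++) {
				char c = 'x';
				if (write(fds[1], &c, 1) != 1)
					_exit(1);
			}
			close(fds[1]);
			_exit(0);
		}
		/* Parent keeps only the read end, so EOF arrives on worker exit. */
		close(fds[1]);
		pfd[i].fd = fds[0];
		pfd[i].events = POLLIN;
		open_fds++;
	}

	while (open_fds > 0) {
		if (poll(pfd, nworkers, -1) == -1) {
			if (errno == EINTR)
				continue;
			return (-1);
		}
		for (int i = 0; i < nworkers; i++) {
			char buf[256];
			ssize_t n;

			/* Entries with fd < 0 are ignored by poll(). */
			if (pfd[i].fd < 0 || pfd[i].revents == 0)
				continue;
			n = read(pfd[i].fd, buf, sizeof(buf));
			if (n > 0) {
				received += n;
			} else {
				/* EOF or error: this worker is done. */
				close(pfd[i].fd);
				pfd[i].fd = -1;
				open_fds--;
			}
		}
	}

	for (int i = 0; i < nworkers; i++) {
		int status;

		if (waitpid(pids[i], &status, 0) == -1 ||
		    !WIFEXITED(status) || WEXITSTATUS(status) != 0)
			failed = 1;
	}
	free(pfd);
	free(pids);
	return (failed ? -1 : received);
}
```

Calling run_workers(ncpus + 2, some_count) repeatedly from a small main()
built for the emulated architecture, and watching for a call that never
returns, would be one way to look for a stall without involving pkg or
python at all.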
In case it helps narrow things down further, it looks like the
lockup is happening somewhere around here:
https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778
and/or in the pkg_create_repo_worker() function here:
https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341
(I'm trying to spare you the time needed to find the actual code
being executed, I guess you would have identified this in a few
minutes yourself, but I'm trying to make myself useful)
There appears to be a GitHub issue for poudriere about this, but it
seems to be looking in another direction.
https://github.com/freebsd/poudriere/issues/1009
This one looks quite similar.
In my case the ports/pkg are aligned between host and jail, in fact I
have built them from the exact same git checkout.
I noticed pkg head has been converted to using pthreads instead of fork,
maybe that could help. I will make time to perform some testing.
There's a FreeBSD bug saying this is happening w/o qemu in the loop:
https://bugs.freebsd.org/276690
At least I think that's similar.
There are similarities, but they are looking at the compiler, which has
no relation to pkg-repo getting stuck. That's what I'm concentrating
on at present.
Also, the sporadic issue with python is not due to the compiler; it is the
python binaries running during the build that cause issues.
Warner
--
Guido Falsi <m...@madpilot.net>