On 28/01/24 22:34, Guido Falsi wrote:
On 28/01/24 22:23, Warner Losh wrote:
On Sun, Jan 28, 2024, 12:38 PM Guido Falsi <m...@madpilot.net> wrote:
On 28/01/24 15:15, Guido Falsi wrote:
> Hi all, again,
>
> I have some more findings about this. I'm top-posting because the old
> message is not really that relevant anymore.
>
> I'm now running a machine with head (commit
> b32d49cfbaa0437d08e65e7cd7c82c5951b1a852, Jan 25th) with poudriere
> installed on it. The machine is amd64, with an arm64 jail running
> 14.0-RELEASE, installed from the official distribution binaries (https
> download method), with cross tools.
>
> To make sure everything is aligned I rebuilt everything: updated head,
> rebuilt the cross tools in the jail, recompiled all ports for the host
> architecture and force-reinstalled them (especially qemu-user-static),
> and cleaned up all packages for the arm64 jail.
>
> If I missed something important please point it out.
>
> I have run some more tests and I'm getting python failures in poudriere
> like the one described below from time to time (I don't have hard stats,
> but it feels like a 50% chance). If I get past that, it is usually able
> to build all of the (not many) packages, but then locks up at:
>
> Creating repository in /tmp/packages: 0%
>
BTW, I forgot to mention: the last time this worked without issue was
around 20th December.
I think this is a bsd-user issue. There is a race somewhere in that
code that causes the hangs. I'd love a reproducible test case that is
somewhat smaller than python... there are bigger races with the newer
stuff and I've not had the time to chase it there either. 😞
First of all, thanks for your feedback. It is encouraging to have someone
with better knowledge of this confirm that a race condition is actually a
possible cause!
Strange that this was not happening until mid-December.
My main, fully reproducible use case is actually with pkg. At the end of
the run poudriere runs `pkg repo` to create the meta files and sign the
repo. pkg forks itself (ncpus + 2 times, I guess; even when forcing it to
1 worker I see three processes) and then locks up, with all the processes
no longer using any CPU (ps output is in my message).
I guess this can be reproduced with any poudriere repo with more than
ncpus packages in it. It can also be reproduced using
`poudriere pkgclean -u <etc>`.
If that does not work I'm not sure how else to reproduce it, but I can
try writing some code mocking what pkg seems to be doing. I'm not an
expert at such things, though.
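
A minimal sketch of what such a mock could look like, in C. It assumes the general pattern is "fork N workers, each streams results back to the parent, the parent polls and then reaps them". It is not taken from the pkg source: `run_workers`, the pipe-based transport and the record format are all hypothetical, chosen only to exercise fork(), poll() and waitpid() in the same process tree.

```c
#include <sys/types.h>
#include <sys/wait.h>

#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical mock, NOT pkg's actual code: fork nworkers children, let
 * each write nitems fixed-size records to its own pipe, and have the
 * parent poll() all pipes until every child has closed its end.
 * Returns the number of records received, or -1 on error.
 * Error paths leak resources; acceptable for a throwaway test.
 */
int
run_workers(int nworkers, int nitems)
{
	struct pollfd *pfd;
	pid_t *pids;
	int total = 0, remaining = nworkers, failed = 0;

	assert(nworkers > 0 && nitems >= 0);

	pfd = calloc(nworkers, sizeof(*pfd));
	pids = calloc(nworkers, sizeof(*pids));
	if (pfd == NULL || pids == NULL)
		return (-1);

	for (int i = 0; i < nworkers; i++) {
		int fds[2];

		if (pipe(fds) == -1)
			return (-1);
		pids[i] = fork();
		if (pids[i] == -1)
			return (-1);
		if (pids[i] == 0) {
			/* Worker: drop all read ends, then stream the records. */
			close(fds[0]);
			for (int j = 0; j < i; j++)
				close(pfd[j].fd);
			for (int32_t n = 0; n < nitems; n++)
				if (write(fds[1], &n, sizeof(n)) != sizeof(n))
					_exit(1);
			_exit(0);
		}
		/* Parent keeps only the read end. */
		close(fds[1]);
		pfd[i].fd = fds[0];
		pfd[i].events = POLLIN;
	}

	/* Drain all pipes; poll() ignores entries whose fd is -1. */
	while (remaining > 0) {
		if (poll(pfd, nworkers, -1) == -1) {
			if (errno == EINTR)
				continue;
			return (-1);
		}
		for (int i = 0; i < nworkers; i++) {
			int32_t rec;

			if (pfd[i].fd == -1 || pfd[i].revents == 0)
				continue;
			if (read(pfd[i].fd, &rec, sizeof(rec)) == sizeof(rec)) {
				total++;
			} else {
				/* EOF or error: this worker is done. */
				close(pfd[i].fd);
				pfd[i].fd = -1;
				remaining--;
			}
		}
	}

	/* Reap every worker and check how it exited. */
	for (int i = 0; i < nworkers; i++) {
		int status;

		if (waitpid(pids[i], &status, 0) == -1 ||
		    !WIFEXITED(status) || WEXITSTATUS(status) != 0)
			failed = 1;
	}
	free(pfd);
	free(pids);
	return (failed ? -1 : total);
}
```

Called from a small main() as, say, `run_workers(4, 1000)`, this should return 4000 when run natively. Whether it ever hangs when built for arm64 and run in the jail under qemu-user-static is the open question; if it does not, the mock is probably missing whatever pkg does that triggers the race.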
In case it helps narrow things down further, it looks like the lockup is
happening somewhere around here:
https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778
and/or in the pkg_create_repo_worker() function here:
https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341
(I'm trying to spare you the time needed to find the actual code being
executed, I guess you would have identified this in a few minutes
yourself, but I'm trying to make myself useful)
--
Guido Falsi <m...@madpilot.net>