On 29/01/24 09:26, Guido Falsi wrote:
On 29/01/24 02:10, Warner Losh wrote:


On Sun, Jan 28, 2024 at 4:45 PM Nathan Reilly-list <li...@nreilly.com> wrote:



    On 29 Jan 2024, at 8:43 am, Guido Falsi <m...@madpilot.net> wrote:
    On 28/01/24 22:34, Guido Falsi wrote:
    On 28/01/24 22:23, Warner Losh wrote:
    On Sun, Jan 28, 2024, 12:38 PM Guido Falsi <m...@madpilot.net> wrote:

        On 28/01/24 15:15, Guido Falsi wrote:
        [snip]
         > Creating repository in /tmp/packages:   0%
         >

        BTW, I forgot to mention that the last time this worked without
        issue was around 20th December.


    I think this is a bsd-user issue. There is a race somewhere in
    that code that causes the hangs. I'd love a reproducible test
    case that is somewhat smaller than Python... there are bigger
    races with the newer stuff and I've not had the time to chase it
    there either. 😞

    First of all, thanks for your feedback. It is encouraging to have
    someone with better knowledge of this code confirm that a race
    condition is a plausible cause!
    It's strange that this was not happening until mid December.
    My main, fully reproducible use case actually involves pkg.
    At the end of the run, poudriere runs `pkg repo` to create the
    meta files and sign the repo. It forks itself (ncpus + 2 times, I
    guess; even when forcing it to 1 worker I see three processes) and
    then locks up, with none of the processes using any CPU (the ps
    output is in my earlier message).
    I guess this can be reproduced with any poudriere repo containing
    more than ncpus packages. It can also be reproduced using
    `poudriere pkgclean -u <etc>`.
    If that does not work, I'm not sure how else to reproduce it, but
    I can try writing some code that mocks what pkg seems to be doing,
    though I'm not an expert at such things.

    In case it helps narrow things down further, it looks like the
    lockup is happening somewhere around here:

    https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778

    and/or in the pkg_create_repo_worker() function here:

    https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341


    (I'm trying to spare you the time needed to find the actual code
    being executed. I guess you would have found it yourself in a few
    minutes, but I'm trying to make myself useful.)


    There appears to be a GitHub issue for poudriere about this, but
    it seems to be looking in a different direction.

    https://github.com/freebsd/poudriere/issues/1009


This one looks quite similar.

In my case, ports and pkg are aligned between host and jail; in fact, I built them from the exact same git checkout.

I noticed that pkg head has been converted to use pthreads instead of fork; maybe that could help. I will make time to do some testing.

Thanks for pointing me here; it looks like this was "it": with that issue fixed, poudriere uses the native pkg-static and so sidesteps the problem.


Unfortunately, there ARE qemu races and lockups that prevent the arm64 pkg-static binary from being correctly emulated by qemu-user-static. Such conditions also cause sporadic failures in some of the ports being built.

I filed a PR with a fix for that issue:

https://github.com/freebsd/poudriere/pull/1115


--
Guido Falsi <m...@madpilot.net>

