On 29/01/24 16:53, Warner Losh wrote:
On Mon, Jan 29, 2024, 8:48 AM Guido Falsi <m...@madpilot.net> wrote:
On 29/01/24 09:26, Guido Falsi wrote:
> On 29/01/24 02:10, Warner Losh wrote:
>>
>>
>> On Sun, Jan 28, 2024 at 4:45 PM Nathan Reilly-list <li...@nreilly.com> wrote:
>>
>>
>>
>>> On 29 Jan 2024, at 8:43 am, Guido Falsi <m...@madpilot.net> wrote:
>>> On 28/01/24 22:34, Guido Falsi wrote:
>>>> On 28/01/24 22:23, Warner Losh wrote:
>>>>> On Sun, Jan 28, 2024, 12:38 PM Guido Falsi <m...@madpilot.net> wrote:
>>>>>
>>>>> On 28/01/24 15:15, Guido Falsi wrote:
>>>>> [snip]
>>>>> > Creating repository in /tmp/packages: 0%
>>>>> >
>>>>>
>>>>> BTW, I forgot to mention: the last time this worked without
>>>>> issue was around 20th December.
>>>>>
>>>>>
>>>>> I think this is a bsd-user issue. There is a race somewhere in
>>>>> that code that causes the hangs. I'd love a reproducible test
>>>>> case that is somewhat smaller than python... there are bigger
>>>>> races with the newer stuff and I've not had the time to chase it
>>>>> there either. 😞
>>>> First of all, thanks for your feedback. It is encouraging to have
>>>> someone with better knowledge of this confirm that a race
>>>> condition is actually a possible cause!
>>>> It is strange that this did not happen until mid-December.
>>>> My main, fully reproducible use case is actually mostly with pkg.
>>>> At the end of the run poudriere runs `pkg repo` to create the
>>>> meta files and sign the repo. It forks itself (ncpus + 2, I guess;
>>>> even forcing it to 1 worker I see three processes), and then
>>>> locks up, with all the processes no longer using CPU (ps output
>>>> is in my message).
>>>> I guess this can be reproduced with any poudriere repo with
>>>> more than ncpus packages in it. It can also be reproduced
>>>> using `poudriere pkgclean -u <etc>`.
>>>> If that does not work I'm not sure how else to reproduce it,
>>>> but I can try writing some code mocking what pkg seems to
>>>> be doing; I'm not an expert at such things, though.
>>>
>>> In case it helps narrow things down further, it looks like the
>>> lockup is happening somewhere around here:
>>>
>>>
>>>
>>> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778
>>>
>>> and/or in the pkg_create_repo_worker() function here:
>>>
>>>
>>>
>>> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341
>>>
>>>
>>> (I'm trying to spare you the time needed to find the actual code
>>> being executed; I guess you would have identified this in a few
>>> minutes yourself, but I'm trying to make myself useful.)
>>
>>
>> There appears to be a GitHub issue for poudriere about this, but
>> it seems to be looking in another direction.
>>
>> https://github.com/freebsd/poudriere/issues/1009
>>
>
> This one looks quite similar.
>
> In my case the ports/pkg are aligned between host and jail; in fact I
> have built them from the exact same git checkout.
>
> I noticed pkg head has been converted to using pthreads instead of fork;
> maybe that could help. I will make time to perform some testing.
Thanks for pointing me here. It looks like this was "it": with this issue
fixed, poudriere uses the native pkg-static and sidesteps the problem.
Unfortunately there ARE qemu races and lockups that prevent the arm64
pkg-static binary from being emulated correctly by qemu-user-static. Such
conditions also cause sporadic failures in some ports being built.
I filed a PR with a fix for that issue:
https://github.com/freebsd/poudriere/pull/1115
Ok. This dodges the problem. But it papers over things.
Definitely, but this is actually also what was happening in the past.
Poudriere stopped using the native (host) pkg-static when the pkg port
gained a PORTREVISION, which made the same-version check fail.
I agree the underlying issue should be fixed.
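To illustrate why gaining a PORTREVISION can break an exact-match version check, here is a minimal sketch. It is not poudriere's actual check and not necessarily what the PR does; the function names and the version strings are hypothetical. The only thing it relies on is the ports version layout, version[_PORTREVISION][,PORTEPOCH].

```python
import re


def strip_portrevision(version):
    """Drop a trailing _N PORTREVISION, keeping any ,EPOCH suffix.

    Hypothetical helper for illustration only.
    """
    return re.sub(r"_\d+(?=,|$)", "", version)


def same_upstream_version(host_version, jail_version):
    """Compare two package versions while ignoring PORTREVISION."""
    return strip_portrevision(host_version) == strip_portrevision(jail_version)


if __name__ == "__main__":
    host, jail = "1.20.9", "1.20.9_1"  # made-up version strings
    print("exact match:", host == jail)
    print("ignoring PORTREVISION:", same_upstream_version(host, jail))
```

With an exact string comparison the two versions differ as soon as one side carries a `_1`, which matches the behaviour described above; whether ignoring PORTREVISION is the right policy is a separate question.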
Any chance you could give me the state of pkg before + the package added
as a test case for qemu?
Not sure I understand what you are asking for; can you elaborate?
What I did was run poudriere, asking it to build a few packages. The
lockup, when trying to use the target-arch pkg-static via qemu-user, is
100% reproducible in my experience. It does not really depend on the number
of packages. I get it starting from an empty build.
I'm building these packages (and obviously their dependencies):
dns/unbound
net/kea
sysutils/tmux
(I guess building only tmux could suffice)
With poudriere you can get it to use the target-arch pkg-static by modifying
the ensure_pkg_installed function in /usr/local/share/poudriere/common.sh,
making sure the check here fails:
https://github.com/freebsd/poudriere/blob/e00503d846dc7a3b661aac84a6657f15e0f4b702/src/share/poudriere/common.sh#L5687
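
As for a smaller test case, below is a rough mock of the pattern `pkg repo` appears to follow as described earlier in the thread: the parent forks a few workers, each worker reports results back, and the parent waits for all of them. This is only a sketch under assumptions. I am assuming the parent talks to each worker over a socket pair and polls them, which I have not checked line by line against pkg's code; the function name and item names are made up; and I have not verified that it triggers the hang under qemu-user. It is in Python just to show the shape, so a real bsd-user test case would want the same logic ported to C.

```python
import os
import select
import socket


def run_workers(nworkers, items):
    """Fork nworkers children and collect the items they report back.

    Hypothetical mock of a fork/socketpair/poll worker pool; not pkg's code.
    Items must be strings without whitespace.
    """
    children = {}  # parent-side socket fd -> (socket, child pid)
    poller = select.poll()
    for w in range(nworkers):
        parent_sock, child_sock = socket.socketpair()
        pid = os.fork()
        if pid == 0:
            # Child: report every nworkers-th item, then exit immediately.
            parent_sock.close()
            for item in items[w::nworkers]:
                child_sock.sendall((item + "\n").encode())
            child_sock.close()
            os._exit(0)
        # Parent: keep only its own end of the pair.
        child_sock.close()
        children[parent_sock.fileno()] = (parent_sock, pid)
        poller.register(parent_sock, select.POLLIN)

    received = []
    buffers = {fd: b"" for fd in children}
    while children:
        for fd, _event in poller.poll():
            sock, pid = children[fd]
            data = sock.recv(4096)
            if data:
                buffers[fd] += data
                continue
            # EOF: the worker closed its end, so reap it and keep its output.
            received.extend(buffers.pop(fd).decode().split())
            poller.unregister(fd)
            sock.close()
            os.waitpid(pid, 0)
            del children[fd]
    return received


if __name__ == "__main__":
    names = ["pkg-%d" % i for i in range(10)]  # made-up item names
    got = run_workers(3, names)
    print(len(got), "items received")
```

If a native run finishes and the same program stalls with all processes idle under qemu-user, that would point at the emulation rather than at pkg itself.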
--
Guido Falsi <m...@madpilot.net>