On Apr 22, 2024, at 3:26 AM, Alexander Leidinger wrote:
> Hi,
>
> For a while now I have been seeing a higher failure rate in socket/network
> related operations. The failures are transient: immediately running the same
> thing again may or may not succeed. I'm not able to reproduce this at will;
> the failures just show up sometimes.
>
> Examples:
> - poudriere runs with the sccache overlay (like ccache, but it also works for
> rust) sometimes fail to create the communication socket, and as a result the
> build fails. My build script runs 3 different poudriere bulk runs one after
> another; when the first one fails, the second and third still run. If the
> first fails due to the sccache issue, the second and third may or may not
> fail. Sometimes the first fails and the rest are OK. Sometimes all fail, and
> if I then run one by hand, it works (the script does the same as the manual
> run: it is simply "for type in A B C; do poudriere bulk -O sccache -j $type
> -f ${type}.pkglist; done", which I execute from the same shell, and the
> script doesn't sanitize the environment).
> - A webmail interface (inet / local net -> nginx (rev-proxy) -> nginx
> (webmail service) -> php -> imap) sees intermittent failures. Opening the
> same email again directly afterwards normally works. I've also seen
> transient failures with PGP signing (webmail interface -> gnupg / gpg-agent
> on the server); simply hitting send again after a failure works fine.
>
> Gleb, could this be related to the socket changes you made 2 weeks ago? My
> world is from 2024-04-17-112537. I have noticed this since at least then,
> but I'm not sure whether the failures were there before and I simply didn't
> notice them. They are certainly recent: I hadn't seen this many issues in
> January. The two updates of -CURRENT I did before the latest one were on
> 2024-03-31-120210 and 2024-04-08-112551.
>
> I could also imagine a transient memory-related failure causing this, but
> with >3 GB free I don't expect that. It may be relevant that I have
> https://reviews.freebsd.org/D40575 in my tree; it is memory related, but it
> only adds a metric to quantify memory fragmentation.
>
> Any ideas on how to track this down more easily than running the entire
> poudriere run under ktrace (e.g. a hint or script showing which dtrace
> probes to use)?

No answers, I'm afraid, just a "me too."

I have the same problem you describe when using ports-mgmt/sccache-overlay to
build packages with Poudriere. In my case, I'm using FreeBSD 14-STABLE
(stable/14-13952fbca).

I actually stopped using ports-mgmt/sccache-overlay because it got to the point
where it failed more often than it worked. Then, a few months ago, I started
using it again on a whim, and it worked reliably. Starting a few weeks ago,
though, it has reverted to the behaviour you describe above. It is not as bad
right now as it was when I quit using it: it sometimes fails, but re-running
"poudriere bulk" succeeds.

I'd love it to go back to working 100% of the time.

Cheers,
Paul.
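
Regarding the question about which DTrace probes to use: one possible starting point, sketched here under stated assumptions rather than as a known diagnosis, is to log every socket(2), bind(2), listen(2) and connect(2) call that returns an error, together with the process name, PID and errno, while a "poudriere bulk" run is in progress. It assumes FreeBSD with the DTrace `syscall` provider available (root, modules loaded, e.g. via "kldload dtraceall"); the choice of these four syscalls is a guess at where a socket setup failure would surface, and the helper variable name D_PROG is hypothetical.

```shell
#!/bin/sh
# Minimal sketch (an assumption, not a known diagnosis): print every
# socket(2), bind(2), listen(2) or connect(2) call that returns an error,
# system-wide, while a poudriere bulk run is in progress. Needs root and
# the DTrace kernel modules (e.g. "kldload dtraceall"). Stop with ^C.

# D program: one clause shared by the four syscall return probes; the
# predicate keeps only calls that set errno.
D_PROG='
syscall::socket:return,
syscall::bind:return,
syscall::listen:return,
syscall::connect:return
/errno != 0/
{
	printf("%Y %s[%d] %s: errno %d\n",
	    walltimestamp, execname, pid, probefunc, errno);
}'

if command -v dtrace >/dev/null 2>&1; then
	# -q suppresses the default per-probe output; only printf() lines show.
	dtrace -q -n "$D_PROG"
else
	echo "dtrace(1) not found; this sketch assumes FreeBSD with DTrace" >&2
fi
```

Run it in one terminal while the bulk builds run in another; if the sccache socket setup fails in the kernel, it should show up as a line naming the process and the errno. Note that a non-blocking connect(2) normally returns EINPROGRESS (errno 36 on FreeBSD), so such lines are expected noise rather than failures.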