On Apr 22, 2024, at 3:26 AM, Alexander Leidinger <alexan...@leidinger.net> wrote:
> Hi,
>
> I have been seeing a higher failure rate of socket/network-related stuff
> for a while now. The failures are transient: directly executing the same
> thing again may or may not succeed. I'm not able to reproduce this at
> will; sometimes they just show up.
>
> Examples:
> - poudriere runs with the sccache overlay (like ccache, but it also works
>   for Rust) sometimes fail to create the communication socket, and as a
>   result the build fails. My build script runs 3 different poudriere bulk
>   runs one after another, and when the first one fails, the second and
>   third still run. If the first fails due to the sccache issue, the second
>   and third may or may not fail. Sometimes the first fails and the rest
>   are OK. Sometimes all fail, and if I then run one by hand it works. (The
>   script does the same as the manual run; it is simply
>   "for type in A B C; do poudriere bulk -O sccache -j $type -f ${type}.pkglist; done",
>   which I execute from the same shell, and the script doesn't do any
>   environment sanitizing.)
> - A webmail interface (inet / local net -> nginx (rev-proxy) -> nginx
>   (webmail service) -> php -> imap) sees intermittent issues. Opening the
>   same email again directly afterwards normally works. I've also seen
>   transient issues with pgp signing (webmail interface -> gnupg / gpg-agent
>   on the server); simply hitting send again after a failure works fine.
>
> Gleb, could this be related to the socket stuff you did 2 weeks ago? My
> world is from 2024-04-17-112537. I have noticed this since at least then,
> but I'm not sure whether the failures were there before that and I simply
> didn't notice them. They are surely "new recently"; I didn't see this many
> issues in January. The two updates of current I did before the latest one
> were on 2024-03-31-120210 and 2024-04-08-112551.
>
> I could also imagine that some memory-related transient failure could
> cause this, but with >3 GB free I do not expect it. What may be important
> here is that I have https://reviews.freebsd.org/D40575 in my tree, which
> is memory related, but it's only a metric to quantify memory
> fragmentation.
>
> Any ideas how to track this down more easily than running the entire
> poudriere in ktrace (e.g. a hint/script showing which dtrace probes to
> use)?

No answers, I'm afraid, just a "me too." I have the same problem you
describe when using ports-mgmt/sccache-overlay to build packages with
Poudriere. In my case, I'm using FreeBSD 14-STABLE (stable/14-13952fbca).

I actually stopped using ports-mgmt/sccache-overlay because it got to the
point where it failed more often than it worked. Then, a few months ago, I
decided on a whim to start using it again, and it worked reliably for me.
Then, starting a few weeks ago, it reverted to the behaviour you describe
above. It is not as bad right now as it was when I quit using it: it
sometimes fails, but re-running "poudriere bulk" succeeds. I'd love it to
go back to working 100% of the time.

Cheers,
Paul.