On Apr 22, 2024, at 3:26 AM, Alexander Leidinger <alexan...@leidinger.net> 
wrote:


> Hi,
> 
> I see a higher failure rate of socket/network related stuff since a while. 
> Those failures are transient. Directly executing the same thing again may or 
> may not result in success/failure. I'm not able to reproduce this at will. 
> Sometimes they show up.
> 
> Examples:
> - poudriere runs with the sccache overlay (like ccache but also works for 
> rust) sometimes fail to create the communication socket and as such the build 
> fails. I have 3 different poudriere bulk runs after each other in my build 
> script, and when the first one fails, the second and third still run. If the 
> first fails due to the sccache issue, the second and 3rd may or may not fail. 
> Sometimes the first fails and the rest is ok. Sometimes all fail, and if I 
> then run one by hand it works (the script does the same as the manual run, 
> the script is simply a "for type in A B C; do poudriere bulk -O sccache -j 
> $type -f ${type}.pkglist; done" which I execute from the same shell, and the 
> script doesn't do env-sanitizing).
> - A webmail interface (inet / local net -> nginx (rev-proxy) -> nginx 
> (webmail service) -> php -> imap) sees intermittent issues sometimes. Opening 
> the same email directly again afterwards normally works. I've also seen 
> transient issues with pgp signing (webmail interface -> gnupg / gpg-agent on 
> the server), simply hitting send again after a failure works fine.
> 
> Gleb, could this be related to the socket stuff you did 2 weeks ago? My world 
> is from 2024-04-17-112537. I have noticed this since at least then, but I'm not 
> sure if they were there before that and I simply didn't notice them. They 
> are surely "new recently", that amount of issues I haven't seen in January. 
> The last two updates of current I did before the last one were on 
> 2024-03-31-120210 and 2024-04-08-112551.
> 
> I could also imagine that some memory related transient failure could cause 
> this, but with >3 GB free I do not expect this. Important here may be that I 
> have https://reviews.freebsd.org/D40575 in my tree, which is memory related, 
> but it's only a metric to quantify memory fragmentation.
> 
> Any ideas how to track this down more easily than running the entire 
> poudriere in ktrace (e.g. a hint/script which dtrace probes to use)?


No answers, I'm afraid, just a "me too."

I have the same problem as you describe when using ports-mgmt/sccache-overlay 
when building packages with Poudriere.  In my case, I'm using FreeBSD 14-STABLE 
(stable/14-13952fbca).

I actually stopped using ports-mgmt/sccache-overlay because it got to the point 
where it didn't work more often than it did.  Then, a few months ago, I decided 
to start using it again on a whim and it worked reliably for me.  Then, 
starting a few weeks ago, it has reverted to the behaviour you describe above.  
It is not as bad right now as it got when I quit using it.  Now, sometimes it 
will fail, but it will succeed when re-running a "poudriere bulk" run.
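
Since a plain re-run usually succeeds, the re-running can be automated as a 
stopgap. This is only a minimal sketch, assuming a POSIX sh. The retry function 
and the RETRY_DELAY variable are hypothetical names, not part of poudriere or 
sccache, and it works around the failure rather than explaining it:

```shell
#!/bin/sh
# retry MAX CMD [ARGS...]
# Hypothetical helper: run CMD up to MAX times and stop at the first success.
# RETRY_DELAY (assumed name, default 5) is the pause in seconds between tries.
retry() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed: $*" >&2
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-5}"
    done
    return 1
}

# Usage, mirroring the loop from the quoted build script:
#   for type in A B C; do
#       retry 3 poudriere bulk -O sccache -j "$type" -f "${type}.pkglist"
#   done
```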

I'd love it to go back to when it was working 100% of the time.

Cheers,

Paul.

