Hi Ludovic and Artyom,

Ludovic Courtès <l...@gnu.org> writes:

> Ludovic Courtès <l...@gnu.org> skribis:
>
>> So we have the finalization thread closing a channel of session
>> 0x12a4b20 (which causes a write on the channel), and the main thread
>> writing to a channel of that same session.  This is exactly what I
>> described at <https://issues.guix.gnu.org/26976#11>:
>>
>>   AIUI, that means there’s one output compression buffer per session,
>>   and it’s not thread-safe (in Guile 2.2 finalizers are called from a
>>   separate thread.)
>>
>> I think the fix, in Guile-SSH, is to associate each libssh object
>> (session, channel, etc.) with a mutex, and to protect all uses of the
>> libssh object by that mutex.
>>
>> Artyom, WDYT?  Do you think you could take a look into that?
>>
>> In the meantime, I’ll look for the origin of the channel port that’s not
>> explicitly closed and see if we can work around it.
>
> I’ve pushed this change on our side to explicitly close channels and
> sessions:
>
>   https://git.savannah.gnu.org/cgit/guix.git/commit/?id=61fe9ced7da7eefceb931af0cb7363b721f5bdd6
>
> This workaround is similar to that of 2017:
>
>   https://git.savannah.gnu.org/cgit/guix.git/commit/?id=8e469b67f95cfe5b95405b503b8ee315fdf8ce66
>
> It’s really just a workaround so I think we should fix the core issue in
> Guile-SSH (or libssh) so it doesn’t pop up again next month—it’s hard to
> ensure code that opens a channel explicitly closes it.

Do you think the issue lies in Guile-SSH or in libssh itself?  Sorry for
not having caught these problems earlier; it seemed to work reliably
when I last tested it.

> Anyway, I would welcome tests using ‘guix copy’, ‘guix deploy’, and
> offloading.  (For offloading, make sure to run the daemon from your
> build tree.)

While attempting to use offloading on the core-updates branch, I
encountered stalls and file errors, but with your patch it seems to work
reliably (it's been offloading builds for the last 15 minutes or so
without interruption).  So your workaround seems to work as intended.

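To make the mutex idea quoted above a bit more concrete, here is a rough
sketch. It is written in Python rather than Guile-SSH's actual C code, and
the Session and Channel classes are hypothetical stand-ins, not the libssh
or Guile-SSH API. It also assumes that channels should take their
session's lock (rather than a lock of their own), since the shared buffer
is described as being per session.

```python
import threading


class Session:
    """Hypothetical stand-in for a libssh session (not the Guile-SSH API)."""

    def __init__(self):
        self.lock = threading.Lock()   # one mutex per session
        self._scratch = bytearray()    # models the shared per-session output buffer
        self.sent = []                 # packets "sent on the wire"

    def _send(self, payload):
        # Caller must hold self.lock: the scratch buffer is shared state.
        self._scratch.clear()
        for byte in payload:
            self._scratch.append(byte)
        self.sent.append(bytes(self._scratch))


class Channel:
    """Hypothetical channel; every operation takes the owning session's lock."""

    def __init__(self, session):
        self.session = session
        self.closed = False

    def write(self, payload):
        with self.session.lock:
            if self.closed:
                raise ValueError("write on closed channel")
            self.session._send(payload)

    def close(self):
        # May be called from a finalizer thread; closing writes to the session.
        with self.session.lock:
            if not self.closed:
                self.closed = True
                self.session._send(b"CLOSE")


def demo(writes=200):
    session = Session()
    leaked = Channel(session)   # never closed explicitly; a "finalizer" closes it
    active = Channel(session)

    finalizer = threading.Thread(target=leaked.close)
    finalizer.start()
    for _ in range(writes):
        active.write(b"data" * 64)
    finalizer.join()
    return session.sent
```

With the lock held around every use, the close issued from the second
thread can no longer interleave with a write from the main thread on the
same session, which is the situation described in the quoted analysis.
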
I also agree that it'd be much nicer and more future-proof if we could
fix the root issue.

Thanks!

Maxim