Hello again! I believe I found the source of the problem. This is an
edge case, so I hope my explanation is sufficient :)
- Guacd creates 2 separate libssh2 sessions: one for SSH and one for SFTP
(two separate TCP sockets). I'm pretty sure this commit
<https://github.com/apache/guacamole-server/commit/b9d9a9d3247f6f5f1079582d05204adc91fd4ebb>
is the source of this behavior.
- The guacd keepalive mechanism only sends keep-alive requests on the SSH
session.
But as Nick said, SFTP relies on the SSH connection, meaning: as long as
the SSH session remains active, so does the SFTP session, which is
absolutely correct.

So why did my SFTP session disconnect when idle for a couple of minutes?
Well, I used an Azure VM as a Guacamole connection, and Azure VMs come
with an OpenSSH server setting called "*ClientAliveInterval*" which,
if set, makes sshd send keepalive checks to connected clients and
disconnect those that fail to respond.
In our case (Guacamole usage), the *ClientAliveInterval* keepalive packets
are sent to both sessions: SSH and SFTP.
Now here is a small part I'm not entirely sure about, but it seems guacd
fails to respond to these keepalive requests. For the sake of argument,
let's assume that is the case.
The SSH session remains active because guacd sends its own keepalives and
receives appropriate responses.
Because guacd doesn't send any keepalive requests on the SFTP session,
after a couple of failed keepalives *sent from the target server* (in my
case, the Azure VM), the SFTP session disconnects (OpenSSH closes the
socket), while the SSH session remains fully functional.
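
For reference, the server-side setting described above lives in
sshd_config and looks roughly like the fragment below. The interval value
is only illustrative (I'm not quoting the exact Azure default here);
*ClientAliveCountMax* is the companion OpenSSH option that controls how
many unanswered checks are tolerated, and 3 is its stock default.

```
# /etc/ssh/sshd_config (illustrative values)
# Seconds of client silence before sshd sends a keepalive check
ClientAliveInterval 180
# Unanswered checks allowed before sshd drops the client
ClientAliveCountMax 3
```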

I tried two things to confirm this is actually the problem:
1. I removed the "*ClientAliveInterval*" configuration, and the SFTP session
remained functional.
2. With the "*ClientAliveInterval*" present, I patched guacd here
<https://github.com/apache/guacamole-server/blob/90e15cb70635e8d13ebdefec9262701d2179983a/src/protocols/ssh/ssh.c#L439>,
adding another call to *libssh2_keepalive_send* on the SFTP session (
*ssh_client->sftp_session->session*). I sniffed both sessions' ports, and lo
and behold, there were keepalive packets on both ports, and the SFTP
session remained functional after idle periods.
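
To make the idea behind the patch concrete, here is a minimal,
self-contained sketch in C of "send a keepalive on every session, then wait
for whichever is due soonest". It is not the actual guacd code or my exact
patch: *send_keepalive_all*, *keepalive_send_fn*, *fake_session* and
*fake_send* are hypothetical names made up for illustration. The callback
only mirrors the shape of *libssh2_keepalive_send*, which takes a session
and writes the number of seconds until the next keepalive is due.

```c
#include <stddef.h>

/* Hypothetical callback with the same shape as libssh2_keepalive_send():
 * sends one keepalive on the session, stores the seconds until the next
 * one is due, and returns 0 on success. In guacd this would be a thin
 * wrapper casting the void pointer back to a LIBSSH2_SESSION pointer. */
typedef int (*keepalive_send_fn)(void *session, int *seconds_to_next);

/* Send a keepalive on every session (e.g. the SSH one and the SFTP one).
 * On success, *next_wait holds the shortest positive wait reported by any
 * session, or 0 if no session has a keepalive interval configured.
 * Returns 0 on success, or the first non-zero error from the callback. */
static int send_keepalive_all(void *sessions[], size_t count,
                              keepalive_send_fn send_one, int *next_wait) {
    int shortest = 0;
    for (size_t i = 0; i < count; i++) {
        int seconds = 0;
        int rc = send_one(sessions[i], &seconds);
        if (rc != 0)
            return rc;
        if (seconds > 0 && (shortest == 0 || seconds < shortest))
            shortest = seconds;
    }
    *next_wait = shortest;
    return 0;
}

/* Stand-in session so the helper can be tried without a real server. */
typedef struct {
    int interval; /* seconds between keepalives */
    int sent;     /* number of keepalives "sent" so far */
} fake_session;

static int fake_send(void *session, int *seconds_to_next) {
    fake_session *s = session;
    s->sent++;
    *seconds_to_next = s->interval;
    return 0;
}
```

With two stand-in sessions using intervals of 30 and 20 seconds, one call
sends a keepalive on each and reports 20 as the time to wait. Waiting for
the shortest interval is my own choice here, so that neither session goes
past its deadline.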

So first, I'd love some feedback on this. I hope I'm not entirely
off-track and that I made some sense here :)
If so, I'd love some guidance on how to proceed with opening a
formal issue or submitting a PR.
Thanks for taking the time to read this!

On Mon, Nov 1, 2021 at 11:56 PM Nick Couchman <[email protected]> wrote:

> On Sun, Oct 31, 2021 at 3:31 AM Shai Roemi <[email protected]> wrote:
>
>> Sorry for the late reply
>> I did run guacd in debug mode, no more informative logs appear. As I said
>> earlier, I traced the error to this function
>> <https://www.libssh2.org/libssh2_sftp_open_ex.html>. The documentation
>> says NULL is returned on error, although the logs don't print the actual
>> error code.
>>
>
> My guess is that the function you mention only gets run once, upon initial
> connection of the SFTP channel. So, you probably won't get an error out of
> that, since you indicate that the SFTP session initially works fine but
> stops working somewhere along the way. So, I would guess that function call
> succeeds and you won't see any errors there. Of course, I could be wrong
> about that, just my guess.
>
>
>> My next step is to patch guacd to print it, hoping it'll give me more
>> information to find the source of the problem.
>> Regarding network issues, that was my first guess as well. I reviewed the
>> traffic and I see SSH-v2 keepalives transmitting and ack'ing successfully
>> at the correct intervals (according to the connection configurations).
>>
>>
> Yes, the SSH keep-alives will be there, and, since the SFTP traffic uses
> the same SSH connection as the terminal, I would expect this.
>
>
>> I very much doubt this is a guacamole bug, but if you have any other
>> ideas I'd love to hear them!
>> I'll update this thread when I find out anything new, hopefully it'll
>> help someone someday :)
>>
>>
> Yeah, it's definitely an odd problem. I'm certainly not saying that it
> absolutely couldn't be a bug in Guacamole - there certainly could be some
> corner-case it can't deal with, or even in libssh2 or something like that.
> But it almost seems like either the network is shutting down parts of the
> SSH connection, or the SSH server itself. Very strange - if you're able to
> track it down I will definitely be interested to hear what you come up with!
>
> -Nick
>
>>
