--- Begin Message ---
> A little bit of analysis of why this is needed and okay would be great to 
> have here in the commit message ;)

Thanks for the feedback, Fabian! First time with git-email..

I'm adding details with logs here, and a detailed description of the
issue & solution in the commit.

> nit: the if is not needed, as `stoptls` only does anything anyway if $hdl has 
> a FH and is a TLS-enabled handle ;)

Thanks! :D

I added some debugging to AnyEvent.pm. Here is a curl request that returns 200 
OK:

1741209019.890976 worker[1636452]: PVE::APIServer::AnyEvent +1910: (eval): 
ACCEPT FH10 CONN1
1741209019.891069 worker[1636452]: PVE::APIServer::AnyEvent +1910: (eval): 
Setting TLS to autostart
::ffff:10.36.88.28 - root@pam!katello_test [05/03/2025:14:10:19 -0700] "GET 
/api2/json/cluster HTTP/1.1" 200 360
1741209019.916084 worker[1636452]: PVE::APIServer::AnyEvent +423: 
finish_response: KEEPALIVE 99
1741209019.916693 worker[1636452]: AnyEvent::Handle +1350: __ANON__: EOF 
accept_connections()
1741209019.916716 worker[1636452]: PVE::APIServer::AnyEvent +1951: (eval): >>> 
accept_connections().on_eof
1741209019.916756 worker[1636452]: PVE::APIServer::AnyEvent +1954: 
client_do_disconnect: START client_do_disconnect()
1741209019.916827 worker[1636452]: PVE::APIServer::AnyEvent +1954: 
client_do_disconnect: close connection AnyEvent::Handle=HASH(0x595788db52a8)    
                                                                                
           
1741209019.916845 worker[1636452]: PVE::APIServer::AnyEvent +191: __ANON__: >>> 
shutdown($hdl->{fh}, 1)
1741209019.916889 worker[1636452]: PVE::APIServer::AnyEvent +191: __ANON__: >>> 
ok
1741209019.916920 worker[1636452]: PVE::APIServer::AnyEvent +1954: 
client_do_disconnect: CLOSE FH10 CONN0

A curl request that returns 400: 

1741209023.423134 worker[1636452]: PVE::APIServer::AnyEvent +1910: (eval): 
ACCEPT FH10 CONN1
1741209023.423207 worker[1636452]: PVE::APIServer::AnyEvent +1910: (eval): 
Setting TLS to autostart
::ffff:10.36.88.28 - root@pam!katello_test [05/03/2025:14:10:23 -0700] "GET 
/api2/json/cluster/nextid?vmid=106 HTTP/1.1" 400 55                             
                                                                                
  
1741209023.439439 worker[1636452]: PVE::APIServer::AnyEvent +423: 
finish_response: DRAIN finish_response()
1741209023.439486 worker[1636452]: PVE::APIServer::AnyEvent +219: (eval): >>> 
finish_response().on_drain
1741209023.439547 worker[1636452]: PVE::APIServer::AnyEvent +222: 
client_do_disconnect: START client_do_disconnect()
1741209023.439591 worker[1636452]: PVE::APIServer::AnyEvent +222: 
client_do_disconnect: close connection AnyEvent::Handle=HASH(0x595788de2318)    
                                                                                
            
1741209023.439610 worker[1636452]: PVE::APIServer::AnyEvent +191: __ANON__: >>> 
shutdown($hdl->{fh}, 1)
1741209023.439648 worker[1636452]: PVE::APIServer::AnyEvent +191: __ANON__: >>> 
ok
1741209023.439682 worker[1636452]: PVE::APIServer::AnyEvent +222: 
client_do_disconnect: CLOSE FH10 CONN0

Requests returning 400 send "Connection: close" and tear down the socket 
immediately, resulting in an unexpected EOF at the client. Requests returning 
200 send "Connection: keep-alive" and have a 5 second timeout.

If we make a request but let the connection sit idle, the 5 second timeout is 
reached, and we receive unexpected EOF again:

1741209026.694101 worker[1636452]: PVE::APIServer::AnyEvent +1910: (eval): 
ACCEPT FH10 CONN1
1741209026.694173 worker[1636452]: PVE::APIServer::AnyEvent +1910: (eval): 
Setting TLS to autostart
::ffff:10.36.88.28 - root@pam!katello_test [05/03/2025:14:10:26 -0700] "GET 
/api2/json/cluster HTTP/1.1" 200 360
1741209026.698288 worker[1636452]: PVE::APIServer::AnyEvent +423: 
finish_response: KEEPALIVE 99
1741209031.699939 worker[1636452]: AnyEvent::Handle +655: __ANON__: ERROR 
accept_connections()
1741209031.700154 worker[1636452]: PVE::APIServer::AnyEvent +1962: (eval): >>> 
accept_connections().on_error
1741209031.700186 worker[1636452]: PVE::APIServer::AnyEvent +1969: 
client_do_disconnect: START client_do_disconnect()
1741209031.700212 worker[1636452]: PVE::APIServer::AnyEvent +1969: 
client_do_disconnect: close connection AnyEvent::Handle=HASH(0x595788de1e68)
1741209031.700240 worker[1636452]: PVE::APIServer::AnyEvent +191: __ANON__: >>> 
shutdown($hdl->{fh}, 1)
1741209031.700362 worker[1636452]: PVE::APIServer::AnyEvent +191: __ANON__: >>> 
ok
1741209031.700408 worker[1636452]: PVE::APIServer::AnyEvent +1969: 
client_do_disconnect: CLOSE FH10 CONN0

To fix this, we can send a TLS close notify before shutting down the socket.

If we do stoptls() before clearing the handlers, however, we enter 
client_do_disconnect() again:

1741209459.761847 worker[1639023]: PVE::APIServer::AnyEvent +1910: (eval): 
ACCEPT FH10 CONN1
1741209459.761948 worker[1639023]: PVE::APIServer::AnyEvent +1910: (eval): 
Setting TLS to autostart
::ffff:10.36.88.28 - root@pam!katello_test [05/03/2025:14:17:39 -0700] "GET 
/api2/json/cluster/nextid?vmid=106 HTTP/1.1" 400 55                             
                                                                                
  
1741209459.787932 worker[1639023]: PVE::APIServer::AnyEvent +423: 
finish_response: DRAIN finish_response()
1741209459.788003 worker[1639023]: PVE::APIServer::AnyEvent +219: (eval): >>> 
finish_response().on_drain
1741209459.788030 worker[1639023]: PVE::APIServer::AnyEvent +222: 
client_do_disconnect: START client_do_disconnect()
1741209459.788076 worker[1639023]: PVE::APIServer::AnyEvent +222: 
client_do_disconnect: close connection AnyEvent::Handle=HASH(0x603027214400)    
                                                                                
            
1741209459.788130 worker[1639023]: PVE::APIServer::AnyEvent +191: __ANON__: >>> 
$hdl->stoptls()
1741209459.788221 worker[1639023]: PVE::APIServer::AnyEvent +219: (eval): >>> 
finish_response().on_drain
1741209459.788248 worker[1639023]: PVE::APIServer::AnyEvent +222: 
client_do_disconnect: START client_do_disconnect()
1741209459.788493 worker[1639023]: PVE::APIServer::AnyEvent +222: 
client_do_disconnect: ERROR: detected empty handle
1741209459.788553 worker[1639023]: PVE::APIServer::AnyEvent +191: __ANON__: >>> 
shutdown($hdl->{fh}, 1)
1741209459.788593 worker[1639023]: PVE::APIServer::AnyEvent +191: __ANON__: >>> 
ok
1741209459.788649 worker[1639023]: PVE::APIServer::AnyEvent +222: 
client_do_disconnect: CLOSE FH10 CONN0

So we move these after the clearing of handlers. This was fixed by commit 
f737984 for some cases, perhaps for all cases by clearing handlers prior to 
shutdown?

> If I read the docs right, this could block (would that be an issue here?) and 
> could potentially destroy the handle (so that might need to be rechecked 
> afterwards to prevent spurious warnings?)
> 
> what happens if we initiate the teardown, and the client never acks it?

I don't see this blocking in my observations. The server continues with socket 
teardown immediately after sending TLS close notify. If no response is received 
from the client, it remains in FIN_WAIT1 until that timeout is reached. This 
was tested by making a request with keepalive then immediately blocking all 
outbound on the client prior to reaching the 5 second timeout:

1741209811.245870 worker[1640562]: PVE::APIServer::AnyEvent +1910: (eval): 
ACCEPT FH10 CONN1
1741209811.245963 worker[1640562]: PVE::APIServer::AnyEvent +1910: (eval): 
Setting TLS to autostart
::ffff:10.36.88.28 - root@pam!katello_test [05/03/2025:14:23:31 -0700] "GET 
/api2/json/cluster HTTP/1.1" 200 360
1741209811.260657 worker[1640562]: PVE::APIServer::AnyEvent +423: 
finish_response: KEEPALIVE 99
1741209816.262383 worker[1640562]: AnyEvent::Handle +655: __ANON__: ERROR 
accept_connections()
1741209816.262532 worker[1640562]: PVE::APIServer::AnyEvent +1962: (eval): >>> 
accept_connections().on_error
1741209816.262631 worker[1640562]: PVE::APIServer::AnyEvent +1969: 
client_do_disconnect: START client_do_disconnect()
1741209816.262706 worker[1640562]: PVE::APIServer::AnyEvent +1969: 
client_do_disconnect: close connection AnyEvent::Handle=HASH(0x5eede90cc068)    
                                                                                
           
1741209816.262750 worker[1640562]: PVE::APIServer::AnyEvent +191: __ANON__: >>> 
$hdl->stoptls()
1741209816.262924 worker[1640562]: PVE::APIServer::AnyEvent +191: __ANON__: >>> 
shutdown($hdl->{fh}, 1)
1741209816.262975 worker[1640562]: PVE::APIServer::AnyEvent +191: __ANON__: >>> 
ok
1741209816.262997 worker[1640562]: PVE::APIServer::AnyEvent +1969: 
client_do_disconnect: CLOSE FH10 CONN0

There is almost no time between stoptls() and shutdown(). We can see the client 
never responded by inspecting the connection on the server:

FIN-WAIT-1 0      25     [::ffff:10.36.88.27]:8006  [::ffff:10.36.88.28]:48382

As for the handle being destroyed, it gets deleted in client_do_disconnect() 
and I do not see it being used beyond there. I observed no warnings during 
testing, but it's certainly possible I missed something. If there are any 
additional tests I can do here, please let me know!

cheers,

Rob Rozestraten (1):
  fix unexpected EOF for client when closing TLS session

 src/PVE/APIServer/AnyEvent.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-- 
2.48.1




--- End Message ---
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

Reply via email to