On Tue, Jan 28, 2025 at 09:14:00PM -0800, Gleb Smirnoff wrote:
T> Second, with the patch the M_RPC leak count for me is 2. And I found that these
T> two items are basically is a clnt_vc that belongs to a closed connection:
T>
T> fffff80029614a80 tcp4 0 0 10.6.6.9.772 10.6.6.9.2049 CLOSED
T>
T> There is no connection peer connection, as the server received a timeout trying
T> to send. But rpc.tlsclntd doesn't try to send anything on the socket, it just
T> keeps it select(2) fd set and doesn't garbage collect.
T>
T> So it is a bigger resource leak than just two pieces of M_RPC. I don't think
T> this is related to my changes.

Here is what is going on:

- The TCP connection is torn down and tcp_close() calls soisdisconnected().
- soisdisconnected() calls clnt_vc_soupcall() to notify of the error condition.
- clnt_vc_soupcall() tries soreceive() and gets so->so_error.
- clnt_vc_soupcall() sets the client to the error state. It doesn't wake up
  anything because there were no running RPC requests. It can't report back
  to clnt_rc that the connection is dead. It doesn't mark itself for
  clnt_vc_dotlsupcall() processing.

So we end up with:

(kgdb) p $tp->t_state
$25 = 0 /* TCPS_CLOSED */
(kgdb) p/x $tp->t_inpcb.inp_flags & 0x04000000 /* INP_DROPPED */
$27 = 0x4000000
(kgdb) p/x $tp->t_inpcb.inp_socket->so_state
$28 = 0x2000 /* SS_ISDISCONNECTED */
(kgdb) p/x $tp->t_inpcb.inp_socket->so_count
$35 = 0x2
(kgdb) p/x $ct->ct_rcvstate
$29 = 0x41 /* RPCRCVSTATE_UPCALLTHREAD | RPCRCVSTATE_NORMAL */
(kgdb) p $ct->ct_error
$30 = {re_status = RPC_CANTRECV, ru = {RE_errno = 13, RE_why = RPCSEC_GSS_CREDPROBLEM, RE_vers = {low = 13, high = 0}, RE_lb = {s1 = 13, s2 = 0}}}
(kgdb) p $ct->ct_pending
$31 = {tqh_first = 0x0, tqh_last = 0xfffff80002838ea8}

Note: In my case so->so_error was EACCES, because I used an ipfw(4) rule to
tear down the connection. For a normal TCP timeout it should be ETIMEDOUT, or
ECONNRESET if the remote has reset. That's why $ct->ct_error.ru.RE_errno == 13.

So we need some mechanism for clnt_vc_soupcall() to report to the upper clnt_rc
that we are dead and ready to be garbage collected via CLNT_CLOSE() and then
CLNT_RELEASE(). Once clnt_vc_destroy() is called, the daemon will be notified
that it can close the TLS socket, bringing so_count to 1, and then the final
sorele() will bring it to 0 and free the socket.

--
Gleb Smirnoff