Hi,

When testing HAST synchronization with both the primary and the secondary HAST instances running on the same host, I ran into an issue where synchronization can be very slow:
Apr 9 14:04:04 kopusha hastd[3812]: [test] (primary) Synchronization complete. 512MB synchronized in 16m38s (525KB/sec).

hastd synchronizes data in MAXPHYS (131072 bytes) blocks. The sender splits each block into smaller chunks of MAX_SEND_SIZE (32768 bytes), while the receiver reads the whole block with a single recv() call using the MSG_WAITALL flag. Sometimes recv() gets stuck: in tcpdump I see that the sending side sent all the chunks and they were all ACKed, but the receiving thread is still waiting in recv(). netstat reports a non-empty Recv-Q on the receiving side (usually with a byte count equal to the size of the last sent chunk). It looks like the receiving userspace is never told by the kernel that all the data has arrived. I can reproduce the issue with the attached test_MSG_WAITALL.c (an outline of the test follows the attachment note below).

I think the issue is in soreceive_generic(). If MSG_WAITALL is set but the request is larger than the receive buffer, it has to do the receive in sections. So after receiving some data it notifies the protocol about it (calls pr_usrreqs->pru_rcvd), releasing the so_rcv lock. On return it blocks in sbwait(), waiting for the rest of the data. I think there is a race: while it is in pr_usrreqs->pru_rcvd, not holding the lock, the rest of the data can arrive, so it should check for this before calling sbwait(). See the attached uipc_socket.c.soreceive.patch (inlined below, followed by a simplified sketch of the race). The patch fixes the issue for me:

Apr 9 14:16:40 kopusha hastd[2926]: [test] (primary) Synchronization complete. 512MB synchronized in 4s (128MB/sec).

I observed the problem on STABLE but believe the same applies to CURRENT.

BTW, I also tried the optimized version of soreceive(), soreceive_stream(). It does not have this problem, but with it I observed TCP connections getting stuck in soreceive_stream() when starting firefox (with many tabs) or pidgin (with many contacts); the processes were killable only with -9. I have not investigated this much, though.

--
Mikolaj Golub
test_MSG_WAITALL.c
Description: Binary data
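
Since the attachment is not inlined above, here is an outline of what the test does. This is a sketch, not the attached file: the loopback TCP setup and the names and sizes BLOCK_SIZE, CHUNK_SIZE, RCVBUF_SIZE and ROUNDS are illustrative assumptions (chosen to mirror hastd's MAXPHYS and MAX_SEND_SIZE); the attached test_MSG_WAITALL.c is the authoritative reproduction.

/*
 * Sketch of a MSG_WAITALL reproduction: one large block is sent in
 * small chunks and received with a single recv(..., MSG_WAITALL).
 */
#include <sys/types.h>
#include <sys/socket.h>

#include <netinet/in.h>
#include <arpa/inet.h>

#include <err.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE	(128 * 1024)	/* MAXPHYS-sized block */
#define CHUNK_SIZE	(32 * 1024)	/* MAX_SEND_SIZE-sized chunk */
#define RCVBUF_SIZE	(64 * 1024)	/* smaller than BLOCK_SIZE */
#define ROUNDS		1000

int
main(void)
{
	static char block[BLOCK_SIZE];
	struct sockaddr_in sin;
	socklen_t slen;
	char *p;
	int lsd, sd, rsd, rcvbuf, round;

	/* TCP pair over loopback on an ephemeral port. */
	if ((lsd = socket(PF_INET, SOCK_STREAM, 0)) == -1)
		err(1, "socket");
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (bind(lsd, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		err(1, "bind");
	if (listen(lsd, 1) == -1)
		err(1, "listen");
	slen = sizeof(sin);
	if (getsockname(lsd, (struct sockaddr *)&sin, &slen) == -1)
		err(1, "getsockname");
	if ((sd = socket(PF_INET, SOCK_STREAM, 0)) == -1)
		err(1, "socket");
	if (connect(sd, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		err(1, "connect");
	if ((rsd = accept(lsd, NULL, NULL)) == -1)
		err(1, "accept");

	/*
	 * Make the receive buffer smaller than the MSG_WAITALL request
	 * so that soreceive_generic() has to receive in sections.
	 */
	rcvbuf = RCVBUF_SIZE;
	if (setsockopt(rsd, SOL_SOCKET, SO_RCVBUF, &rcvbuf,
	    sizeof(rcvbuf)) == -1)
		err(1, "setsockopt");

	switch (fork()) {
	case -1:
		err(1, "fork");
	case 0:
		/* Sender: each block goes out in CHUNK_SIZE pieces. */
		close(rsd);
		for (round = 0; round < ROUNDS; round++)
			for (p = block; p < block + BLOCK_SIZE;
			    p += CHUNK_SIZE)
				if (send(sd, p, CHUNK_SIZE, 0) != CHUNK_SIZE)
					err(1, "send");
		_exit(0);
	default:
		/*
		 * Receiver: ask for the whole block at once.  On an
		 * affected kernel one of these calls eventually blocks
		 * forever even though all the data has been ACKed.
		 */
		close(sd);
		for (round = 0; round < ROUNDS; round++) {
			if (recv(rsd, block, BLOCK_SIZE, MSG_WAITALL) !=
			    BLOCK_SIZE)
				err(1, "recv");
			printf("round %d done\n", round);
		}
	}
	return (0);
}

The small SO_RCVBUF matters: it forces soreceive_generic() down the receive-in-sections path described above, and on an affected kernel one of the recv() calls eventually hangs with the last chunk sitting in Recv-Q.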
Index: sys/kern/uipc_socket.c
===================================================================
--- sys/kern/uipc_socket.c	(revision 220472)
+++ sys/kern/uipc_socket.c	(working copy)
@@ -1836,28 +1836,34 @@ dontblock:
 			/*
 			 * Notify the protocol that some data has been
 			 * drained before blocking.
 			 */
 			if (pr->pr_flags & PR_WANTRCVD) {
 				SOCKBUF_UNLOCK(&so->so_rcv);
 				VNET_SO_ASSERT(so);
 				(*pr->pr_usrreqs->pru_rcvd)(so, flags);
 				SOCKBUF_LOCK(&so->so_rcv);
 			}
 			SBLASTRECORDCHK(&so->so_rcv);
 			SBLASTMBUFCHK(&so->so_rcv);
-			error = sbwait(&so->so_rcv);
-			if (error) {
-				SOCKBUF_UNLOCK(&so->so_rcv);
-				goto release;
+			/*
+			 * We could receive some data while we were notifying
+			 * the protocol.  Skip blocking in this case.
+			 */
+			if (so->so_rcv.sb_mb == NULL) {
+				error = sbwait(&so->so_rcv);
+				if (error) {
+					SOCKBUF_UNLOCK(&so->so_rcv);
+					goto release;
+				}
 			}
 			m = so->so_rcv.sb_mb;
 			if (m != NULL)
 				nextrecord = m->m_nextpkt;
 		}
 	}
 
 	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 	if (m != NULL && pr->pr_flags & PR_ATOMIC) {
 		flags |= MSG_TRUNC;
 		if ((flags & MSG_PEEK) == 0)
 			(void) sbdroprecord_locked(&so->so_rcv);
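
To spell out the race the patch closes, here is the relevant path in soreceive_generic() in simplified, annotated form (paraphrased from the patch context above; not a verbatim quote of the file):

	if (flags & MSG_WAITALL && uio->uio_resid > 0 &&
	    !sosendallatonce(so) && nextrecord == NULL) {
		/* Tell the protocol data was drained (TCP: window update). */
		if (pr->pr_flags & PR_WANTRCVD) {
			SOCKBUF_UNLOCK(&so->so_rcv);	/* lock dropped */
			(*pr->pr_usrreqs->pru_rcvd)(so, flags);
			/*
			 * Race window: tcp_input() can append the rest
			 * of the data to so_rcv and call sorwakeup()
			 * right here.  Nobody is asleep yet, so that
			 * wakeup is lost.
			 */
			SOCKBUF_LOCK(&so->so_rcv);
		}
		/*
		 * Unpatched, sbwait() runs unconditionally and sleeps
		 * until the next event (more data, FIN, a signal),
		 * although everything asked for may already be in the
		 * buffer.  The patch re-checks sb_mb first.
		 */
		if (so->so_rcv.sb_mb == NULL)
			error = sbwait(&so->so_rcv);
	}

Because the check happens under the re-taken lock, either the data is already in the buffer and sbwait() is skipped, or any later sorwakeup() finds this thread asleep in sbwait(); the lost-wakeup window is gone.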