Currently, the value returned in a kevent's data member by the EVFILT_READ filter is the "number of bytes in the socket buffer", which includes control and out-of-band data. This isn't particularly useful, as any read(), readv(), or recvmsg() for the amount of data reported may block if there is any non-protocol data in the buffer. And since userland applications have no way to determine whether, and if so how much, non-protocol data is in the buffer, the reported value cannot be trusted for anything useful.

PR 30634 touches on this issue; UDP sockets are particularly visible examples since they always include 16 bytes of address information in addition to the datagram received. However, from reading the code it appears that OOB data can cause a similar problem for TCP sockets.

It seems that the overriding issue is that the read* API takes the number of bytes of protocol data to read, whereas kevent() reports the total number of bytes available (protocol or administrative). The attached patch, which I would appreciate your comments on, modifies kevent() to report just the number of bytes of protocol data.

As an aside, it appears that the FIONREAD ioctl (sys_socket.c:soo_ioctl) and stat(2) on a socket (sys_socket.c:soo_stat) also return the total number of bytes (protocol data or otherwise) in the socket buffer. For reasons similar to those described above, I suspect that these should also be modified to return just the number of bytes of actual data. Unless someone knows of an explicit example otherwise, I don't think changing the value reported via these interfaces would break any existing applications, as they are probably expecting the new behaviour anyway.

Thanks,

Kelly

(P.S. I've already sent a version of this patch, made against -stable, to Jonathan, but I haven't heard anything from him in almost a week.)

--
Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org}

Index: sys/socketvar.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/socketvar.h,v
retrieving revision 1.94
diff -u -p -r1.94 socketvar.h
--- sys/socketvar.h	17 Aug 2002 02:36:16 -0000	1.94
+++ sys/socketvar.h	16 Oct 2002 21:34:13 -0000
@@ -105,6 +105,7 @@ struct socket {
 		u_int	sb_hiwat;	/* max actual char count */
 		u_int	sb_mbcnt;	/* chars of mbufs used */
 		u_int	sb_mbmax;	/* max chars of mbufs to use */
+		u_int	sb_ctl;		/* non-data chars in buffer */
 		int	sb_lowat;	/* low water mark */
 		int	sb_timeo;	/* timeout for read/write */
 		short	sb_flags;	/* flags, see below */
@@ -227,6 +228,8 @@ struct xsocket {
 /* adjust counters in sb reflecting allocation of m */
 #define	sballoc(sb, m) { \
 	(sb)->sb_cc += (m)->m_len; \
+	if ((m)->m_type != MT_DATA) \
+		(sb)->sb_ctl += (m)->m_len; \
 	(sb)->sb_mbcnt += MSIZE; \
 	if ((m)->m_flags & M_EXT) \
 		(sb)->sb_mbcnt += (m)->m_ext.ext_size; \
@@ -235,6 +238,8 @@ struct xsocket {
 /* adjust counters in sb reflecting freeing of m */
 #define	sbfree(sb, m) { \
 	(sb)->sb_cc -= (m)->m_len; \
+	if ((m)->m_type != MT_DATA) \
+		(sb)->sb_ctl -= (m)->m_len; \
 	(sb)->sb_mbcnt -= MSIZE; \
 	if ((m)->m_flags & M_EXT) \
 		(sb)->sb_mbcnt -= (m)->m_ext.ext_size; \
Index: kern/uipc_socket.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.132
diff -u -p -r1.132 uipc_socket.c
--- kern/uipc_socket.c	5 Oct 2002 21:23:46 -0000	1.132
+++ kern/uipc_socket.c	16 Oct 2002 21:32:01 -0000
@@ -1785,6 +1785,7 @@ filt_soread(struct knote *kn, long hint)
 	struct socket *so = (struct socket *)kn->kn_fp->f_data;
 
 	kn->kn_data = so->so_rcv.sb_cc;
+	kn->kn_data -= so->so_rcv.sb_ctl;
 	if (so->so_state & SS_CANTRCVMORE) {
 		kn->kn_flags |= EV_EOF;
 		kn->kn_fflags = so->so_error;
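
For illustration only, here is a rough userland model of the accounting the patch above adds. It is a sketch, not kernel code: every model_* name is hypothetical and merely stands in for the sockbuf counters (sb_cc, sb_ctl) and the sballoc/sbfree macros, and the mbuf types are reduced to "data" versus "anything else". The demo function assumes the UDP case described earlier, with 16 bytes of address information queued ahead of the datagram.

```c
#include <assert.h>

/* Hypothetical stand-ins for mbuf types; only data vs. non-data matters. */
enum model_type { MODEL_DATA, MODEL_SONAME, MODEL_CONTROL };

struct model_mbuf {
    enum model_type type;
    unsigned int len;
};

struct model_sockbuf {
    unsigned int cc;    /* all bytes in the buffer (models sb_cc) */
    unsigned int ctl;   /* non-data bytes only (models sb_ctl) */
};

/* Models sballoc(): account for an mbuf appended to the buffer. */
static void model_sballoc(struct model_sockbuf *sb, const struct model_mbuf *m)
{
    sb->cc += m->len;
    if (m->type != MODEL_DATA)
        sb->ctl += m->len;
}

/* Models sbfree(): account for an mbuf removed from the buffer. */
static void model_sbfree(struct model_sockbuf *sb, const struct model_mbuf *m)
{
    assert(sb->cc >= m->len);
    sb->cc -= m->len;
    if (m->type != MODEL_DATA) {
        assert(sb->ctl >= m->len);
        sb->ctl -= m->len;
    }
}

/* Models the patched filt_soread(): report protocol data only. */
static unsigned int model_readable(const struct model_sockbuf *sb)
{
    return sb->cc - sb->ctl;
}

/*
 * Queue one UDP-style record: 16 bytes of address information followed
 * by payload_len bytes of datagram. Stores the unpatched value (total
 * bytes) in *total and returns the patched value (data bytes).
 */
static unsigned int model_udp_demo(unsigned int payload_len, unsigned int *total)
{
    struct model_sockbuf sb = { 0, 0 };
    struct model_mbuf addr = { MODEL_SONAME, 16 };
    struct model_mbuf payload = { MODEL_DATA, payload_len };

    model_sballoc(&sb, &addr);
    model_sballoc(&sb, &payload);
    *total = sb.cc;
    return model_readable(&sb);
}
```

With a 100-byte datagram, model_udp_demo(100, &total) leaves total at 116 (what kevent() reports today) and returns 100 (what it would report with the patch), which is the amount a read() can actually be asked for without blocking.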