Currently, the value returned in a kevent's data member by the
EVFILT_READ filter is the "number of bytes in the socket buffer", which
includes control and out-of-band data.  However, this isn't particularly
useful, as any read(), readv(), or recvmsg() for the amount of data
reported may block if there is any non-protocol data in the buffer.  And
since there is no way for userland applications to determine whether, and
if so how much, non-protocol data is in the buffer, the reported value
cannot be relied on as a read size.
  PR 30634 touches on this issue; UDP sockets are particularly visible
examples since they always include 16 bytes of address information in
addition to the datagram received.  However, from reading the code it
would appear that OOB data can cause a similar problem for TCP sockets.
  It seems that the overriding issue is that the read* API takes the
number of bytes of protocol data to read whereas kevent() reports the
total number of bytes available (protocol or administrative).  The
attached patch, which I would appreciate your comments on, modifies
kevent() to report just the number of bytes of protocol data.
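
  To illustrate the accounting, here is a minimal userland sketch of the
idea.  The struct and function names (mbuf_model, sockbuf_model,
sb_model_alloc, sb_model_free, sb_model_readable) are hypothetical
simplifications for illustration only; they are not the kernel's
definitions, and only the sb_cc/sb_ctl fields and the MT_DATA test mirror
the patch.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the kernel's mbuf types. */
enum mtype_model { MT_DATA, MT_SONAME, MT_CONTROL };

struct mbuf_model {
	enum mtype_model m_type;	/* kind of contents */
	unsigned	 m_len;		/* bytes held by this mbuf */
};

struct sockbuf_model {
	unsigned sb_cc;		/* all bytes in the buffer */
	unsigned sb_ctl;	/* bytes that are not protocol data */
};

/* Account for an mbuf being appended to the buffer. */
void
sb_model_alloc(struct sockbuf_model *sb, const struct mbuf_model *m)
{
	sb->sb_cc += m->m_len;
	if (m->m_type != MT_DATA)
		sb->sb_ctl += m->m_len;
}

/* Account for an mbuf being removed from the buffer. */
void
sb_model_free(struct sockbuf_model *sb, const struct mbuf_model *m)
{
	assert(sb->sb_cc >= m->m_len);
	sb->sb_cc -= m->m_len;
	if (m->m_type != MT_DATA) {
		assert(sb->sb_ctl >= m->m_len);
		sb->sb_ctl -= m->m_len;
	}
}

/* Bytes a read() could actually return: total minus non-data bytes. */
unsigned
sb_model_readable(const struct sockbuf_model *sb)
{
	return (sb->sb_cc - sb->sb_ctl);
}
```

  In this model, queueing a 16-byte address mbuf followed by a 100-byte
datagram leaves sb_cc at 116, while sb_model_readable() yields 100, which
is the amount a reader can actually consume.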

  As an aside, it appears that the FIONREAD ioctl (sys_socket.c:soo_ioctl)
and stat(2) on a socket (sys_socket.c:soo_stat) also return the total
number of bytes (protocol data or otherwise) in the socket buffer.
For reasons similar to those described above, I suspect that these should
also be modified to return just the number of bytes of actual data.  Unless
someone knows of an explicit example otherwise, I don't think changing the
value reported via these interfaces would break any existing applications
as they are probably expecting the new behaviour anyway.
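
  One way to observe the FIONREAD behaviour from userland is sketched
below.  The helper name fionread_excess is hypothetical, and using an
AF_UNIX datagram socketpair as the test case is an assumption made for
convenience; the result depends on the kernel in use, so no particular
output is claimed.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send `len` payload bytes over a datagram socketpair and store in
 * *excess how many more bytes FIONREAD reports than were sent.
 * Returns 0 on success, -1 on failure.
 */
int
fionread_excess(size_t len, int *excess)
{
	char buf[512];
	int sv[2], n = 0, rc = -1;

	if (len > sizeof(buf))
		return (-1);
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
		return (-1);

	memset(buf, 'x', len);
	if (send(sv[0], buf, len, 0) == (ssize_t)len &&
	    ioctl(sv[1], FIONREAD, &n) != -1) {
		*excess = n - (int)len;
		rc = 0;
	}

	close(sv[0]);
	close(sv[1]);
	return (rc);
}
```

  A non-zero excess would indicate that the count includes bytes other
than the payload; an application sizing its read from FIONREAD most
likely expects the excess to be zero.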

  Thanks,

  Kelly

  (P.S. I've already sent a version of this patch, made against -stable,
   to Jonathan, but I haven't heard anything from him in almost a week)

--
Kelly Yancey --  kbyanc@{posi.net,FreeBSD.org}
Index: sys/socketvar.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/socketvar.h,v
retrieving revision 1.94
diff -u -p -r1.94 socketvar.h
--- sys/socketvar.h     17 Aug 2002 02:36:16 -0000      1.94
+++ sys/socketvar.h     16 Oct 2002 21:34:13 -0000
@@ -105,6 +105,7 @@ struct socket {
                u_int   sb_hiwat;       /* max actual char count */
                u_int   sb_mbcnt;       /* chars of mbufs used */
                u_int   sb_mbmax;       /* max chars of mbufs to use */
+               u_int   sb_ctl;         /* non-data chars in buffer */
                int     sb_lowat;       /* low water mark */
                int     sb_timeo;       /* timeout for read/write */
                short   sb_flags;       /* flags, see below */
@@ -227,6 +228,8 @@ struct xsocket {
 /* adjust counters in sb reflecting allocation of m */
 #define        sballoc(sb, m) { \
        (sb)->sb_cc += (m)->m_len; \
+       if ((m)->m_type != MT_DATA) \
+               (sb)->sb_ctl += (m)->m_len; \
        (sb)->sb_mbcnt += MSIZE; \
        if ((m)->m_flags & M_EXT) \
                (sb)->sb_mbcnt += (m)->m_ext.ext_size; \
@@ -235,6 +238,8 @@ struct xsocket {
 /* adjust counters in sb reflecting freeing of m */
 #define        sbfree(sb, m) { \
        (sb)->sb_cc -= (m)->m_len; \
+       if ((m)->m_type != MT_DATA) \
+               (sb)->sb_ctl -= (m)->m_len; \
        (sb)->sb_mbcnt -= MSIZE; \
        if ((m)->m_flags & M_EXT) \
                (sb)->sb_mbcnt -= (m)->m_ext.ext_size; \
Index: kern/uipc_socket.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.132
diff -u -p -r1.132 uipc_socket.c
--- kern/uipc_socket.c  5 Oct 2002 21:23:46 -0000       1.132
+++ kern/uipc_socket.c  16 Oct 2002 21:32:01 -0000
@@ -1785,6 +1785,7 @@ filt_soread(struct knote *kn, long hint)
        struct socket *so = (struct socket *)kn->kn_fp->f_data;
 
        kn->kn_data = so->so_rcv.sb_cc;
+       kn->kn_data -= so->so_rcv.sb_ctl;
        if (so->so_state & SS_CANTRCVMORE) {
                kn->kn_flags |= EV_EOF;
                kn->kn_fflags = so->so_error;
