On (01/17/18 18:50), Willem de Bruijn wrote:
>
> This can cause reordering with parallel readers. Can we avoid the need
> for peeking? It also caused a slew of subtle bugs previously.

Yes, I did notice the potential for reordering when writing the patch,
but these are not actually messages from the wire, so is reordering
fatal?

In general, I'm not particularly attached to this solution. In my
testing, I'm seeing that it's possible to reduce the latency and still
take a hit on throughput if the application does not reap the completion
notification (and send out new data) efficiently.

Some (radically different) alternatives that were suggested to me:

- Send up all the cookies as ancillary data with recvmsg (i.e., send
  them as cmsg data along with actual data from the wire). In most
  cases, the application has data to read anyway. If it doesn't (pure
  sender), we could wake up recvmsg with 0 bytes of data, but with the
  cookie info in the ancillary data. This feels not-so-elegant to me,
  but I suppose it would have the benefit of reducing syscall overhead
  (and you could use MSG_CTRUNC to handle the case of an insufficient
  buffer for cookies, sending the rest on the next call).

- Allow the application to use a setsockopt on the RDS socket to set up
  a shmem region into which the kernel could write the cookies. The
  application could then reap cookies from that shmem region without
  syscall overhead.

> How about just define a max number of cookies and require the caller
> to always read with sufficient room to hold them?

This may be "good enough" as well. Maybe allow a max of (say) 16
cookies, and set up the skbs in the error queue to send up batches of
16 cookies at a time?

--Sowmini
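The fixed-cap batching idea discussed above (at most 16 cookies per notification, with any remainder delivered on the next read) can be sketched as a small user-space C model. This is a hypothetical illustration only: `zcookie_queue`, `zcookie_batch`, `zcookie_push`, `zcookie_drain`, `ZCOOKIE_BATCH_MAX` and `ZCOOKIE_QUEUE_CAP` are made-up names, not RDS or kernel API, and no socket or error-queue calls are made. It only shows the bookkeeping: a FIFO of completed cookies drained oldest first, never more than the agreed cap per batch, plus an indication of whether more remain.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap from the proposal: at most 16 cookies per batch. */
#define ZCOOKIE_BATCH_MAX 16
/* Hypothetical capacity of the model's pending-completion queue. */
#define ZCOOKIE_QUEUE_CAP 64

/* Hypothetical batch handed to the application in one read. */
struct zcookie_batch {
    uint32_t num;                        /* number of valid entries     */
    uint32_t cookies[ZCOOKIE_BATCH_MAX]; /* cookies of completed sends  */
};

/* Hypothetical ring-buffer FIFO of completions not yet reported. */
struct zcookie_queue {
    uint32_t buf[ZCOOKIE_QUEUE_CAP];
    size_t head; /* index of the oldest pending cookie */
    size_t len;  /* number of pending cookies          */
};

/* Record one completed send; returns false if the queue is full. */
bool zcookie_push(struct zcookie_queue *q, uint32_t cookie)
{
    if (q->len == ZCOOKIE_QUEUE_CAP)
        return false;
    q->buf[(q->head + q->len) % ZCOOKIE_QUEUE_CAP] = cookie;
    q->len++;
    return true;
}

/*
 * Move up to ZCOOKIE_BATCH_MAX cookies, oldest first, into *out.
 * Returns true if cookies are still queued, i.e. the caller should
 * read again to collect "the rest on the next call".
 */
bool zcookie_drain(struct zcookie_queue *q, struct zcookie_batch *out)
{
    size_t n = q->len < ZCOOKIE_BATCH_MAX ? q->len : ZCOOKIE_BATCH_MAX;

    for (size_t i = 0; i < n; i++)
        out->cookies[i] = q->buf[(q->head + i) % ZCOOKIE_QUEUE_CAP];
    out->num = (uint32_t)n;
    q->head = (q->head + n) % ZCOOKIE_QUEUE_CAP;
    q->len -= n;
    return q->len > 0;
}
```

With 20 completions queued, the first `zcookie_drain` call returns 16 cookies and reports that more remain; the second returns the remaining 4 and reports none left. Because no batch ever exceeds the cap, a caller that always supplies room for 16 cookies never loses a notification. It just reads again while the "more" indication is set.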