Tom Herbert <t...@herbertland.com> wrote:
> On Tue, Nov 24, 2015 at 12:55 PM, Florian Westphal <f...@strlen.de> wrote:
> > Why anyone would invest such a huge amount of work in making this
> > kernel-based framing for single-stream tcp record (de)mux rather than
> > improving the userspace protocol to use UDP or SCTP or at least
> > one tcp connection per worker is beyond me.
>
> From the /0 patch:
>
> Q: Why not use an existing message-oriented protocol such as RUDP,
> DCCP, SCTP, RDS, and others?
>
> A: Because that would entail using a completely new transport protocol.
That's why I wrote 'or at least one tcp connection per worker'.

> > For TX side, why is writev not good enough?
>
> writev on a TCP stream does not guarantee atomicity of the operation.

Are you talking about short writes?

> It writes atomic without user space needing to implement locking when
> a socket is shared amongst threads.

Yes, I get that point, but I maintain that KCM is a strange workaround
for bad userspace design.

1 tcp connection per thread -> no userspace sockfd lock needed

Sender side can use writev, sendmsg, sendmmsg, etc. to avoid sending
sub-record sized frames.

Is user space really so bad that instead of fixing it it's simpler to
work around it with even more kernel bloat?  Since for KCM userspace
has to be adjusted anyway, I find that hard to believe.

I don't know if the 'dynamic RCVLOWAT' that you want is needed (you say
'yes', Eric's reply seems to indicate it's not, at least assuming a
sane/friendly peer that doesn't intentionally xmit byte-by-byte).

But assuming there would really be a benefit, maybe a RCVLOWAT2 could
be added?  Of course we could only make it a hint and would have to
make a blocking read return with less data than desired when the tcp
rmem limit gets hit.  But at least we'd avoid the 'unbounded allocation
of large amount of kernel memory' thing that we have with the current
proposal.

Thanks,
Florian