On Fri, Nov 20, 2015 at 2:50 PM, Sowmini Varadhan <sowmini.varad...@oracle.com> wrote:
> On (11/20/15 13:21), Tom Herbert wrote:
>> +static int kcm_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
> :
>> +
>> +	if (msg->msg_flags & MSG_BATCH) {
>> +		kcm->tx_wait_more = true;
>> +	} else if (kcm->tx_wait_more || not_busy) {
>> +		err = kcm_write_msgs(kcm);
>> +		if (err < 0) {
>> +			/* We got a hard error in write_msgs but have
>> +			 * already queued this message. Report an error
>> +			 * in the socket, but don't affect return value
>> +			 * from sendmsg
>> +			 */
>> +			pr_warn("KCM: Hard failure on kcm_write_msgs\n");
>> +			report_csk_error(&kcm->sk, -err);
>> +		}
>> +	}
>
> It's interesting that kcm copies the user data to a skb and
> then invokes kernel_sendpage on the frag_list in that skb- was this
> specifically done with some perf goals in mind? If yes, do you happen
> to have some estimate of how much this approach buys you, as opposed
> to just setting up a sglist and calling tcp_sendpage later? (RDS uses
> the latter approach, and I've tried to use the changes introduced
> by Eric's commit in 5640f76, it helps slightly but I think there may
> be other bottlenecks to overcome first for the specific req-resp
> patterns that are common in DB workloads)
>

Hi Sowmini,

I did notice that RDS just creates an sglist, but I also noticed that
this requires allocating a "struct rds_message", which holds pointers
to the sglist, list pointers for a queue, etc. This looks to me like
it's emulating skbuffs anyway. I haven't looked at whether there are
other performance issues in using the frag_list. It might be
interesting if there were an interface to send skbuffs on a kernel
socket.

> The other question I had when reading this code is: what if the
> application never sends that last MSG_BATCH-less message, e.g.,
> it lies about how its going send more messages? will something eventually
> time-out and send the data? Any estimates for a good batch size?
>

No timeout. Sending will block. I don't think this behavior needs to
be any different from what happens if an application forgets to
complete a MSG_MORE.

Thanks,
Tom

> --Sowmini
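
To make the "no timeout" point concrete, here is a rough userspace model of the flush decision in the quoted hunk. It is only a sketch, not kernel code: struct model_kcm, MODEL_MSG_BATCH, model_sendmsg(), model_write_msgs() and model_demo() are made-up names, not_busy is assumed to mean "the send queue was empty before this message", and the write helper is assumed to clear tx_wait_more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MSG_BATCH; the value is arbitrary. */
#define MODEL_MSG_BATCH 0x1

/* Hypothetical model of the per-socket state used by the quoted hunk. */
struct model_kcm {
	size_t queued;      /* messages sitting in the send queue */
	size_t written;     /* messages handed to the lower socket */
	bool tx_wait_more;  /* sender has promised more messages */
};

/* Stand-in for kcm_write_msgs(): push everything queued downward.
 * Assumption: the flush also clears tx_wait_more.
 */
static void model_write_msgs(struct model_kcm *kcm)
{
	kcm->written += kcm->queued;
	kcm->queued = 0;
	kcm->tx_wait_more = false;
}

/* Mirrors the branch structure of the quoted kcm_sendmsg() fragment. */
static void model_sendmsg(struct model_kcm *kcm, int flags)
{
	/* Assumption: "not busy" means nothing was queued beforehand. */
	bool not_busy = (kcm->queued == 0);

	kcm->queued++;  /* the message is always queued first */

	if (flags & MODEL_MSG_BATCH)
		kcm->tx_wait_more = true;      /* hold, no flush */
	else if (kcm->tx_wait_more || not_busy)
		model_write_msgs(kcm);         /* flush the whole batch */
}

/* Usage: batched sends stay queued until one unflagged send arrives. */
static void model_demo(void)
{
	struct model_kcm kcm = {0};

	model_sendmsg(&kcm, MODEL_MSG_BATCH);
	model_sendmsg(&kcm, MODEL_MSG_BATCH);
	model_sendmsg(&kcm, MODEL_MSG_BATCH);
	assert(kcm.written == 0 && kcm.queued == 3);

	model_sendmsg(&kcm, 0);
	assert(kcm.written == 4 && kcm.queued == 0);
}
```

In this model nothing ever moves the three batched messages on its own; only the final unflagged send does, which is the same contract an application has with MSG_MORE.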