On 27/03/14 18:25, Stefan Hajnoczi wrote:
> On Thu, Mar 27, 2014 at 4:53 PM, Anton Ivanov (antivano)
> <antiv...@cisco.com> wrote:
>> On 27/03/14 12:30, Stefan Hajnoczi wrote:
>>> On Wed, Mar 26, 2014 at 11:08:41AM +0000, anton.iva...@kot-begemot.co.uk
>>> wrote:
>>>> +static ssize_t net_l2tpv3_receive_dgram_iov(NetClientState *nc,
>>>> +                                            const struct iovec *iov,
>>>> +                                            int iovcnt)
>>>> +{
>>>> +    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
>>>> +
>>>> +    struct msghdr message;
>>>> +    int ret;
>>>> +
>>>> +    if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {
>>>> +        error_report(
>>>> +            "iovec too long %d > %d, change l2tpv3.h",
>>>> +            iovcnt, MAX_L2TPV3_IOVCNT
>>>> +        );
>>>> +        return -1;
>>>> +    }
>>> The alternative is to linearize the buffer using iov_to_buf() and call
>>> net_l2tpv3_receive_dgram().  Something like:
>>>
>>>     if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {
>>>         size_t size = iov_size(iov, iovcnt);
>>>         uint8_t *buf;
>>>
>>>         buf = g_malloc(size);
>>>         iov_to_buf(iov, iovcnt, 0, buf, size);
>>>         ret = net_l2tpv3_receive_dgram(nc, buf, size);
>>>         g_free(buf);
>>>         return ret;
>>>     }
>> iov_to_buf is a memcpy of the data and a malloc down the call chain.
>> That is quite a significant cost compared to just shifting the iov a
>> bit to add an element for the header. I tried that early on and it
>> introduced a noticeable penalty.
>>
>> If I am to merge the two, I would actually merge them the other way
>> around - express both net_l2tpv3_receive_dgram and
>> net_l2tpv3_receive_dgram_iov via an iov-based backend. That will allow
>> us to add sendmmsg one day, as the data structures will be the same in
>> all cases.
> I agree with you that the iov code path should be used.  The context
> for my comment was "if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {".  I'm just
> suggesting that we fall back to linearizing the iovec if it has too
> many elements, instead of erroring out.

Understood. Will do.
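The fallback aside, the reason the iov path is cheap is that on the
happy path the header costs exactly one extra iovec element and no data
copy. Roughly this (a sketch only - header_buf and offset stand in for
whichever fields hold the precomputed header and its length):

    /* the iovcnt check above guarantees this fits */
    struct iovec out_vec[MAX_L2TPV3_IOVCNT];

    /* slot 0 carries the precomputed l2tpv3 header; the payload
     * iovec is shifted up by one, so no packet data is copied
     */
    out_vec[0].iov_base = s->header_buf;
    out_vec[0].iov_len = s->offset;
    memcpy(out_vec + 1, iov, iovcnt * sizeof(struct iovec));

    memset(&message, 0, sizeof(message));
    /* msg_name stays NULL, assuming the fd was connect()ed at init */
    message.msg_iov = out_vec;
    message.msg_iovlen = iovcnt + 1;
    ret = sendmsg(s->fd, &message, 0);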
>
>>>> +static int l2tpv3_verify_header(NetL2TPV3State *s, uint8_t *buf)
>>>> +{
>>>> +
>>>> +    uint32_t *session;
>>>> +    uint64_t cookie;
>>>> +
>>>> +    if ((!s->udp) && (!s->ipv6)) {
>>>> +        buf += sizeof(struct iphdr) /* fix for ipv4 raw */;
>>>> +    }
>>>> +
>>>> +    /* we do not do a strict check for "data" packets as per
>>>> +     * the RFC spec because the pure IP spec does not have
>>>> +     * that anyway.
>>>> +     */
>>>> +
>>>> +    if (s->cookie) {
>>>> +        if (s->cookie_is_64) {
>>>> +            cookie = ldq_be_p(buf + s->cookie_offset);
>>>> +        } else {
>>>> +            cookie = ldl_be_p(buf + s->cookie_offset);
>>>> +        }
>>>> +        if (cookie != s->rx_cookie) {
>>>> +            if (!s->header_mismatch) {
>>>> +                error_report("unknown cookie id\n");
>>> error_report() messages should not have a newline ('\n') at the end.
>>>
>>>> +            }
>>>> +            return -1;
>>>> +        }
>>>> +    }
>>>> +    session = (uint32_t *) (buf + s->session_offset);
>>>> +    if (ldl_be_p(session) != s->rx_session) {
>>>> +        if (!s->header_mismatch) {
>>>> +            error_report("session mismatch");
>>>> +        }
>>>> +        return -1;
>>>> +    }
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static void net_l2tpv3_process_queue(NetL2TPV3State *s)
>>>> +{
>>>> +    int size = 0;
>>>> +    struct iovec *vec;
>>>> +    bool bad_read;
>>>> +    int data_size;
>>>> +    struct mmsghdr *msgvec;
>>>> +
>>>> +    /* go into ring mode only if there is a "pending" tail */
>>>> +    if (s->queue_depth > 0 && (!s->delivering)) {
>>>> +        do {
>>>> +            msgvec = s->msgvec + s->queue_tail;
>>>> +            if (msgvec->msg_len > 0) {
>>>> +                data_size = msgvec->msg_len - s->header_size;
>>> header_size is never assigned to.
>> It is - in init. It is used to divorce the "read header size" from the
>> "l2tpv3 header size", as needed by the ipv4 socket API. Using it allows
>> us to remove the check for whether we are reading from ipv4 raw sockets
>> in most (but not all) places.
> I'm not seeing that in this version of the patch, please check again
> and let me know which line number it is on.

You are correct. I failed to cherry-pick that change correctly from my
development branch when creating the patchset for submission. Mea culpa.

>
>>>> +                vec = msgvec->msg_hdr.msg_iov;
>>>> +                if ((data_size > 0) &&
>>>> +                    (l2tpv3_verify_header(s, vec->iov_base) == 0)) {
>>>> +                    vec++;
>>>> +                    /* we do not use the "normal" send and send async
>>>> +                     * functions here because we have our own buffer -
>>>> +                     * the message vector. If we use the "usual" ones
>>>> +                     * we will end up double-buffering.
>>>> +                     */
>>>> +                    s->delivering = true;
>>> What is s->delivering guarding against?  This function is only called
>>> from the s->fd read handler function, so I don't see a reentrant code
>>> path.
>> Cut-n-paste from the queue functions, without doing a full analysis of
>> what it is guarding against there.
> Okay, the problem in the generic net code is that some net clients
> (like slirp) will send packets back to their peer from their receive()
> function.  For example, imagine a DHCP request packet from the guest
> being sent to the slirp net client.  It will send the DHCP reply right
> away while we're still in the send and receive functions.  This
> creates reentrancy and not all code handles it, so we just queue the
> packets instead.
>
> But in this case there is no reentrancy, since only the fd handler
> function can call net_l2tpv3_process_queue().

OK.
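Coming back to the header_size init that went missing from this
submission: what it amounts to is roughly the following (a sketch - it
assumes, as elsewhere in the patch, that s->offset holds the computed
l2tpv3 header size):

    /* at the end of init: precompute the size of everything that
     * precedes the payload in a datagram read from the socket
     */
    s->header_size = s->offset;
    /* a raw ipv4 socket returns the IP header on read as well, so
     * the "read header size" is larger than the l2tpv3 header
     */
    if ((!s->udp) && (!s->ipv6)) {
        s->header_size += sizeof(struct iphdr);
    }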
>
>>>> +                    /* deliver directly to peer bypassing queueing
>>>> +                     * and buffering
>>>> +                     */
>>>> +                    size = qemu_deliver_packet(
>>>> +                        &s->nc,
>>>> +                        QEMU_NET_PACKET_FLAG_NONE,
>>>> +                        vec->iov_base,
>>>> +                        data_size,
>>>> +                        s->nc.peer
>>>> +                    );
>>> There is no callback when the peer re-enables receive.  In other words,
>>> packets will be stuck in s->msgvec until a new packet arrives and
>>> net_l2tpv3_send() is invoked.
>>>
>>> I think you need to modify qemu_flush_queued_packets() to invoke a new
>>> callback.  You could convert the existing code:
>> I had the same thoughts, just did not want to touch it just yet.
>>
>> Something along those lines would have been necessary for sendmmsg too,
>> as there you would want to work off a pre-configured queue in the first
>> place.
>>
>>>   if (nc->peer && nc->peer->info->type == NET_CLIENT_OPTIONS_KIND_HUBPORT) {
>>>       if (net_hub_flush(nc->peer)) {
>>>           qemu_notify_event();
>>>       }
>>>   }
>>>
>>> to:
>>>
>>>   if (nc->peer && nc->peer->info->peer_flushed) {
>>>       if (nc->peer->info->peer_flushed(nc->peer)) {
>>>           qemu_notify_event();
>>>       }
>>>   }
>>>
>>> and move the hub flushing code into net/hub.c.
>>>
>>> Then you could also implement ->peer_flushed() in l2tpv3.c so that it
>>> calls net_l2tpv3_process_queue() and returns true if packets were
>>> delivered.
>>>
>>> Please introduce ->peer_flushed() in a separate patch.
>> Understood.
>>
>>> Or you could keep it simple and go back to using qemu_send_packet()
>>> without trying to avoid duplicate buffering.  (You can always send
>>> patches later that eliminate the buffering.)
>> I would probably go back to that for submission so we can do it in
>> stages.
>>
>> I will branch out the zero-copy work and the relevant additions to
>> queue.c for submission at a later date, along with zero-copy versions
>> of gre and raw, as well as a revised version of l2tpv3.c.
> Excellent, I was hoping for this, thanks!

OK. Will do it this way. ETA - Monday.

A.
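P.S. For when the zero-copy version comes back: the l2tpv3 side of
->peer_flushed() would be something along these lines (a sketch against
the callback shape in your snippet; using queue_depth for the return
value is an assumption about bookkeeping process_queue would need to
maintain):

    static bool l2tpv3_peer_flushed(NetClientState *nc)
    {
        NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
        unsigned int before = s->queue_depth;

        /* the peer can receive again - drain whatever is still
         * sitting in the message vector ring
         */
        net_l2tpv3_process_queue(s);

        /* report true if any queued packets were delivered */
        return s->queue_depth != before;
    }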