On 27/03/14 18:25, Stefan Hajnoczi wrote:
> On Thu, Mar 27, 2014 at 4:53 PM, Anton Ivanov (antivano)
> <antiv...@cisco.com> wrote:
>> On 27/03/14 12:30, Stefan Hajnoczi wrote:
>>> On Wed, Mar 26, 2014 at 11:08:41AM +0000, anton.iva...@kot-begemot.co.uk
>>> wrote:
>>>> +static ssize_t net_l2tpv3_receive_dgram_iov(NetClientState *nc,
>>>> +                                            const struct iovec *iov,
>>>> +                                            int iovcnt)
>>>> +{
>>>> +    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
>>>> +
>>>> +    struct msghdr message;
>>>> +    int ret;
>>>> +
>>>> +    if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {
>>>> +        error_report(
>>>> +            "iovec too long %d > %d, change l2tpv3.h",
>>>> +            iovcnt, MAX_L2TPV3_IOVCNT
>>>> +        );
>>>> +        return -1;
>>>> +    }
>>> The alternative is to linearize the buffer using iov_to_buf() and call
>>> net_l2tpv3_receive_dgram().  Something like:
>>>
>>>     if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {
>>>         size_t size = iov_size(iov, iovcnt);
>>>         uint8_t *buf;
>>>
>>>         buf = g_malloc(size);
>>>         iov_to_buf(iov, iovcnt, 0, buf, size);
>>>         ret = net_l2tpv3_receive_dgram(nc, buf, size);
>>>         g_free(buf);
>>>         return ret;
>>>     }
>> iov_to_buf is a memcpy of the data and a malloc down the call chain.
>> That is quite a significant cost compared to just shifting the iov a
>> bit to add an element for the header. I tried that early on and it
>> introduced a noticeable penalty.
>>
>> If I am to merge the two, I would actually merge them the other way
>> around - express both net_l2tpv3_receive_dgram and
>> net_l2tpv3_receive_dgram_iov via an iov-based backend. That will allow
>> us to add sendmmsg one day, as the data structures will be the same in
>> all cases.
> I agree with you that the iov code path should be used.  The context
> for my comment was "if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {".  I'm just
> suggesting that we fall back to linearizing the iovec if it has too
> many elements, instead of erroring out.

Understood. Will do.
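The fallback aside, the reason the iov path is cheap is that on the
happy path the header costs exactly one extra iovec element and no data
copy. Roughly this (a sketch only - header_buf and offset stand in for
whichever fields hold the precomputed header and its length):

    /* the iovcnt check above guarantees this fits */
    struct iovec out_vec[MAX_L2TPV3_IOVCNT];

    /* slot 0 carries the precomputed l2tpv3 header; the payload
     * iovec is shifted up by one, so no packet data is copied
     */
    out_vec[0].iov_base = s->header_buf;
    out_vec[0].iov_len = s->offset;
    memcpy(out_vec + 1, iov, iovcnt * sizeof(struct iovec));

    memset(&message, 0, sizeof(message));
    /* msg_name stays NULL, assuming the fd was connect()ed at init */
    message.msg_iov = out_vec;
    message.msg_iovlen = iovcnt + 1;
    ret = sendmsg(s->fd, &message, 0);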
>
>>>> +static int l2tpv3_verify_header(NetL2TPV3State *s, uint8_t *buf)
>>>> +{
>>>> +
>>>> +    uint32_t *session;
>>>> +    uint64_t cookie;
>>>> +
>>>> +    if ((!s->udp) && (!s->ipv6)) {
>>>> +        buf += sizeof(struct iphdr) /* fix for ipv4 raw */;
>>>> +    }
>>>> +
>>>> +    /* we do not do a strict check for "data" packets as per
>>>> +     * the RFC spec because the pure IP spec does not have
>>>> +     * that anyway.
>>>> +     */
>>>> +
>>>> +    if (s->cookie) {
>>>> +        if (s->cookie_is_64) {
>>>> +            cookie = ldq_be_p(buf + s->cookie_offset);
>>>> +        } else {
>>>> +            cookie = ldl_be_p(buf + s->cookie_offset);
>>>> +        }
>>>> +        if (cookie != s->rx_cookie) {
>>>> +            if (!s->header_mismatch) {
>>>> +                error_report("unknown cookie id\n");
>>> error_report() messages should not have a newline ('\n') at the end.
>>>
>>>> +            }
>>>> +            return -1;
>>>> +        }
>>>> +    }
>>>> +    session = (uint32_t *) (buf + s->session_offset);
>>>> +    if (ldl_be_p(session) != s->rx_session) {
>>>> +        if (!s->header_mismatch) {
>>>> +            error_report("session mismatch");
>>>> +        }
>>>> +        return -1;
>>>> +    }
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static void net_l2tpv3_process_queue(NetL2TPV3State *s)
>>>> +{
>>>> +    int size = 0;
>>>> +    struct iovec *vec;
>>>> +    bool bad_read;
>>>> +    int data_size;
>>>> +    struct mmsghdr *msgvec;
>>>> +
>>>> +    /* go into ring mode only if there is a "pending" tail */
>>>> +    if (s->queue_depth > 0 && (!s->delivering)) {
>>>> +        do {
>>>> +            msgvec = s->msgvec + s->queue_tail;
>>>> +            if (msgvec->msg_len > 0) {
>>>> +                data_size = msgvec->msg_len - s->header_size;
>>> header_size is never assigned to.
>> It is - in init. It is used to divorce the "read header size" from the
>> "l2tpv3 header size", as needed by the ipv4 socket API. Using it allows
>> us to remove the check for whether we are reading from ipv4 raw sockets
>> in most (but not all) places.
> I'm not seeing that in this version of the patch, please check again
> and let me know which line number it is on.

You are correct. I failed to cherry-pick that change correctly from my
development branch when creating the patchset for submission. Mea culpa.

>
>>>> +                vec = msgvec->msg_hdr.msg_iov;
>>>> +                if ((data_size > 0) &&
>>>> +                    (l2tpv3_verify_header(s, vec->iov_base) == 0)) {
>>>> +                    vec++;
>>>> +                    /* we do not use the "normal" send and send async
>>>> +                     * functions here because we have our own buffer -
>>>> +                     * the message vector. If we use the "usual" ones
>>>> +                     * we will end up double-buffering.
>>>> +                     */
>>>> +                    s->delivering = true;
>>> What is s->delivering guarding against?  This function is only called
>>> from the s->fd read handler function, so I don't see a reentrant code
>>> path.
>> Cut-n-paste from the queue functions, without doing a full analysis of
>> what it is guarding against there.
> Okay, the problem in the generic net code is that some net clients
> (like slirp) will send packets back to their peer from their receive()
> function.  For example, imagine a DHCP request packet from the guest
> being sent to the slirp net client.  It will send the DHCP reply right
> away while we're still in the send and receive functions.  This
> creates reentrancy and not all code handles it, so we just queue the
> packets instead.
>
> But in this case there is no reentrancy, since only the fd handler
> function can call net_l2tpv3_process_queue().

OK.
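Coming back to the header_size init that went missing from this
submission: what it amounts to is roughly the following (a sketch - it
assumes, as elsewhere in the patch, that s->offset holds the computed
l2tpv3 header size):

    /* at the end of init: precompute the size of everything that
     * precedes the payload in a datagram read from the socket
     */
    s->header_size = s->offset;
    /* a raw ipv4 socket returns the IP header on read as well, so
     * the "read header size" is larger than the l2tpv3 header
     */
    if ((!s->udp) && (!s->ipv6)) {
        s->header_size += sizeof(struct iphdr);
    }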
>
>>>> +                    /* deliver directly to peer bypassing queueing
>>>> +                     * and buffering
>>>> +                     */
>>>> +                    size = qemu_deliver_packet(
>>>> +                        &s->nc,
>>>> +                        QEMU_NET_PACKET_FLAG_NONE,
>>>> +                        vec->iov_base,
>>>> +                        data_size,
>>>> +                        s->nc.peer
>>>> +                    );
>>> There is no callback when the peer re-enables receive.  In other words,
>>> packets will be stuck in s->msgvec until a new packet arrives and
>>> net_l2tpv3_send() is invoked.
>>>
>>> I think you need to modify qemu_flush_queued_packets() to invoke a new
>>> callback.  You could convert the existing code:
>> I had the same thoughts, just did not want to touch it just yet.
>>
>> Something along those lines would have been necessary for sendmmsg too,
>> as there you would want to work off a pre-configured queue in the first
>> place.
>>
>>>   if (nc->peer && nc->peer->info->type == NET_CLIENT_OPTIONS_KIND_HUBPORT) {
>>>       if (net_hub_flush(nc->peer)) {
>>>           qemu_notify_event();
>>>       }
>>>   }
>>>
>>> to:
>>>
>>>   if (nc->peer && nc->peer->info->peer_flushed) {
>>>       if (nc->peer->info->peer_flushed(nc->peer)) {
>>>           qemu_notify_event();
>>>       }
>>>   }
>>>
>>> and move the hub flushing code into net/hub.c.
>>>
>>> Then you could also implement ->peer_flushed() in l2tpv3.c so that it
>>> calls net_l2tpv3_process_queue() and returns true if packets were
>>> delivered.
>>>
>>> Please introduce ->peer_flushed() in a separate patch.
>> Understood.
>>
>>> Or you could keep it simple and go back to using qemu_send_packet()
>>> without trying to avoid duplicate buffering.  (You can always send
>>> patches later that eliminate the buffering.)
>> I would probably go back to that for submission so we can do it in
>> stages.
>>
>> I will branch out the zero-copy work and the relevant additions to
>> queue.c for submission at a later date, along with zero-copy versions
>> of gre and raw, as well as a revised version of l2tpv3.c.
> Excellent, I was hoping for this, thanks!

OK. Will do it this way. ETA - Monday.

A.
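P.S. For when the zero-copy version comes back: the l2tpv3 side of
->peer_flushed() would be something along these lines (a sketch against
the callback shape in your snippet; using queue_depth for the return
value is an assumption about bookkeeping process_queue would need to
maintain):

    static bool l2tpv3_peer_flushed(NetClientState *nc)
    {
        NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
        unsigned int before = s->queue_depth;

        /* the peer can receive again - drain whatever is still
         * sitting in the message vector ring
         */
        net_l2tpv3_process_queue(s);

        /* report true if any queued packets were delivered */
        return s->queue_depth != before;
    }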