On Tue, 2016-12-06 at 10:53 +0100, Paolo Abeni wrote:
> On Mon, 2016-12-05 at 09:57 -0800, Eric Dumazet wrote:
> > From: Eric Dumazet <eduma...@google.com>
> > 
> > In UDP recvmsg() path we currently access 3 cache lines from an skb
> > while holding receive queue lock, plus another one if packet is
> > dequeued, since we need to change skb->next->prev
> > 
> > 1st cache line (contains ->next/prev pointers, offsets 0x00 and 0x08)
> > 2nd cache line (skb->len & skb->peeked, offsets 0x80 and 0x8e)
> > 3rd cache line (skb->truesize/users, offsets 0xe0 and 0xe4)
> > 
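
As a quick sanity check on the offsets listed above: assuming 64-byte cache lines (an assumption, since the line size is architecture dependent), the three groups of fields land on three distinct lines. The helper name below is hypothetical and only does the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, as on common x86-64 parts. */
#define CACHE_LINE_BYTES 64u

/* Hypothetical helper: index of the cache line holding a given
 * byte offset inside a cache-line-aligned structure. */
static size_t cache_line_index(size_t offset)
{
	return offset / CACHE_LINE_BYTES;
}
```

With this model, offsets 0x00 and 0x08 share line 0, 0x80 and 0x8e share line 2, and 0xe0 and 0xe4 share line 3, so "1st/2nd/3rd" above counts the lines touched rather than their index within the skb.
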
> > skb->peeked is only needed to make sure 0-length packets are properly
> > handled when MSG_PEEK is used.
> > 
> > I first intended to remove skb->peeked, but the "MSG_PEEK at
> > non-zero offset" support added by Sam Kumar makes this impossible.
> 
> I'm wondering if peeking with offset is going to complicate the 2 queues
> patch, too.
> 
> > This patch avoids one cache line miss during the locked section, when
> > skb->len and skb->peeked do not have to be read.
> > 
> > It also avoids the skb_set_peeked() cost for non-empty UDP datagrams.
> > 
> > Signed-off-by: Eric Dumazet <eduma...@google.com>
> > ---
> >  net/core/datagram.c |   19 ++++++++++---------
> >  1 file changed, 10 insertions(+), 9 deletions(-)
> > 
> > diff --git a/net/core/datagram.c b/net/core/datagram.c
> > index 49816af8586bb832e806972b486588041a99524c..9482037a5c8c64aec79e42c65bd2691bdd9450a3 100644
> > --- a/net/core/datagram.c
> > +++ b/net/core/datagram.c
> > @@ -214,6 +214,7 @@ struct sk_buff *__skb_try_recv_datagram(struct sock *sk, unsigned int flags,
> >     if (error)
> >             goto no_packet;
> >  
> > +   *peeked = 0;
> >     do {
> >             /* Again only user level code calls this function, so nothing
> >              * interrupt level will suddenly eat the receive_queue.
> > @@ -227,22 +228,22 @@ struct sk_buff *__skb_try_recv_datagram(struct sock *sk, unsigned int flags,
> >             spin_lock_irqsave(&queue->lock, cpu_flags);
> >             skb_queue_walk(queue, skb) {
> >                     *last = skb;
> > -                   *peeked = skb->peeked;
> >                     if (flags & MSG_PEEK) {
> >                             if (_off >= skb->len && (skb->len || _off ||
> >                                                      skb->peeked)) {
> >                                     _off -= skb->len;
> >                                     continue;
> >                             }
> > -
> > -                           skb = skb_set_peeked(skb);
> > -                           error = PTR_ERR(skb);
> > -                           if (IS_ERR(skb)) {
> > -                                   spin_unlock_irqrestore(&queue->lock,
> > -                                                          cpu_flags);
> > -                                   goto no_packet;
> > +                           if (!skb->len) {
> > +                                   skb = skb_set_peeked(skb);
> > +                                   if (IS_ERR(skb)) {
> > +                                           error = PTR_ERR(skb);
> > +                                           spin_unlock_irqrestore(&queue->lock,
> > +                                                                  cpu_flags);
> > +                                           goto no_packet;
> > +                                   }
> >                             }
> 
> I don't understand why we can avoid setting skb->peeked if len > 0. I
> think that will change the kernel behavior if:
> - peek with offset is set
> - 3 skbs with len > 0 are enqueued
> - userspace peeks (with offset) the second one
> - userspace disables peeking with offset and peeks 2 more skbs.
> 
> With the current code, in the last step userspace is going to peek skbs
> #1 and #3; after this patch it will peek #1 and #2. Am I
> missing something? Probably the new behavior is more correct, but it is
> still a change.

Please ignore the above dumb comment. I misread the 'skip condition'.
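
For the record, here is a minimal userspace sketch of that skip condition. Only the boolean expression comes from the quoted hunk; the function names and the exhaustive check are my own illustration, not kernel code. It suggests that skb->peeked can only change the outcome for 0-length datagrams:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the MSG_PEEK skip test quoted above:
 *   _off >= skb->len && (skb->len || _off || skb->peeked)
 * Parameter names are illustrative only. */
static bool peek_skips(unsigned int off, unsigned int len, bool peeked)
{
	return off >= len && (len || off || peeked);
}

/* Returns true if, for every non-empty length up to max and a range of
 * offsets, the skip decision is the same whether or not peeked is set. */
static bool peeked_is_irrelevant_when_nonempty(unsigned int max)
{
	for (unsigned int len = 1; len <= max; len++)
		for (unsigned int off = 0; off <= 2 * max; off++)
			if (peek_skips(off, len, false) !=
			    peek_skips(off, len, true))
				return false;
	return true;
}
```

When len > 0 the parenthesised part is always true, so the test reduces to off >= len. With peek-at-offset disabled (offset 0), a non-empty skb at the head of the queue is therefore never skipped, whether or not it was peeked before.
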

I'm fine with the patch in its current form.

Acked-by: Paolo Abeni <pab...@redhat.com>
