On Sun, 13 Aug 2017 18:58:58 +0200 Paweł Staszewski <pstaszew...@itcare.pl> 
wrote:

> To show the difference, below is a comparison of VLAN vs. no-VLAN traffic:
> 
> 10Mpps forwarded traffic with no VLAN vs 6.9Mpps with VLAN

I'm trying to reproduce this in my testlab (with ixgbe).  I do see a
performance reduction of about 10-19% when I forward out a VLAN
interface.  This is larger than I expected, but still lower than the
30-40% slowdown you reported.

[...]

> >>> perf top:
> >>>
> >>>    PerfTop:   77835 irqs/sec  kernel:99.7%  
> >>> ---------------------------------------------
> >>>
> >>>       16.32%  [kernel]       [k] skb_dst_force
> >>>       16.30%  [kernel]       [k] dst_release
> >>>       15.11%  [kernel]       [k] rt_cache_valid
> >>>       12.62%  [kernel]       [k] ipv4_mtu  
> >> It seems a little strange that these 4 functions are on the top  

I don't see these in my test.

> >>  
> >>>        5.60%  [kernel]       [k] do_raw_spin_lock  
> >> What is calling/taking this lock? (Use perf call-graph recording).  
> > It may be hard to paste it here :)
> > attached file

The attachment was very big.  Please don't attach such big files to
mailing-list posts.  Next time, please share them via e.g. pastebin.  The
output was a capture from your terminal, which made it more difficult to
read.  Hint: you could use 'perf report --stdio' and redirect the output
to a file instead.

The output (extracted below) didn't show who called 'do_raw_spin_lock',
BUT it showed another interesting thing.  The kernel code in
__dev_queue_xmit() might create a route dst-cache problem for itself(?),
as it will first call skb_dst_force() and then skb_dst_drop() when the
packet is transmitted on a VLAN.

 static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
 {
 [...]
        /* If device/qdisc don't need skb->dst, release it right now while
         * its hot in this cpu cache.
         */
        if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
                skb_dst_drop(skb);
        else
                skb_dst_force(skb);
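
To make the force-then-drop sequence concrete, here is a small user-space
toy model (not kernel code).  All toy_* names are hypothetical stand-ins.
It assumes, as the perf trace suggests, that the VLAN device does not have
IFF_XMIT_DST_RELEASE set while the physical device does, and that the
forwarded skb arrives holding a non-refcounted (noref) dst.  It only counts
the refcount operations on the shared dst entry:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; NOT the real definitions. */
struct toy_dst { int refcnt; };
struct toy_skb { struct toy_dst *dst; bool dst_noref; };
struct toy_dev { bool xmit_dst_release; const struct toy_dev *lower; };

/* Counts refcount updates on the dst (atomic ops in the real kernel). */
static int refcnt_ops;

/* Model of skb_dst_force(): turn a noref dst into a refcounted one. */
static void toy_skb_dst_force(struct toy_skb *skb)
{
	if (skb->dst && skb->dst_noref) {
		skb->dst->refcnt++;
		refcnt_ops++;
		skb->dst_noref = false;
	}
}

/* Model of skb_dst_drop(): release the reference only if one is held. */
static void toy_skb_dst_drop(struct toy_skb *skb)
{
	if (!skb->dst)
		return;
	if (!skb->dst_noref) {
		skb->dst->refcnt--;
		refcnt_ops++;
	}
	skb->dst = NULL;
	skb->dst_noref = false;
}

/* Model of the dst handling in __dev_queue_xmit(); a stacked device
 * (the VLAN) re-enters the function for its lower device.
 */
static void toy_dev_queue_xmit(struct toy_skb *skb, const struct toy_dev *dev)
{
	if (dev->xmit_dst_release)
		toy_skb_dst_drop(skb);
	else
		toy_skb_dst_force(skb);

	if (dev->lower)
		toy_dev_queue_xmit(skb, dev->lower);
}

/* Returns the number of refcount updates for one forwarded packet. */
int toy_count_refcnt_ops(bool via_vlan)
{
	struct toy_dst dst = { .refcnt = 1 };	/* reference held by the route cache */
	struct toy_skb skb = { .dst = &dst, .dst_noref = true };
	struct toy_dev phys = { .xmit_dst_release = true, .lower = NULL };
	struct toy_dev vlan = { .xmit_dst_release = false, .lower = &phys };

	refcnt_ops = 0;
	toy_dev_queue_xmit(&skb, via_vlan ? &vlan : &phys);
	assert(dst.refcnt == 1);	/* balanced again after transmit */
	return refcnt_ops;
}
```

In this model toy_count_refcnt_ops(false) returns 0 and
toy_count_refcnt_ops(true) returns 2: the VLAN path takes a reference in
the first pass and releases it in the second.  If the real code behaves
like this sketch, that would be two extra atomic operations per packet on
a dst entry shared between CPUs.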


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

Extracted part of attached perf output:

 --5.37%--ip_rcv_finish
   |          
   |--4.02%--ip_forward
   |   |          
   |    --3.92%--ip_forward_finish
   |       |          
   |        --3.91%--ip_output
   |          |          
   |           --3.90%--ip_finish_output
   |              |          
   |               --3.88%--ip_finish_output2
   |                  |          
   |                   --2.77%--neigh_connected_output
   |                     |          
   |                      --2.74%--dev_queue_xmit
   |                         |          
   |                          --2.73%--__dev_queue_xmit
   |                             |          
   |                             |--1.66%--dev_hard_start_xmit
   |                             |   |          
   |                             |    --1.64%--vlan_dev_hard_start_xmit
   |                             |       |          
   |                             |        --1.63%--dev_queue_xmit
   |                             |           |          
   |                             |            --1.62%--__dev_queue_xmit
   |                             |               |          
   |                             |               |--0.99%--skb_dst_drop.isra.77
   |                             |               |   |          
   |                             |               |   --0.99%--dst_release
   |                             |               |          
   |                             |                --0.55%--sch_direct_xmit
   |                             |          
   |                              --0.99%--skb_dst_force
   |          
    --1.29%--ip_route_input_noref
        |          
         --1.29%--ip_route_input_rcu
             |          
              --1.05%--rt_cache_valid
