2013/8/9 Florian Fainelli <f.faine...@gmail.com>:
> I am looking at bgmac_dma_rx_read() and I do not quite understand why
> you would need to copy data to the newly allocated SKB as it might
> really be killing performance here. Looking at b44, the code path
> doing this is just when the packet is smaller (say less than 256
> bytes) because in that case, the cost of a data cache invalidate might
> be higher than a fresh allocation plus memcpy(). Rather, the logic I
> would use is the following:
>
> - consume a packet from the DMA RX ring at a given index
> - dma_sync_single_for_cpu() this packet
> - call netif_receive_skb() for this packet
> - allocate a new SKB for the same RX ring index
>
> Eventually if you realize that for small packets you had better do a
> new allocation plus memcpy() (aka: copybreak) you could try that.

I've implemented that solution, but it didn't really help much :(
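
For anyone following along, the copybreak decision from the quoted steps
can be sketched in plain userspace C. This is not the bgmac patch itself:
struct rx_slot, rx_consume(), RX_BUF_SIZE and the malloc()-based buffers
are hypothetical stand-ins for the skb and DMA handling, and the 256-byte
threshold is simply the figure mentioned in the quote.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RX_BUF_SIZE   2048  /* hypothetical size of each ring buffer */
#define RX_COPYBREAK  256   /* threshold from the quote ("say less than 256 bytes") */

/* Stand-in for one RX ring slot; a real driver would hold an skb and a DMA address. */
struct rx_slot {
    unsigned char *buf;
};

/*
 * Take the packet of `len` bytes out of `slot` and return a buffer that the
 * caller (the "stack") now owns. Sets *copied to 1 if the data was copied,
 * 0 if the ring buffer itself was handed over. Returns NULL if allocation
 * fails, in which case the slot is left untouched so the ring stays populated.
 */
static unsigned char *rx_consume(struct rx_slot *slot, size_t len, int *copied)
{
    assert(len <= RX_BUF_SIZE);

    if (len < RX_COPYBREAK) {
        /* Small packet: copy into a right-sized buffer, keep the ring buffer. */
        unsigned char *small = malloc(len ? len : 1);
        if (!small)
            return NULL;
        memcpy(small, slot->buf, len);
        *copied = 1;
        return small;
    }

    /* Large packet: hand the ring buffer up and refill the slot with a fresh one. */
    unsigned char *fresh = malloc(RX_BUF_SIZE);
    if (!fresh)
        return NULL;
    unsigned char *pkt = slot->buf;
    slot->buf = fresh;
    *copied = 0;
    return pkt;
}
```

In the sketch the large-packet path does no memcpy() at all, which is the
case the quoted suggestion is meant to speed up; only packets below the
threshold pay for a copy.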

Tx test:

# readprofile -r; iperf -t 60 -c 192.168.1.218; readprofile | sort -nr
------------------------------------------------------------
Client connecting to 192.168.1.218, TCP port 5001
TCP window size: 20.5 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.1 port 38653 connected with 192.168.1.218 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-60.0 sec   614 MBytes  85.8 Mbits/sec
 11814 total                                      0.0046
  3179 *unknown*
  1119 __do_softirq                               2.5665
  1037 __copy_user_common                         1.4899
   550 csum_partial                               0.3852
   388 tcp_transmit_skb                           0.1537
   336 tcp_sendmsg                                0.0844
   271 dev_hard_start_xmit                        0.1656
   247 nf_hook_slow                               0.6938
   237 r4k_dma_cache_wback_inv                    1.0972
   222 bgmac_poll                                 0.1888
   183 kmem_cache_alloc                           0.6267
   174 bgmac_start_xmit                           0.1570
   172 __kmalloc                                  0.4886
   163 tcp_v4_rcv                                 0.0619
   162 tcp_write_xmit                             0.0568
   161 dev_queue_xmit                             0.1227

__copy_user_common still shows up, and I still have no idea what that
*unknown* entry is.
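
To put the tick counts in proportion, a small helper like the one below
turns them into a share of the "total" line. tick_share() is a
hypothetical name used only for illustration. Note that the third column
of readprofile output is the normalized load (ticks divided by function
length), not a percentage.

```c
/* Share of all profiling ticks, in percent, attributed to one symbol. */
static double tick_share(unsigned long ticks, unsigned long total)
{
    return total ? 100.0 * (double)ticks / (double)total : 0.0;
}
```

For the Tx run above, tick_share(1037, 11814) comes out at about 8.8 for
__copy_user_common and tick_share(3179, 11814) at about 26.9 for
*unknown*.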

Also no real improvement for Rx:

# readprofile -r; iperf -s; readprofile | sort -nr
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.1.1 port 5001 connected with 192.168.1.218 port 56297
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-60.0 sec  1.25 GBytes   178 Mbits/sec
^C
 11302 total                                      0.0044
  4132 *unknown*
  1860 csum_partial                               1.3025
  1720 __copy_user_common                         2.4713
   586 tcp_v4_rcv                                 0.2226
   480 r4k_dma_cache_inv                          2.0339
   479 ip_rcv                                     0.5162
   442 cpu_idle                                   4.6042
   394 nf_hook_slow                               1.1067
   376 skb_copy_ubufs                             0.7705
   329 __netif_receive_skb                        0.1685
   266 ip_local_deliver_finish                    0.4890
   252 tcp_rcv_established                        0.1620
   234 __bzero                                    0.6573
   222 bgmac_poll                                 0.1888
   172 process_backlog                            0.3739
   164 mips_dma_map_page                          0.9318

I'll post my patches soon, so you can verify my changes.

-- 
Rafał
_______________________________________________
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel
