On 11/08/13 17:41, Rafał Miłecki wrote:
> 2013/8/11 Robert Bradley <robert.bradl...@gmail.com>:
>> On 11/08/13 16:08, Rafał Miłecki wrote:
>>> 2013/8/9 Florian Fainelli <f.faine...@gmail.com>:
>>>> I am looking at bgmac_dma_rx_read() and I do not quite understand why
>>>> you would need to copy data to the newly allocated SKB as it might
>>>> really be killing performance here. Looking at b44, the code path
>>>> doing this is just when the packet is smaller (say less than 256
>>>> bytes) because in that case, the cost of a data cache invalidate might
>>>> be higher than a fresh allocation plus memcpy(). Rather, the logic I
>>>> would use is the following:
>>>>
>>>> - consume a packet from the DMA RX ring at a given index
>>>> - dma_sync_single_for_cpu() this packet
>>>> - call netif_receive_skb() for this packet
>>>> - allocate a new SKB for the same RX ring index
>>>>
>>>> Eventually if you realize that for small packets you had better do a
>>>> new allocation plus memcpy() (aka: copybreak) you could try that.
>>>
>>> I've implemented that solution, but it didn't really help much :(
>>
>> Well, http://patchwork.ozlabs.org/patch/220961/ seems to suggest that bgmac
>> can produce unaligned accesses, so I assume the memcpy() is used to avoid
>> that. You could try removing the new allocation and memcpy(), add in the IP
>> stack unaligned access patches from ar71xx and see if that helps...
>
> That patch from Hauke was applied and I'm testing kernels having it.
> Is there any extra unaligned access you're aware of, or you suspect?

I'm no expert when it comes to bgmac, but I expect that no unaligned
access currently exists. The catch is that the unaligned access is
avoided only at the expense of copying the packet (the copy was already
there before the patch; the patch merely fixes the alignment of the new
SKB's data). In the past, a similar trick was used in the ag71xx
Ethernet driver for ar71xx, but it was later reverted because large
packets are/were costly to realign:
https://dev.openwrt.org/changeset/20506
https://dev.openwrt.org/changeset/20892
https://dev.openwrt.org/changeset/21166
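
For anyone wondering where the misalignment comes from in the first
place, the arithmetic is short: an Ethernet header is 14 bytes, so when
a frame starts on a 4-byte boundary the IP header behind it sits at
offset 14, which is only 2-byte aligned. A rough sketch of that
calculation follows; ip_header_misalignment() and ETH_HEADER_LEN are
names I made up for illustration, not anything from bgmac or ag71xx.

```c
#include <assert.h>

#define ETH_HEADER_LEN 14u  /* 6 (dst MAC) + 6 (src MAC) + 2 (EtherType) */

/*
 * Hypothetical helper: how far the IP header is from an `align`-byte
 * boundary when the Ethernet frame starts `pad` bytes past an aligned
 * buffer address. A result of 0 means the IP header is aligned.
 */
static unsigned ip_header_misalignment(unsigned pad, unsigned align)
{
    assert(align != 0);
    return (pad + ETH_HEADER_LEN) % align;
}
```

With pad = 0 and align = 4 the result is 2 (misaligned); with a 2-byte
pad in front of the frame it is 0. When the hardware cannot place the
frame at such an offset, the data has to be moved afterwards, which is
the realignment cost mentioned above.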
A similar thing may apply here: the cost of memcpy() may be greater
than the unaligned-access performance hit. However, since we now have
patches to avoid unaligned access in the first place
(https://dev.openwrt.org/browser/trunk/target/linux/ar71xx/patches-3.10/902-unaligned_access_hacks.patch),
it might be worth testing a build with these applied, using Florian's
method instead (pass the current SKB to the stack as-is and allocate a
new one for the next DMA read).
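
To make the suggestion a bit more concrete, here is a rough user-space
sketch of the receive decision I have in mind, including the optional
copybreak for small packets. This is not bgmac code: struct rx_slot,
rx_one() and RX_BUF_SIZE are made-up names, malloc() stands in for SKB
allocation, the dma_sync_single_for_cpu() and netif_receive_skb() steps
appear only as comments, and the 256-byte threshold is simply the figure
from Florian's b44 example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RX_BUF_SIZE 2048u  /* assumed size of each ring buffer */
#define COPYBREAK   256u   /* threshold from the b44 example above */

/* Hypothetical stand-in for one RX ring entry and its buffer. */
struct rx_slot {
    unsigned char *buf;
};

/*
 * Handle one received packet of `len` bytes sitting in `slot`.
 * Returns the buffer to hand to the stack (the caller owns it), or
 * NULL if allocation failed, in which case the packet is dropped and
 * the slot keeps its buffer so the ring stays populated.
 */
static unsigned char *rx_one(struct rx_slot *slot, size_t len)
{
    unsigned char *fresh;
    unsigned char *pkt;

    assert(len <= RX_BUF_SIZE);
    /* A real driver would call dma_sync_single_for_cpu() here. */

    if (len < COPYBREAK) {
        /* Small packet: copy it out and reuse the ring buffer. */
        pkt = malloc(len ? len : 1);
        if (!pkt)
            return NULL;
        memcpy(pkt, slot->buf, len);
        return pkt;
    }

    /* Large packet: pass the buffer up as-is and refill the slot. */
    fresh = malloc(RX_BUF_SIZE);
    if (!fresh)
        return NULL;
    pkt = slot->buf;
    slot->buf = fresh;
    return pkt;  /* a real driver would netif_receive_skb() this */
}
```

The point of the split is that only packets below the threshold pay for
a memcpy(); everything else costs one allocation per packet and no copy.
Whether 256 bytes is the right cut-off for bgmac is something only
benchmarking would tell.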
--
Robert Bradley