[snip]

> The test can be run both with and without ring:
>
>   psock_txring_vnet -l 8000 -s $src_ip -d $dst_ip -v
>   psock_txring_vnet -l 8000 -s $src_ip -d $dst_ip -v -N
>
> both with and without qdisc bypass ('-q').

Thanks, apologies, I was being impatient. Started reading the source,
saw the tpacket bits and stopped there.

>
>> - this goes via the tpacket_snd
>> which allocs via sock_alloc_send_skb. That results in a non-fragged skb
>> as it calls pskb after that with data_len = 0 asking for a contiguous one.
> but attached the ring slot as fragments in tpacket_fill_skb.
>
>> My stuff is using sendmmsg which ends up via packet_snd which allocs
>> via sock_alloc_send_pskb which is invoked in a way which always creates
>> 2 segments - one for the linear section and one for the rest (and more
>> if needed). It is faster than tpacket by the way (several times).
>>
>> As a comparison tap and other virtual drivers use sock_alloc_send_pskb
>> with non-zero data length which results in multiple frags. The code in
>> packet_snd is in fact identical with tap (+/- some cosmetic differences).
>>
>> That is the difference between the tests and that is why your test works
>> and mine fails.
> All the above test cases work for me, including those that build skbs
> with fragments. Could you try those.

Tried it, works on all of the adapters and hosts where mine fails.

I will hack in the differences step by step so it behaves the same as
mine until I find the culprit. This will be tomorrow though, it is late
here.

The only obvious difference I can see at this point is that I am using
iovs and sending the vnet header as iov[0] and the data in pieces after
that, while your code is doing a send() for the whole frame. This should
not make any difference though - it all ends up as an iov internally in
the kernel.

A.
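
For reference, a minimal sketch of the iov layout described above, not
the actual sender code: the vnet header goes in iov[0], the frame data
follows in pieces, and the message is submitted with sendmmsg(). It uses
an AF_UNIX socketpair as a stand-in so it runs without privileges; the
real case would be an AF_PACKET socket with PACKET_VNET_HDR enabled.
fill_frame(), roundtrip(), struct frame_msg and MAX_PIECES are
hypothetical names, and the all-zero header (no GSO, no checksum
offload) is an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/virtio_net.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_PIECES 8	/* arbitrary limit for this sketch */

/* One outgoing frame: iov[0] is the vnet header, iov[1..] the data. */
struct frame_msg {
	struct virtio_net_hdr vnet;
	struct iovec iov[1 + MAX_PIECES];
};

/* Point mm at the vnet header followed by n caller-supplied pieces. */
static void fill_frame(struct mmsghdr *mm, struct frame_msg *f,
		       const struct iovec *pieces, unsigned int n)
{
	assert(n <= MAX_PIECES);

	/* Assumption: no offloads requested, so the header is all zero. */
	memset(&f->vnet, 0, sizeof(f->vnet));
	f->iov[0].iov_base = &f->vnet;
	f->iov[0].iov_len = sizeof(f->vnet);
	memcpy(&f->iov[1], pieces, n * sizeof(*pieces));

	memset(mm, 0, sizeof(*mm));
	mm->msg_hdr.msg_iov = f->iov;
	mm->msg_hdr.msg_iovlen = 1 + n;
}

/*
 * Send one scattered frame over a datagram socketpair and read it back.
 * Returns the number of bytes received, or -1 on error.
 */
static ssize_t roundtrip(const struct iovec *pieces, unsigned int n,
			 void *out, size_t outlen)
{
	struct frame_msg f;
	struct mmsghdr mm;
	ssize_t got = -1;
	int sv[2];

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	fill_frame(&mm, &f, pieces, n);
	if (sendmmsg(sv[0], &mm, 1, 0) == 1)
		got = recv(sv[1], out, outlen, 0);

	close(sv[0]);
	close(sv[1]);
	return got;
}
```

The receiver sees a single datagram: the header bytes followed by the
pieces in order, which is the same byte stream a single send() of a
pre-assembled buffer would produce. This only shows the userspace view;
it says nothing about how the skb is laid out on the packet socket path.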