Found it.
Two bugs canceling each other.
The bind sequence in psock_txring_vnet.c is wrong.
It does the following before calling bind:
addr.sll_protocol = htons(ETH_P_IP);
If you set addr.sll_protocol to ETH_P_ALL, which is what it should have been
in the first place, the test program blows up with -ENOBUFS.
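A minimal sketch of the bind step with the protocol set to ETH_P_ALL instead of ETH_P_IP. The helper names `fill_bind_addr` and `bind_packet_socket` are hypothetical. Only `struct sockaddr_ll`, `AF_PACKET`, `ETH_P_ALL`, `htons` and `bind` are standard. Creating a real `AF_PACKET` socket needs `CAP_NET_RAW`, so the sketch takes an already-open descriptor.

```c
#include <arpa/inet.h>       /* htons */
#include <linux/if_ether.h>  /* ETH_P_ALL */
#include <linux/if_packet.h> /* struct sockaddr_ll */
#include <string.h>          /* memset */
#include <sys/socket.h>      /* AF_PACKET, bind */

/* Hypothetical helper: build the link-layer address passed to bind().
 * The protocol is ETH_P_ALL, so the socket is not tied to one L3 protocol. */
static void fill_bind_addr(struct sockaddr_ll *addr, int ifindex)
{
    memset(addr, 0, sizeof(*addr));
    addr->sll_family = AF_PACKET;
    addr->sll_protocol = htons(ETH_P_ALL);
    addr->sll_ifindex = ifindex;
}

/* Hypothetical helper: bind an already-created AF_PACKET socket to one
 * interface. Returns bind()'s result (0 on success, -1 with errno set). */
static int bind_packet_socket(int fd, int ifindex)
{
    struct sockaddr_ll addr;

    fill_bind_addr(&addr, ifindex);
    return bind(fd, (struct sockaddr *)&addr, sizeof(addr));
}
```

Per the report above, this change alone makes the test fail with -ENOBUFS. It exposes the second bug rather than fixing it.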
I think what is happening is that the bound protocol value
(addr.sll_protocol) is taken into account when skb_mac_gso_segment decides
"what should I use to segment it with". skb_mac_gso_segment is invoked at
the end of the verification chain that starts in packet_direct_xmit in
af_packet.c.
I have not tried the other test cases, such as binding with ETH_P_IP and
sending IPv6 traffic, or vice versa, but my guess is that these will fail
too if they need GSO to be applied.
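The hypothesis above can be illustrated with a toy user-space model. This is not kernel code. `segmenter_for()`, `frame_ethertype()` and `check_frame()` are hypothetical names. The model's key assumption is that the segmentation handler is picked from the protocol recorded at bind time rather than from the frame's own EtherType. Under that assumption, ETH_P_ALL matches no handler, and a frame whose EtherType differs from the bound protocol (the untested IPv4/IPv6 cases) is a mismatch.

```c
#include <arpa/inet.h>      /* ntohs */
#include <linux/if_ether.h> /* ETH_P_ALL, ETH_P_IP, ETH_P_IPV6, ETH_HLEN */
#include <stddef.h>
#include <stdint.h>
#include <string.h>         /* memcpy */

enum verdict { SEG_OK, SEG_NO_HANDLER, SEG_MISMATCH };

/* Toy handler lookup keyed on an EtherType in host byte order. */
static const char *segmenter_for(uint16_t proto)
{
    switch (proto) {
    case ETH_P_IP:
        return "ipv4";
    case ETH_P_IPV6:
        return "ipv6";
    default:
        return NULL; /* e.g. ETH_P_ALL: nothing to dispatch to */
    }
}

/* Read the EtherType of an untagged Ethernet frame (host byte order). */
static uint16_t frame_ethertype(const uint8_t *frame, size_t len)
{
    uint16_t be;

    if (len < ETH_HLEN)
        return 0;
    memcpy(&be, frame + 12, sizeof(be)); /* bytes 12-13 hold the EtherType */
    return ntohs(be);
}

/* Model only: the handler comes from bound_proto, not from the frame. */
static enum verdict check_frame(uint16_t bound_proto,
                                const uint8_t *frame, size_t len)
{
    if (segmenter_for(bound_proto) == NULL)
        return SEG_NO_HANDLER;
    if (frame_ethertype(frame, len) != bound_proto)
        return SEG_MISMATCH;
    return SEG_OK;
}
```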
A.
On 10/12/17 15:12, Anton Ivanov wrote:
On 10/12/17 14:39, Willem de Bruijn wrote:
If I produce a real vnet frame out of a live kernel frame using
virtio_net_hdr_from_skb() and try to send it, it fails on the check in
af_packet, while succeeding for tap. If I remove the af_packet check, the
frame is accepted by the hardware too.
If I produce a synthetic frame + vnet header using the test program, it
works. Go figure.
Besides looking at the raw frame bytes, also compare the setup
of the virtio_net_hdr, as well as the TCP checksum field. The stack
expects the pseudo-header checksum to have already been calculated.
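For CHECKSUM_PARTIAL on IPv4 TCP, the value expected in the TCP checksum field is the one's-complement sum of the pseudo-header (source address, destination address, protocol, TCP length), folded to 16 bits and not inverted. The device, or the software fallback, then finishes the sum over the TCP header and payload starting at csum_start. Below is a minimal user-space sketch. `csum_fold16` and `tcp4_pseudo_csum` are hypothetical helpers. Inputs are assumed to be in host byte order, and the result should be stored into the header with htons().

```c
#include <netinet/in.h> /* IPPROTO_TCP */
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator into 16 bits. */
static uint16_t csum_fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Pseudo-header sum for TCP over IPv4, left un-inverted as CHECKSUM_PARTIAL
 * expects. saddr/daddr are host-order IPv4 addresses; tcp_len covers the
 * TCP header plus payload. Store the result with htons(). */
static uint16_t tcp4_pseudo_csum(uint32_t saddr, uint32_t daddr,
                                 uint16_t tcp_len)
{
    uint32_t sum = 0;

    sum += saddr >> 16;
    sum += saddr & 0xffff;
    sum += daddr >> 16;
    sum += daddr & 0xffff;
    sum += IPPROTO_TCP; /* zero byte followed by the protocol byte */
    sum += tcp_len;
    return csum_fold16(sum);
}
```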
I am feeding it an skb which comes up in the tx routine of a User
Mode Linux device that is marked as NETIF_F_HW_CSUM and NETIF_F_SG. That
results in an skb with checksummed headers, a body set up for
CHECKSUM_PARTIAL and multiple fragments (always at least one more frag
besides the TCP head).
That has everything in order as expected by virtio_net_hdr_from_skb,
and this is what I use to generate the vnet header. It works correctly
for csum and GRO with af_packet, and it works correctly for everything
when using a tap device. It fails only on GSO + af_packet TX.
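For comparison with what virtio_net_hdr_from_skb() produces, here is a hand-built virtio_net_hdr for an untagged Ethernet/IPv4/TCP frame that should be segmented. The helper name `fill_vnet_hdr_tcp4`, the `TCP_CHECK_OFFSET` constant and the choice of hdr_len as the full L2+L3+L4 header length are assumptions for illustration. The struct fields and `VIRTIO_NET_HDR_*` constants come from `<linux/virtio_net.h>`. Values are written in host byte order, assumed here to match what a packet socket expects on a little-endian host.

```c
#include <linux/virtio_net.h> /* struct virtio_net_hdr, VIRTIO_NET_HDR_* */
#include <stdint.h>
#include <string.h>           /* memset */

#define TCP_CHECK_OFFSET 16 /* offset of the checksum field in the TCP header */

/* Hypothetical helper: describe an Ethernet/IPv4/TCP frame to be split
 * into segments carrying at most mss bytes of TCP payload each. */
static void fill_vnet_hdr_tcp4(struct virtio_net_hdr *h, uint16_t l2_len,
                               uint16_t ip_len, uint16_t tcp_len, uint16_t mss)
{
    memset(h, 0, sizeof(*h));
    h->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;  /* checksum is partial */
    h->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;  /* segment as TCP over IPv4 */
    h->hdr_len = l2_len + ip_len + tcp_len;  /* all protocol headers */
    h->gso_size = mss;                       /* payload bytes per segment */
    h->csum_start = l2_len + ip_len;         /* checksumming starts at TCP */
    h->csum_offset = TCP_CHECK_OFFSET;       /* relative to csum_start */
}
```

Printing these fields next to the ones generated for the failing live frame gives a field-by-field diff of the two vnet headers.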
What I am doing is the same thing virtio_net does: it just takes the
output of virtio_net_hdr_from_skb and does nothing more. There should
be no need to do anything more :(
It should just work.
Unless there is a gremlin somewhere in the machinery, and that gremlin
needs some light shone on it to be flushed out.
I am going to continue digging into it.
At the very least I now have a positive test case which uses the same
semantics as my code so I have something to compare to.
Glad to hear that the test is helpful. I wrote it because I
have run into these exact same issues in the past.
It is. I have changes ready for it so that it also supports vector IO; I
need to finish fighting with them.
A.
--
Anton R. Ivanov
Cambridge Greys Limited, England and Wales company No 10273661
http://www.cambridgegreys.com/