On Tue, Nov 27, 2018 at 1:41 PM Maxim Mikityanskiy
<maxi...@mellanox.com> wrote:
>
> Hi everyone,
>
> We are experiencing an issue with Mellanox mlx5 driver, and I tracked it
> down to the packet_snd function in net/packet/af_packet.c.
>
> Brief description: when a socket is created by calling `socket(AF_PACKET,
> SOCK_RAW, 0)`, the mlx5 driver receives an skb with wrong transport_offset,
> which can confuse the driver and cause the transmit to fail (depending on
> the configuration of the NIC).
>
> The flow is the following:
>
> 1. packet_snd is called.
>
> 2. dev->hard_header_len (which is 14) is assigned to reserve.
>
> 3. The value of the third parameter of the initial socket() call is
> assigned to skb->protocol. In our case, it's 0.
>
> 4. skb_probe_transport_header is called with offset_hint == reserve (which
> is 14).
>
> 5. __skb_flow_dissect fails, because skb->protocol is 0.
>
> 6. skb_probe_transport_header happily sets transport_header to 14.
>
> I find this behavior (defaulting to 14) strange, because network_header is
> also set to 14, and the transport_header value is just wrong. Moreover,
> there are two more calls to skb_probe_transport_header in this file with
> offset_hint == 0, which looks more reasonable (if we can't find the
> transport header, we indicate that there is none, instead of pointing to
> the network header).

That is not what offset_hint 0 does. It also sets the transport header
to the same as the network header. The difference with reserve is
whether skb->data is pointing at the link layer or network header at the
time (SOCK_RAW vs SOCK_DGRAM).

Indicating that transport offset is not set would be setting it to ~0U.
Perhaps that is indeed a better choice in these paths when
skb_flow_dissect_keys_basic fails to parse the headers.

> Does anyone know why offset_hint is set to 14 in this single place? Can it
> be replaced by 0 safely, and what can be the consequences?
>
> Also, what guarantees does kernel provide for the network and transport
> header offsets? Especially in raw sockets, where the headers are not
> generated by different stack layers.

From the above, this appears to be best effort. Note that the same
is also used by tuntap and a few others.
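
To make the point about the two hint values concrete, here is a small
user-space toy model, not kernel code. Every toy_* name and the struct
layout are hypothetical, and the flow dissector is reduced to a
dissect_ok flag. The only things it assumes from the discussion above
are that the hint is applied relative to skb->data and that ~0U is the
"not set" value.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins; none of these names exist in the kernel. */
#define TOY_HDR_UNSET ((uint16_t)~0U)   /* "transport header not set" */
#define TOY_ETH_HLEN  14                /* dev->hard_header_len for Ethernet */

struct toy_skb {
    uint16_t data;               /* offset of skb->data from the buffer head */
    uint16_t network_header;     /* offset of the network header from the head */
    uint16_t transport_header;   /* offset of the transport header from the head */
};

static int toy_transport_header_was_set(const struct toy_skb *skb)
{
    return skb->transport_header != TOY_HDR_UNSET;
}

/* Assumption: the offset is interpreted relative to the current data pointer. */
static void toy_set_transport_header(struct toy_skb *skb, int offset)
{
    skb->transport_header = (uint16_t)(skb->data + offset);
}

/* Current behaviour as described above: fall back to the hint on failure. */
static void toy_probe_transport_header(struct toy_skb *skb, int dissect_ok,
                                       int thoff, int offset_hint)
{
    if (toy_transport_header_was_set(skb))
        return;
    toy_set_transport_header(skb, dissect_ok ? thoff : offset_hint);
}

/* Hypothetical alternative: leave the header unset when dissection fails. */
static void toy_probe_or_leave_unset(struct toy_skb *skb, int dissect_ok,
                                     int thoff)
{
    if (!toy_transport_header_was_set(skb) && dissect_ok)
        toy_set_transport_header(skb, thoff);
}

/* data points at the link layer header (the SOCK_RAW case above). */
static struct toy_skb toy_skb_data_at_link_layer(uint16_t headroom)
{
    struct toy_skb skb = { headroom, (uint16_t)(headroom + TOY_ETH_HLEN),
                           TOY_HDR_UNSET };
    return skb;
}

/* data points at the network header (the SOCK_DGRAM case above). */
static struct toy_skb toy_skb_data_at_network_layer(uint16_t headroom)
{
    uint16_t nh = (uint16_t)(headroom + TOY_ETH_HLEN);
    struct toy_skb skb = { nh, nh, TOY_HDR_UNSET };
    return skb;
}

/* Both hints end up with transport header == network header. */
static void toy_check_hints_agree(void)
{
    struct toy_skb raw = toy_skb_data_at_link_layer(2);
    struct toy_skb dgram = toy_skb_data_at_network_layer(2);

    toy_probe_transport_header(&raw, 0, 0, TOY_ETH_HLEN);   /* hint == reserve */
    toy_probe_transport_header(&dgram, 0, 0, 0);            /* hint == 0 */

    assert(raw.transport_header == raw.network_header);
    assert(dgram.transport_header == dgram.network_header);
}
```

Calling toy_check_hints_agree() from a main() should pass both
assertions. With toy_probe_or_leave_unset() a failed dissection leaves
the field at the sentinel instead, which is the alternative floated
above.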