On Wed, Nov 28, 2018 at 3:10 AM Maxim Mikityanskiy <maxi...@mellanox.com> wrote:
>
> Hi Saeed,
>
> > Can you elaborate more, what NIC? what configuration? what do you mean
> > by confusion, anyway please see below
>
> ConnectX-4, after running `mlnx_qos -i eth1 --trust dscp`, which sets inline
> mode 2 (MLX5_INLINE_MODE_IP). I'll explain what I mean by confusion below.
>
> > in mlx5 with ConnectX4 or ConnectX4-LX there is a requirement to copy at
> > least the ethernet header to the tx descriptor otherwise this might
> > cause the packet to be dropped, and for RAW sockets the skb headers
> > offsets are not set, but the latest mlx5 upstream driver would know how
> > to handle this, and copy the minimum amount required
> > please see:
> >
> > static inline u16 mlx5e_calc_min_inline(enum mlx5_inline_modes mode,
> >                                         struct sk_buff *skb)
>
> Yes, I know that, and what I am doing is debugging an issue with this function.
>
> > it should default to:
> >
> > case MLX5_INLINE_MODE_L2:
> > default:
> >         hlen = mlx5e_skb_l2_header_offset(skb);
>
> The issue appears in MLX5_INLINE_MODE_IP. I haven't tested
> MLX5_INLINE_MODE_TCP_UDP yet, though.
>
> > So it should return at least 18 and not 14.
>
> Yes, the function does its best to return at least 18, but it silently expects
> skb_transport_offset to exceed 18. In normal conditions, it will be more than
> 18, because it will be at least 14 + 20. But in my case, when I send a packet
> via an AF_PACKET socket, skb_transport_offset returns 14 (which is nonsense),
> and the driver uses this value, causing the hardware to fail, because it's
> less than 18.
>

Got it, so even copying 18 bytes is not sufficient! If the packet is IPv4 or
IPv6 and the inline mode is set to MLX5_INLINE_MODE_IP in the vport context,
you must copy the IP headers as well. But what do you expect from an AF_PACKET
socket? To parse each and every packet and set skb_transport_offset?

> > We had some issues with this in old drivers, such as in kernels 4.14/15,
> > and it depends on the use case, so I need some information first:
>
> No, it's not an old kernel. We actually have this bug in our internal bug
> tracking system, and I'm trying to resolve it.
>
> > 1. What Cards do you have? (lspci)
>
> 03:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
> 03:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
> 81:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
>
> Testing with ConnectX-4.
>
> > 2. What kernel/driver version are you using?
>
> I'm on net-next-mlx5, commit 66a4b5ef638a (the latest when I started the
> investigation).
>
> > 3. what is the current enum mlx5_inline_modes seen in
> > mlx5e_calc_min_inline or sq->min_inline_mode?
>
> MLX5_INLINE_MODE_IP, as I said above.
>
> > 4. Firmware version? (ethtool -i)
>
> 12.22.0238 (MT_2190110032)
>
> > can you share the packet format you are sending and seeing the bad
> > behavior with
>
> Here is the hexdump of the simplest packet that causes the problem when it's
> sent through AF_PACKET after `mlnx_qos -i eth1 --trust dscp`:
>
> 00000000: 11 22 33 44 55 66 77 88 99 aa bb cc 08 00 45 00
> 00000010: 00 20 00 00 40 00 40 11 ae a5 c6 12 00 01 c6 12
> 00000020: 00 02 00 00 4a 38 00 0c 29 82 61 62 63 64
>
> (Please ignore the wrong UDP checksum and non-existing MACs, it doesn't matter
> at all, I tested it with completely valid packets as well. The wrong UDP
> checksum is due to a bug in our internal pypacket utility).
>
> Thanks,
> Max
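
For reference, the header arithmetic for the packet in the hexdump can be checked by hand or with a small user-space program. The sketch below is only an illustration: it is not mlx5 code and not a proposed fix. The helper name `demo_l3_end_offset` and the `DEMO_*` constants are hypothetical. It assumes an untagged Ethernet frame (no VLAN) and ignores IPv6 extension headers. It derives the end of the L3 header from the packet bytes alone, which is the offset one would expect skb_transport_offset to report for this frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_HLEN   14      /* dst MAC + src MAC + EtherType, no VLAN tag */
#define DEMO_ETH_P_IP   0x0800
#define DEMO_ETH_P_IPV6 0x86DD
#define DEMO_IPV4_MIN   20      /* IPv4 header without options */
#define DEMO_IPV6_HLEN  40      /* fixed IPv6 header only */

/* The 46-byte frame from the hexdump quoted above. */
static const uint8_t demo_frame[] = {
    0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88,
    0x99, 0xaa, 0xbb, 0xcc, 0x08, 0x00, 0x45, 0x00,
    0x00, 0x20, 0x00, 0x00, 0x40, 0x00, 0x40, 0x11,
    0xae, 0xa5, 0xc6, 0x12, 0x00, 0x01, 0xc6, 0x12,
    0x00, 0x02, 0x00, 0x00, 0x4a, 0x38, 0x00, 0x0c,
    0x29, 0x82, 0x61, 0x62, 0x63, 0x64,
};

/*
 * Hypothetical helper: return the offset at which the L3 header ends,
 * using only the bytes of the frame. Returns 0 when the frame is too
 * short or is not IPv4/IPv6.
 */
static size_t demo_l3_end_offset(const uint8_t *data, size_t len)
{
    assert(data != NULL);

    if (len < DEMO_ETH_HLEN)
        return 0;

    /* EtherType is big-endian at bytes 12..13 of an untagged frame. */
    uint16_t ethertype = (uint16_t)((data[12] << 8) | data[13]);

    if (ethertype == DEMO_ETH_P_IP) {
        if (len < DEMO_ETH_HLEN + DEMO_IPV4_MIN)
            return 0;
        uint8_t version = data[DEMO_ETH_HLEN] >> 4;
        /* IHL is the low nibble, counted in 32-bit words. */
        size_t ihl = (size_t)(data[DEMO_ETH_HLEN] & 0x0f) * 4;
        if (version != 4 || ihl < DEMO_IPV4_MIN || len < DEMO_ETH_HLEN + ihl)
            return 0;
        return DEMO_ETH_HLEN + ihl;
    }

    if (ethertype == DEMO_ETH_P_IPV6) {
        if (len < DEMO_ETH_HLEN + DEMO_IPV6_HLEN)
            return 0;
        return DEMO_ETH_HLEN + DEMO_IPV6_HLEN;
    }

    return 0;
}
```

For the frame above, `demo_l3_end_offset(demo_frame, sizeof(demo_frame))` gives 14 + 20 = 34: the EtherType is 0x0800 and the first IP byte is 0x45, so the IPv4 header carries no options. That is well above 18, whereas the value of 14 reported for the AF_PACKET case points at the start of the IP header, not its end.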