On 9/6/19 11:13 AM, Michael Marley wrote:
(This is also reported at
https://bugzilla.kernel.org/show_bug.cgi?id=204551, but it was
recommended that I send it to this list as well.)
I have put together a router that routes traffic from several local
subnets from a switch attached to an i82599ES card through an IPSec
VPN interface set up with StrongSwan. (The VPN is running on an
unrelated second interface with a different driver.) Traffic from the
local interfaces to the VPN works as it should and eventually makes it
through the VPN server and out to the Internet. The return traffic
makes it back to the router and tcpdump shows it leaving by the
i82599, but the traffic never actually makes it onto the wire and I
instead get one of
enp1s0: ixgbe_ipsec_tx: bad sa_idx=64512 handle=0
for each packet that should be transmitted. (The sa_idx and handle
values are always the same.)
I realized this was probably related to ixgbe's IPSec offloading
feature, so I tried with the motherboard's integrated e1000e device
and didn't have the problem. I tried using ethtool to disable all the
IPSec-related offloads (tx-esp-segmentation, esp-hw-offload,
esp-tx-csum-hw-offload), but the problem persisted. I then tried
recompiling the kernel with CONFIG_IXGBE_IPSEC=n and that worked
around the problem.
I was also able to find another instance of the same problem reported
in Debian at
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=930443. That person
seems to be having exactly the same issue as me, down to the sa_idx
and handle values being the same.
If there are any more details I can provide to make this easier to
track down, please let me know.
Thanks,
Michael Marley
Hi Michael,
Thanks for pointing this out. The issue this error message is
complaining about is that the handle given to the driver is a bad
value. The handle is what helps the driver find the right encryption
information, and in this case is an index into an array, one array for
Rx and one for Tx, each of which has up to 1024 entries. In order to
encode them into a single value, 1024 is added to the Tx values to make
the handle, and 1024 is subtracted to use the handle later. Note that
the bad sa_idx is 64512, which is -1024 wrapped around as an unsigned
16-bit value; if the Tx handle given to ixgbe for xmit is 0, we
subtract 1024 from that and get this bad sa_idx value.
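To make the arithmetic concrete, here is a rough sketch in Python. It is
only a model of the encoding described above, not the driver code: the
function and constant names are made up for illustration, and the 16-bit
index width is an assumption inferred from 64512 being -1024 modulo 65536.

```python
# Illustrative model only, not the ixgbe source. All names here are
# hypothetical; the 1024 table size and the 16-bit index width are
# assumptions taken from the description above.
MAX_SA_COUNT = 1024          # entries per table (one Rx, one Tx)
TX_BASE = MAX_SA_COUNT       # offset added to a Tx index to form a handle


def tx_index_to_handle(tx_idx: int) -> int:
    """Encode a Tx table index as a handle."""
    return tx_idx + TX_BASE


def handle_to_tx_index(handle: int) -> int:
    """Decode a handle, wrapping to 16 bits as an unsigned index would."""
    return (handle - TX_BASE) & 0xFFFF


def is_valid_tx_index(sa_idx: int) -> bool:
    """A decoded index is usable only if it falls inside the table."""
    return sa_idx < MAX_SA_COUNT


if __name__ == "__main__":
    good = handle_to_tx_index(tx_index_to_handle(5))
    bad = handle_to_tx_index(0)   # handle was never set, or was cleared
    print(good, is_valid_tx_index(good))
    print(bad, is_valid_tx_index(bad))
```

A handle that round-trips through the encoding decodes to its original
index, while a handle of 0 decodes to 64512 and fails the range check,
which matches the sa_idx and handle values in the reported message.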
That handle is supposed to be an opaque value only used by the driver.
It looks to me like either (a) the driver is not setting up the handle
correctly when the SA is first set up, or (b) something in the upper
levels of the ipsec code is clearing the handle value. We would need to
know more about all the bits in your SA setup to have a better idea of
which parts of the ipsec code are being exercised when this problem happens.
I currently don't have access to a good ixgbe setup on which to
test/debug this, and I haven't been paying much attention lately to
what's happening in the upper ipsec layers, so my help will be somewhat
limited. I'm hoping the Intel folks can add a little help, so I've
copied Jeff Kirsher on this (they'll probably point back to me since I
wrote this chunk :-) ). I've also copied Steffen Klassert for his ipsec
thoughts.
In the meantime, can you give more details on the exact ipsec rules that
are used here, and are there any error messages coming from ixgbe
regarding the ipsec rule setup that might help us identify what's happening?
Thanks,
sln