Saeed Mahameed <sae...@mellanox.com> writes:

> On Tue, 2018-10-23 at 12:10 +0200, Toke Høiland-Jørgensen wrote:
>> Saeed Mahameed <sae...@mellanox.com> writes:
>>
>> > On Thu, 2018-10-18 at 23:53 +0200, Toke Høiland-Jørgensen wrote:
>> > > Saeed Mahameed <sae...@mellanox.com> writes:
>> > >
>> > > > I think that the mlx5 driver doesn't know how to tell the other
>> > > > device to stop transmitting to it while it is resetting. Maybe
>> > > > Tariq or Jesper know more about this?
>> > > > I will look at this tomorrow afternoon and will try to repro...
>> > >
>> > > Hi Saeed
>> > >
>> > > Did you have a chance to poke at this? :)
>> >
>> > Hi Toke, yes, I have been planning to respond, but I also wanted to
>> > dig more,
>> >
>> > so the root cause is very clear.
>> >
>> > 1. core 1 is doing tx_dev->ndo_xdp_xmit()
>> > 2. core 2 is doing tx_dev->xdp_set() // remove XDP program.
>>
>> Right, it was also my guess that it was related to this interaction.
>> Thanks for looking into it!
>>
>> > and the problem is beyond mlx5, since we don't have a way to tell a
>> > different core/different netdev to stop xmitting, or at least
>> > synchronize with it.
>>
>> Hmm, ideally there should be some way for the higher-level XDP API to
>> notice this and abort the call before it even reaches the driver on the
>> TX side, shouldn't there? At LPC, Jesper and I will be talking about a
>> proposal for decoupling the ndo_xdp_xmit() resource allocation from
>> loading and unloading XDP programs, which I guess could be a way to deal
>> with this as well.
>>
>> In the meantime...
>>
>
> Yes, totally agree, this is why my fix is temporary.
> Good idea about LPC, let's discuss this there.
>
>> > I will be waiting for your confirmation that the fix did work.
>>
>> I tested your patch, and it does indeed fix the crash. However, it also
>> seems to have the effect that the XDP redirect continues to function
>> even after removing the XDP program on the target device.
>>
>> I.e., after the call to ./xdp_fwd -d $TX_IF, I still see packets being
>> redirected out $TX_IF. Is this intentional?
>>
>
> Interesting, that shouldn't happen, unless there is something weird going
> on when running xdp_fwd -d together with xdp_redirect_map. I just checked
> the code, and if ndo_xdp_set was called with a null program we will remove
> the XDP TX resources; nothing suspicious in the driver.
>
> I will look at this later this week.

Cool. Let me know if you need anything more from me :)

-Toke
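The race Saeed describes (core 1 inside `tx_dev->ndo_xdp_xmit()` while core 2 runs `tx_dev->xdp_set()` to remove the program) can be modeled in user space. This is a hypothetical illustration, not the mlx5 patch discussed in the thread: `ModelTxDevice`, `run_race`, the list standing in for the TX ring, and the single lock are all assumptions chosen for clarity, and a real driver would use kernel synchronization primitives rather than a Python lock. It shows the behavior Toke expected: once the program is removed, the TX resources are gone and further redirects are rejected instead of being transmitted or touching freed memory.

```python
import threading
from typing import List, Optional


class ModelTxDevice:
    """Hypothetical user-space stand-in for a redirect target device.

    xdp_tx_ring models the XDP TX resources that exist only while an XDP
    program is attached; None means they have been freed. Not kernel code.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.xdp_tx_ring: Optional[List[int]] = None
        self.sent = 0
        self.dropped = 0

    def ndo_xdp_xmit(self, frame: int) -> bool:
        # "Core 1": check for and use the TX resources under the same lock
        # that teardown takes, so a freed ring is never touched.
        with self._lock:
            if self.xdp_tx_ring is None:
                self.dropped += 1
                return False
            self.xdp_tx_ring.append(frame)
            self.sent += 1
            return True

    def xdp_set(self, prog: Optional[object]) -> None:
        # "Core 2": prog=None removes the program and frees the TX
        # resources; attaching a program (re)allocates them.
        with self._lock:
            if prog is None:
                self.xdp_tx_ring = None
            elif self.xdp_tx_ring is None:
                self.xdp_tx_ring = []


def run_race(frames: int = 50_000) -> ModelTxDevice:
    dev = ModelTxDevice()
    dev.xdp_set(prog="redirect-target")  # attach: allocate TX resources

    def redirect_loop() -> None:
        for i in range(frames):
            dev.ndo_xdp_xmit(i)

    core1 = threading.Thread(target=redirect_loop)
    core2 = threading.Thread(target=dev.xdp_set, args=(None,))
    core1.start()
    core2.start()
    core1.join()
    core2.join()
    return dev


dev = run_race()
print(f"sent={dev.sent} dropped={dev.dropped}")
```

How many frames land in `dropped` depends on thread scheduling; the invariant the sketch enforces is that every frame is either sent or rejected, and nothing is written to the ring after removal.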