Hi Matan,

>>>> mlx5 PMD refuses to update link state if link speed is defined but
>>>> status is down or if link speed is undefined but status is up, even
>>>> if the ioctl() succeeded.
>>>> This prevents application to detect link up/down event, especially
>>>> when the link speed is not correctly detected.
>>> Do you use the wait option? Or no wait?
>> We are using the no wait option.
> I suggest to call again if failed for N retries time.

Unfortunately it will not solve our problem: if link speed is undefined but the link is up, then the test '!dev_link.link_speed && dev_link.link_status' at http://git.dpdk.org/dpdk/tree/drivers/net/mlx5/mlx5_ethdev.c#n899 will always be true and the function will always return EAGAIN. This actually happens in Azure with CX4-Lx VFs.

>> What I meant was to let the app decide whether it should retry or not,
>> based on the data it gets.
>> Right now, the PMD *prevents* the app to get link state if the link
>> speed is undefined even if the app does not care about link speed.
> In mlx5 this is not the case, we have no one updated and second not -
> there are going together:
> You can see that we have 2 different system calls: 1 to get up\down and
> second to get link speed.
> If link speed doesn't appropriate to the link state it may say that
> something was changed between the calls and the link status we got from
> the first call is not correct anymore.
> In this case, we should call both calls again, that’s what we are doing in
> "nowait" option.
> If the user doesn't want "nowait" option, (means PMD is not allowed to
> take more time for response) he should call again when the callback failed
> in the time and retries manner the user prefers.

Ok, now I understand the logic behind the current behavior: since the 2 syscalls are not atomic, you try to detect inconsistencies that way. But if the link speed is undefined, then the state will never be correctly updated. I still believe it is unnecessarily heavy-handed: in most networking applications I have seen (and I have 2 examples of currently shipping networking products), a missing link speed is not critical, whereas a link being reported as down means no traffic flowing.

Best
ben
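
For illustration only, here is a minimal standalone sketch of the two behaviors discussed in this thread. It does not use the real DPDK or mlx5 code: `struct link_sample`, `strict_link_check()` and `lenient_link_update()` are hypothetical names, and returning `-EAGAIN` is an assumption modelled on the description above. The strict variant mirrors the consistency test quoted in the thread; the lenient variant always publishes what was sampled and leaves the retry decision to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the link fields discussed in the thread;
 * this is NOT the real DPDK structure. */
struct link_sample {
	uint32_t link_speed;  /* 0 means "speed undefined" */
	bool link_status;     /* true means "link up" */
};

/* Strict behavior as described in the thread: speed and status come from
 * two separate calls, so a mismatch is treated as a possible race and
 * rejected. Returning -EAGAIN is an assumption of this sketch. */
static int
strict_link_check(const struct link_sample *l)
{
	if ((l->link_speed && !l->link_status) ||
	    (!l->link_speed && l->link_status))
		return -EAGAIN;
	return 0;
}

/* Lenient alternative (the proposal, sketched): always publish the sampled
 * state, and only report the inconsistency through the return value so the
 * application can decide whether to retry. */
static int
lenient_link_update(const struct link_sample *sampled,
		    struct link_sample *published)
{
	assert(sampled != NULL && published != NULL);
	*published = *sampled;
	return strict_link_check(sampled);
}
```

With a sample where the link is up but the speed is 0 (the Azure CX4-Lx VF case described above), the strict check fails on every call, while the lenient update still exposes the "up" status to the application along with the same return code.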