> -----Original Message----- > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com] > Sent: Tuesday, August 1, 2017 1:51 PM > To: Matan Azrad <ma...@mellanox.com> > Cc: dev@dpdk.org; Thomas Monjalon <tho...@monjalon.net>; Olga Shern > <ol...@mellanox.com>; sta...@dpdk.org > Subject: Re: [PATCH] net/mlx4: workaround to verbs wrong error return > > Hi Matan, > > (snipping a bit of unnecessary context) > > On Tue, Aug 01, 2017 at 10:12:29AM +0000, Matan Azrad wrote: > [...] > > > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com] > [...] > > > On Mon, Jul 31, 2017 at 04:56:33PM +0000, Matan Azrad wrote: > [...] > > > > > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com] > [...] > > > > > On Mon, Jul 31, 2017 at 02:15:09PM +0300, Matan Azrad wrote: > [...] > > > > > > @@ -1278,7 +1287,10 @@ struct rte_flow * > > > > > > for (flow = LIST_FIRST(&priv->flows); > > > > > > flow; > > > > > > flow = LIST_NEXT(flow, next)) { > > > > > > - claim_zero(ibv_destroy_flow(flow->ibv_flow)); > > > > > > + /* Current verbs does not allow to check real > > > > > > + * errors when the device was plugged out. > > > > > > + */ > > > > > > + ibv_destroy_flow(flow->ibv_flow); > > > > > > flow->ibv_flow = NULL; > > > > > > DEBUG("Flow %p removed", (void *)flow); > > > > > > } > > > > > > -- > > > > > > 1.8.3.1 > > > > > > > > > > > > > > > > This approach looks way too intrusive. How about making the > > > > > claim_zero() definition not fail but still complain when > > > > > compiled against a broken Verbs version instead? > > > > > > > > > > #include "mlx4_autoconf.h" > > > > > > > > > > [...] > > > > > > > > > > #ifndef HAVE_BROKEN_VERBS > > > > > #define claim_zero(...) assert((__VA_ARGS__) == 0) #else /* > > > > > HAVE_BROKEN_VERBS */ #define claim_zero(...) \ > > > > > (void)(((__VA_ARGS__) == 0) || \ > > > > > DEBUG("Assertion `" # __VA_ARGS__ "' failed > > > > > (IGNORED)")) #endif /* HAVE_BROKEN_VERBS */ > > > > > > > > > > You could use auto-config-h.sh to generate the > HAVE_BROKEN_VERBS > > > > > definition in mlx4_autoconf.h (see mlx4 Makefile) based on some > > > > > symbol, macro or type that only exists or doesn't exist yet in > > > > > problematic releases for instance. > > > > > > > > > > > > > I agree with the dependence on broken verbs but there are other > > > > places in mlx4 code which use claim_zero assertion, So this > > > > suggestion will hurt other validations. > > > > > > Well, half broken is no better than completely broken in my opinion, > > > so while Verbs is being repaired, users debugging the mlx4 PMD will > > > temporarily get debug traces without the ensuing abort(). At least > > > the behavior will be consistent. > > > > > > Think about it, they already have to go out of their way to enable > > > CONFIG_RTE_LIBRTE_MLX4_DEBUG, if they know they aren't using > > > hot-plug but still use a buggy Verbs version, they can disable > > > HAVE_BROKEN_VERBS to revert to the normal behavior. > > > > > > > priv_flow_validate and priv_mac_addr_add functions calls also are > > wrapped by claim_zero, These are not ibv_destroy functions and don't > > depend only in broken verbs, The user want to be aborted in those > > cases otherwise he would have put there trace print as you suggest. > > As the only exceptions, if you had to change something it would be these > instance in order to be less intrusive. But I suggest you not to since, again, > this is a workaround for a problem that is not under PMD control. > > > > > What's about to create new define depend on broken verbs for the > > > > specific > > > assertions? > > > > It will be still intrusive but more accurate. > > > > > > One reason I prefer the code to remain unchanged is that I'm > > > currently refactoring the entire PMD. Maintaining the above patch > > > (picking the right > > > ibv_*() calls that return a consistent value) will be difficult and > > > an intrusive patch won't be reverted easily once Verbs is fixed. > > > > > > > You can just find all claim_zero_new and replace it with claim_zero. > > So what if I'm adding new ibv_destroy_qp() calls, can I expect them to work > consistently or will each of them have to be validated against a possible > assert() failure during hot-plug? > > Note that ibv_destroy_qp() is only one example among many, the work > you've done for this patch will have to be performed every time new code is > added. I don't find it particularly convenient. > > > > All these claim_zero() checks ensure the PMD destroys Verbs > > > resources in the proper order (e.g. a flow before the QP it is > > > associated with). If the return value of any of these cannot be > > > relied on, it's useless to only check some of them. > > > > > > priv_flow_validate and priv_mac_addr_add functions are not destroy > verbs resources. > > Right, and that's not a problem. Unfortunate users still get a nice debugging > message in the unlikely case of a failure for these. >
I don't think the user want debugging message for this functions - he want to be aborted and to stop the program. He could add DEBUG prints if he had want, those behaviors are really different. > > > Moreover if ibv_destroy_something() wrongly returns an error when > > > the device is unplugged, I think this can happen to the calls not > > > part of your patch, i.e. all of them, so working around it at the > > > macro definition level makes sense. > > > > I checked with failsafe tests and found that only the specific destroy > functions are problematic. > > What about other applications and corner cases, such as when the device is > unplugged while the PMD is being initialized? Since the control path is bound > by system calls, the PMD might actually sleep for a non negligible amount of > time (there is really no upper limit to how long) and the device could > disappear at any point. Subsequent ibv_*() calls would return unexpected > errors. > > I'm sure you cannot verify all corner cases, so let's avoid them. > > > > If you don't know what symbol can be relied on in OFED 4.2 to define > > > HAVE_BROKEN_VERBS (which is just an example, you can use another > > > name BTW), maybe you can add a compilation option to enable manually > > > in case of trouble? Something verbose like: > > > > > > CONFIG_RTE_LIBRTE_MLX4_DEBUG_BROKEN_VERBS_ASSERT=n > > > > > > Which will have to be documented. > > What about the above suggestion? > So, I don't think there is perfect solution to this issue. I will take your suggestion to depend claim_zero in verbs version. Firstly, I will check if I can get the information for broken verbs from somewhere, If not, I will use compilation option. > -- > Adrien Mazarguil > 6WIND