Hi Olivier, > > Hi Adrien, Konstantin, > > I'm jumping in this (interesting) discussion. I already talked a > bit with Adrien in point to point, and I think its patch is valid. > Please see some comments below. > > > On 06/08/2016 03:57 PM, Adrien Mazarguil wrote: > > On Wed, Jun 08, 2016 at 01:09:18PM +0000, Ananyev, Konstantin wrote: > >>> > >>> Hi Konstantin, > >>> > >>> On Wed, Jun 08, 2016 at 10:34:17AM +0000, Ananyev, Konstantin wrote: > >>>> Hi Adrien, > >>>> > >>>>> > >>>>> An assertion failure occurs in __rte_mbuf_raw_free() (called by a few > >>>>> PMDs) > >>>>> when compiling DPDK with CONFIG_RTE_LOG_LEVEL=RTE_LOG_DEBUG and starting > >>>>> applications with a log level high enough to trigger it. > >>>>> > >>>>> While rte_mbuf_raw_alloc() sets refcount to 1, __rte_mbuf_raw_free() > >>>>> expects it to be 0. > >>>>> Considering users are not expected to reset the > >>>>> reference count to satisfy assert() and that raw functions are designed > >>>>> on > >>>>> purpose without safety belts, remove these checks. > >>>> > >>>> Yes, it refcnt supposed to be set to 0 by __rte_pktmbuf_prefree_seg(). > >>>> Wright now, it is a user responsibility to make sure refcnt==0 before > >>>> pushing > >>>> mbuf back to the pool. > >>>> Not sure why do you consider that wrong? > >>> > >>> I do not consider this wrong and I'm all for using assert() to catch > >>> programming errors, however in this specific case, I think they are > >>> inconsistent and misleading. > >> > >> Honestly, I don't understand why. > >> Right now the rule of thumb is - when mbuf is in the pool, it's refcnt > >> should be equal zero. > > What is the purpose of this? Is there some code relying on this?
The whole current implementation of mbuf_free code path relies on that. Straight here: if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m)))) { m->next = NULL; __rte_mbuf_raw_free(m); } If we'll exclude indirect mbuf logic, all it does: if (rte_mbuf_refcnt_update(m, -1) == 0) { m->next = NULL; __rte_mbuf_raw_free(m); } I.E.: decrement mbuf->refcnt. If new value of refcnt is zero, then put it back into the pool. So having ASERT(mbuf->refcnt==0) inside __rte_mbuf_raw_free()/__rte_mbuf_raw_alloc() looks absolutely valid to me. I *has* to be zero at that point with current implementation, And if it is not then we probably have (or will have) a silent memory corruption. > > >> Yes, as you pointed below - that rule probably can be changed to: > >> when mbuf is in the pool, it's refcnt should equal one, and that would > >> probably allow us > >> to speedup things a bit, but I suppose that's the matter of another > >> aptch/discussion. > > > > Agreed. > > I agree this is somehow another discussion, but let's dive into :) OK :) > > But since [1], it is allowed to call rte_mbuf_raw_alloc() in PMDs (all > PMDs were calling an internal function before). Yes, raw_alloc is public, NP with that. > We could argue that > rte_mbuf_raw_free() should also be made public for PMDs. We could, but right now it is not. Again, as I said, user could use it on his own but he obviously has to obey the rules and do manually what __rte_pktmbuf_prefree_seg() does. At least: rte_mbuf_refcnt_set(m, 0); __rte_mbuf_raw_free(m); > > As you said below, no-one promised that the free() reverts the malloc(), > but given the function names, one can legitimately expect that the > following code is valid: > > m = __rte_mbuf_raw_alloc(); > /* do nothing here */ > __rte_mbuf_raw_free(m); Surely people could (and would) expect various things... But the reality right now is: __rte_mbuf_raw_free() is an internal function and not counterpart of __rte_mbuf_raw_alloc(). If the people don't bother to read API docs or/and the actual code, I can't see how we can help them :) > > If no code relies on having the refcnt set to 0 when a mbuf is in > the pool, I suggest to relax this constraint as Adrien proposed. Why not just rename it to __rte_mbuf_raw_free_dont_use_after_raw_alloc()? To avoid any further confusion :) Seriously speaking I would prefer to leave it as it is. If you feel we have to introduce a counterpart of rte_mbuf_raw_alloc(), we can make a new public one: rte_mbuf_raw_free(stuct rte_mbuf *m) { if (rte_mbuf_refcnt_update(m, -1) == 0) __rte_mbuf_raw_free(m); } > > Then, my opinion is that the refcount should be set to 1 in > rte_pktmbuf_reset(). I don't think we need to update rte_pktmbuf_reset(), it doesn't touch refcnt at all and probably better to keep it that way. To achieve what you suggesting, we probably need to: 1) update _rte_pktmbuf_prefree_seg and rte_pktmbuf_detach() to set refcnt back to 1, i.e: static inline struct rte_mbuf* __attribute__((always_inline)) __rte_pktmbuf_prefree_seg(struct rte_mbuf *m) { __rte_mbuf_sanity_check(m, 0); if (likely(rte_mbuf_refcnt_update(m, -1) == 0)) { /* if this is an indirect mbuf, it is detached. */ if (RTE_MBUF_INDIRECT(m)) rte_pktmbuf_detach(m); + rte_mbuf_refcnt_set(m, 1); return m; } return NULL; } 2) either: a) update mbuf constructor function, so it sets refcnt=1. I suppose that is easy for rte_pktmbuf_init(), but it means that all custom constructors should do the same. Which means possible changes in existing user code and all ABI change related hassle. b) keep rte_mbuf_raw_alloc() setting mbuf->refcnt=1. But then I don't see how will get any performance gain here. So not sure is it really worth it. > And rte_pktmbuf_free() should not be allowed on > an uninitialized mbuf (yes, this would require some changes in PMDs). Not sure I understand you here... free() wouldn not be allowed on mbuf whose recnf==0? But it is not allowed right now anyway. > This would open the door for bulk allocation/free in the mbuf api. Hmm and what stops us having one right now? As I know we do have rte_pktmbuf_alloc_bulk(), I don't see why we can't have rte_pktmbuf_free_bulk() right now. > > This could be done in several steps: > > 1/ remove these assert(), add introduce a public rte_mbuf_raw_free() > 2/ announce that rte_pktmbuf_free() won't work on uninitialized mbufs > in a future version, and ensure that all PMD are inline with this > requirement > 3/ later, move refcount initialization in rte_pktmbuf_reset() > > What do you think? I still think that assert() is on a right place :) *If* we'll change mbuf free code in a way that mbufs inside the pool should have refcnt==1, then I think we'll change it to: RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1); But as I stated above, that change might cause some backward compatibility hassle: 2.a) Or might not give us any performance gain: 2.b). > > Another option is to remove the rte_mbuf_raw_alloc()/rte_mbuf_raw_free() > and use mempool calls. Don't see why we have to remove them... Basically we have a bug in PMD, but instead of fixing it, you guys suggest to change mbuf code. Sounds a bit strange to me. Why not just make for these PMDs to call rte_mempool_put() directly, if you are sure it is safe here? > But having a mbuf wrapper does not seem a bad > thing to me. We can add some extra wrapper then, something like: #define __RTE_MBUF_PUT_FREED(m) (rte_mempool_put((m)->pool, m)) ? Konstantin > > By the way, __rte_pktmbuf_prefree_seg() is also an internal function. > I think we should try to update the mbuf API so that the PMDs do not > need to call these internal functions. > > > [1] > http://dpdk.org/browse/dpdk/commit/?id=fbfd99551ca370266f4bfff58ce441cf5cb1203a > > > Regards, > Olivier > > > > > >>>> If the user calls __rte_mbuf_raw_free() manualy it is his responsibility > >>>> to make > >>>> sure mbuf's refcn==0. > >>> > >>> Sure, however what harm does it cause (besides assert() to fail), since > >>> the > >>> allocation function sets refcount to 1? > >>> > >>> Why having the allocation function set the refcount if we are sure it is > >>> already 0 (assert() proves it). Removing rte_mbuf_refcnt_set(m, 1) can > >>> surely improve performance. > >> > >> That's' just an assert() enabled when MBUF_DEBUG is on. > >> Its sole purpose is to help troubleshoot the bugs and help to catch > >> situations > >> when someone silently updates mbufs supposed to be free. > > > > I perfectly understand and I cannot agree more with this explanation, > > however the fact these functions are not symmetrical remains an issue that > > needs to be addressed somehow in my opinion. > > > >>>> BTW, why are you doing it? > >>>> The comment clearly states that the function is for internal use: > >>>> /** > >>>> * @internal Put mbuf back into its original mempool. > >>>> * The use of that function is reserved for RTE internal needs. > >>>> * Please use rte_pktmbuf_free(). > >>>> * > >>>> * @param m > >>>> * The mbuf to be freed. > >>>> */ > >>>> static inline void __attribute__((always_inline)) > >>>> __rte_mbuf_raw_free(struct rte_mbuf *m) > >>> > >>> Several PMDs are using it anyway (won't name names, but I know one of them > >>> quite well). > >> > >> Then it probably is a bug in these PMDs that need to be fixed. > >> > >>> I chose to modify this code instead of its users for the > >>> following reasons: > >>> > >>> - Considering their names, these functions should be opposites and able to > >>> work together like malloc()/free(). > >> > >> These are internal functions. > >> Comments in mbuf clearly state that library users shouldn't call them > >> directly. > >> They are written to fit internal librte_mbuf needs, and no-one ever > >> promised > >> malloc/free() compatibility here. > > > > So it cannot be provided for the sake of not providing it or is there a good > > reason? > > > > What I meant is that since PMDs already made the mistake of using these > > functions to benefit from the improved performance, DPDK being all about > > performance and stuff, let them use it as intended. Perhaps we should drop > > those "__" like for rte_mbuf_raw_alloc(). > > > >>> > >>> - PMDs are relying on these functions for performance reasons, we can > >>> assume > >>> they took the extra care necessary to make sure it would work properly. > >> > >> That just doesn't seem correct to me. > >> The proper way to do free fo mbuf segment is: > >> > >> static inline void __attribute__((always_inline)) > >> rte_pktmbuf_free_seg(struct rte_mbuf *m) > >> { > >> if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m)))) { > >> m->next = NULL; > >> __rte_mbuf_raw_free(m); > >> } > >> } > >> > >> If by some reason you choose not to use this function, then it is your > >> responsibility to perform similar actions on your own before pushing mbuf > >> into the pool. > >> That's what some TX functions for some Intel NICs do to improve > >> performance: > >> they call _prefree_seg() manually and try to put mbufs into the pool in > >> groups. > > > > Not anymore it seems, but in the current code base both ena and mpipe PMDs > > (soon mlx5 as well) seem to get this wrong. > > > >>> - Preventing it would make these PMDs slower and is not acceptable either. > >> > >> I can hardly imagine that __rte_pktmbuf_prefree_seg() impact would be that > >> severe... > >> But ok, probably you do have some very specific case, but then why you > >> PMD just doesn't call: > >> rte_mempool_put(m->pool, m); > >> directly? > > > > To survive the upcoming transition to mbufs backed by libc malloc() without > > having to fix them? Joke aside, I guess the reason is to use functions with > > "mbuf" in their names when dealing with mbufs. > > > >> Why instead you choose to change common functions and compromise > >> librte_mbuf debug ability? > > > > No, I'm fine with keeping the debug ability, however I did not find a way to > > both keep it and fix the consistency issue without breaking something > > (performance or refcount assumptions I'm not familiar with elsewhere). > > > >>> What remains is the consistency issue, I think these statements were only > >>> added to catch multiple frees, > >> > >> Yes these asserts() here to help catch bugs, > >> and I think it is a good thing to have them here. > >> > >>> and those should be caught at a higher > >>> level, where other consistency checks are also performed. > >> > >> Like where? > > > > Possibly rte_pktmbuf_free_seg(). > > > >>>>> Signed-off-by: Adrien Mazarguil <adrien.mazarguil at 6wind.com> > >>>>> --- > >>>>> lib/librte_mbuf/rte_mbuf.h | 2 -- > >>>>> 1 file changed, 2 deletions(-) > >>>>> > >>>>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > >>>>> index 11fa06d..7070bb8 100644 > >>>>> --- a/lib/librte_mbuf/rte_mbuf.h > >>>>> +++ b/lib/librte_mbuf/rte_mbuf.h > >>>>> @@ -1108,7 +1108,6 @@ static inline struct rte_mbuf > >>>>> *rte_mbuf_raw_alloc(struct rte_mempool *mp) > >>>>> if (rte_mempool_get(mp, &mb) < 0) > >>>>> return NULL; > >>>>> m = (struct rte_mbuf *)mb; > >>>>> - RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0); > >>>>> rte_mbuf_refcnt_set(m, 1); > >>>>> __rte_mbuf_sanity_check(m, 0); > >>>>> > >>>>> @@ -1133,7 +1132,6 @@ __rte_mbuf_raw_alloc(struct rte_mempool *mp) > >>>>> static inline void __attribute__((always_inline)) > >>>>> __rte_mbuf_raw_free(struct rte_mbuf *m) > >>>>> { > >>>>> - RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0); > >>>>> rte_mempool_put(m->pool, m); > >>>>> } > >>>>> > >>>>> -- > >>>>> 2.1.4 > >>>> > >>> > >>> -- > >>> Adrien Mazarguil > >>> 6WIND > >