Hi, Slava

> -----Original Message-----
> From: Slava Ovsiienko <viachesl...@nvidia.com>
> Sent: May 13, 2021 18:49
> To: Feifei Wang <feifei.wa...@arm.com>; Matan Azrad <ma...@nvidia.com>; Shahaf Shuler <shah...@nvidia.com>
> Cc: dev@dpdk.org; nd <n...@arm.com>; sta...@dpdk.org; Ruifeng Wang <ruifeng.w...@arm.com>; nd <n...@arm.com>; nd <n...@arm.com>
> Subject: RE: [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory Region cache
>
> Hi, Feifei
>
> .. snip..
>
> > I can understand your worry that, if there is no 'wmb at last', agent_2 may
> > still observe the unchanged global cache when agent_1 leaves the locked section.
> >
> > However, when agent_2 takes the lock and gets the new MR in time slot 7 (Fig.1),
> > it means agent_1 has finished updating the global cache and the lock is freed.
> > Besides, if agent_2 can take the lock, it also shows that agent_2 has observed
> > the changed global cache.
> >
> > This is because there is a store-release in rte_rwlock_read_unlock(); the
> > store-release ensures that all store operations before the 'release' are
> > committed if a store operation after the 'release' is observed by other agents.
>
> OK, I missed this implicit ordering hidden in the lock/unlock(), thank you for
> pointing that out.
>
> I checked the listings for rte_rwlock_write_unlock(); it is implemented with
> __atomic_store_n(&rwl->cnt, 0, __ATOMIC_RELEASE) and:
> - on x86 there is nothing special in the code due to the strict x86-64 ordering model
> - on ARMv8 there is an stlxr instruction, which implies the write barrier
>
> Now you convinced me 😊, we can get rid of the explicit "wmb_at_last" in the code.
> Please provide the patch with a clear commit message with the details above; it is
> important to mention:
> - lock protection is involved, so dev_gen and cache update ordering inside the
>   protected section does not matter
> - unlock() provides an implicit write barrier due to the atomic operation with
>   __ATOMIC_RELEASE
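For reference, the protected section after the change could look roughly like the sketch below. This is only an illustration of the ordering we discussed; the structure, field and helper names are my assumptions, not the actual mlx5 code:

#include <stdint.h>
#include <rte_rwlock.h>

/* Illustrative layout only; the real mlx5 shared MR cache differs. */
struct mr_share_cache {
	rte_rwlock_t rwlock; /* protects the global cache and dev_gen */
	uint32_t dev_gen;    /* generation number checked by datapath agents */
	/* ... global cache of registered MRs ... */
};

/* Placeholder standing in for the real global cache rebuild. */
static void
rebuild_global_cache(struct mr_share_cache *cache)
{
	(void)cache;
}

static void
mr_global_cache_update(struct mr_share_cache *cache)
{
	rte_rwlock_write_lock(&cache->rwlock);
	/*
	 * Inside the write-locked section the store order between the cache
	 * rebuild and dev_gen does not matter: readers cannot enter until
	 * the lock is released.
	 */
	rebuild_global_cache(cache);
	cache->dev_gen++;
	/*
	 * No explicit wmb here: rte_rwlock_write_unlock() releases the lock
	 * with __atomic_store_n(..., __ATOMIC_RELEASE), so all stores above
	 * become visible before the lock is observed as free.
	 */
	rte_rwlock_write_unlock(&cache->rwlock);
}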
Ok, I will be careful with the commit message and include the details you listed in it.

> Also, in my opinion, there should be a small comment in place of the wmb being
> removed, reminding that we are in the protected section and unlock provides the
> implicit write barrier at the level visible by software.

Yes, it is necessary to add a small comment in place of the wmb to explain why we do this.

> Thank you for the meaningful discussion, I refreshed my knowledge of memory
> ordering models for different architectures 😊

Also thanks very much for this discussion, I think I learned a lot from it and
enjoyed exploring these problems. Looking forward to your review of the next
version, thanks a lot 😊.

Best Regards
Feifei

> With best regards,
> Slava
>
> > > > Best Regards
> > > > Feifei
> > > > > With best regards,
> > > > > Slava
> > > > > >
> > > > > > Best Regards
> > > > > > Feifei
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Slava Ovsiienko <viachesl...@nvidia.com>
> > > > > > > Sent: May 7, 2021 18:15
> > > > > > > To: Feifei Wang <feifei.wa...@arm.com>; Matan Azrad <ma...@nvidia.com>; Shahaf Shuler <shah...@nvidia.com>
> > > > > > > Cc: dev@dpdk.org; nd <n...@arm.com>; sta...@dpdk.org; Ruifeng Wang <ruifeng.w...@arm.com>; nd <n...@arm.com>
> > > > > > > Subject: RE: [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory Region cache
> > > > > > >
> > > > > > > Hi, Feifei
> > > > > > >
> > > > > > > We should consider the locks in your scenario - it is crucial for the
> > > > > > > complete model description:
> > > > > > >
> > > > > > > How agent_1 (in your terms) rebuilds global cache:
> > > > > > >
> > > > > > > 1a) lock()
> > > > > > > 1b) rebuild(global cache)
> > > > > > > 1c) update(dev_gen)
> > > > > > > 1d) wmb()
> > > > > > > 1e) unlock()
> > > > > > >
> > > > > > > How agent_2 checks:
> > > > > > >
> > > > > > > 2a) check(dev_gen) (assume positive - changed)
> > > > > > > 2b) clear(local_cache)
> > > > > > > 2c) miss(on empty local_cache) -> eventually it goes to mr_lookup_caches()
> > > > > > > 2d) lock()
> > > > > > > 2e) get(new MR)
> > > > > > > 2f) unlock()
> > > > > > > 2g) update(local cache with obtained new MR)
> > > > > > >
> > > > > > > Hence, even if 1c) becomes visible in 2a) before 1b) is committed (say,
> > > > > > > due to an out-of-order Arch), agent_2 would be blocked on 2d), and the
> > > > > > > scenario depicted in your Fig2 would not happen (agent_2 will wait
> > > > > > > before step 3 till agent_1 unlocks after its step 5).
> > > > > > >
> > > > > > > With best regards,
> > > > > > > Slava
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Feifei Wang <feifei.wa...@arm.com>
> > > > > > > > Sent: Friday, May 7, 2021 9:36
> > > > > > > > To: Slava Ovsiienko <viachesl...@nvidia.com>; Matan Azrad <ma...@nvidia.com>; Shahaf Shuler <shah...@nvidia.com>
> > > > > > > > Cc: dev@dpdk.org; nd <n...@arm.com>; sta...@dpdk.org; Ruifeng Wang <ruifeng.w...@arm.com>; nd <n...@arm.com>
> > > > > > > > Subject: RE: [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory Region cache
> > > > > > > >
> > > > > > > > Hi, Slava
> > > > > > > >
> > > > > > > > Thanks very much for your reply.
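As an aside on the 2a)-2g) sequence quoted above, the reader path could be sketched roughly as below; every type, field and helper name here is an assumption for illustration, not the actual mlx5 code:

#include <stdint.h>
#include <rte_rwlock.h>

#define LKEY_INVALID UINT32_MAX

/* Illustrative types only; the real mlx5 caches are B-trees. */
struct mr_local_cache { uint32_t dev_gen; /* ... cached translations ... */ };
struct mr_share_cache { rte_rwlock_t rwlock; uint32_t dev_gen; /* ... */ };

/* Placeholders standing in for the real cache operations. */
static void local_cache_flush(struct mr_local_cache *c) { (void)c; }
static uint32_t local_cache_lookup(struct mr_local_cache *c, uintptr_t a)
{ (void)c; (void)a; return LKEY_INVALID; }
static void local_cache_insert(struct mr_local_cache *c, uintptr_t a, uint32_t k)
{ (void)c; (void)a; (void)k; }
static uint32_t global_cache_lookup(struct mr_share_cache *c, uintptr_t a)
{ (void)c; (void)a; return LKEY_INVALID; }

static uint32_t
mr_addr2lkey(struct mr_share_cache *share, struct mr_local_cache *local,
	     uintptr_t addr)
{
	uint32_t lkey;

	/* 2a)-2b): a changed dev_gen is the flush request for the local cache. */
	if (local->dev_gen != share->dev_gen) {
		local_cache_flush(local);
		local->dev_gen = share->dev_gen;
	}
	/* 2c): lookup in the (possibly just emptied) local cache. */
	lkey = local_cache_lookup(local, addr);
	if (lkey != LKEY_INVALID)
		return lkey;
	/*
	 * 2d)-2f): on a miss, search the global cache under the read lock;
	 * if agent_1 is still rebuilding, we block here until it unlocks.
	 */
	rte_rwlock_read_lock(&share->rwlock);
	lkey = global_cache_lookup(share, addr);
	rte_rwlock_read_unlock(&share->rwlock);
	/* 2g): remember the translation locally for the fast path. */
	if (lkey != LKEY_INVALID)
		local_cache_insert(local, addr, lkey);
	return lkey;
}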
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Slava Ovsiienko <viachesl...@nvidia.com>
> > > > > > > > > Sent: May 6, 2021 19:22
> > > > > > > > > To: Feifei Wang <feifei.wa...@arm.com>; Matan Azrad <ma...@nvidia.com>; Shahaf Shuler <shah...@nvidia.com>
> > > > > > > > > Cc: dev@dpdk.org; nd <n...@arm.com>; sta...@dpdk.org; Ruifeng Wang <ruifeng.w...@arm.com>; nd <n...@arm.com>
> > > > > > > > > Subject: RE: [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory Region cache
> > > > > > > > >
> > > > > > > > > Hi, Feifei
> > > > > > > > >
> > > > > > > > > Sorry, I do not follow why we should get rid of the last (after
> > > > > > > > > dev_gen update) wmb.
> > > > > > > > > We've rebuilt the global cache, we should notify other agents it's
> > > > > > > > > happened and they should flush local caches. So, the dev_gen change
> > > > > > > > > should be made visible to other agents to trigger this activity,
> > > > > > > > > and the second wmb is here to ensure this.
> > > > > > > >
> > > > > > > > 1. On the first question - why we should get rid of the last wmb and
> > > > > > > > move it before dev_gen is updated - I think the key point is how the
> > > > > > > > wmb implements the synchronization between multiple agents.
> > > > > > > >
> > > > > > > > Fig1
> > > > > > > > -----------------------------------------------------------------------
> > > > > > > > Timeslot        agent_1                    agent_2
> > > > > > > > 1               rebuild global cache
> > > > > > > > 2               wmb
> > > > > > > > 3               update dev_gen ----------- load changed dev_gen
> > > > > > > > 4                                          rebuild local cache
> > > > > > > > -----------------------------------------------------------------------
> > > > > > > >
> > > > > > > > First, the wmb is only for the local thread, to keep the order between
> > > > > > > > local write-write: based on the picture above, for agent_1 the wmb
> > > > > > > > keeps the order that rebuilding the global cache always happens before
> > > > > > > > updating dev_gen.
> > > > > > > >
> > > > > > > > Second, agent_1 communicates with agent_2 through the global variable
> > > > > > > > "dev_gen": if agent_1 updates dev_gen, agent_2 will load it and then
> > > > > > > > know it should rebuild the local cache.
> > > > > > > >
> > > > > > > > Finally, agent_2 rebuilds the local cache according to whether agent_1
> > > > > > > > has rebuilt the global cache, and agent_2 gets this information from
> > > > > > > > the variable "dev_gen".
> > > > > > > > Fig2
> > > > > > > > -----------------------------------------------------------------------
> > > > > > > > Timeslot        agent_1                    agent_2
> > > > > > > > 1               update dev_gen
> > > > > > > > 2                                          load changed dev_gen
> > > > > > > > 3                                          rebuild local cache
> > > > > > > > 4               rebuild global cache
> > > > > > > > 5               wmb
> > > > > > > > -----------------------------------------------------------------------
> > > > > > > >
> > > > > > > > However, on an Arm platform, if the wmb is placed after dev_gen is
> > > > > > > > updated, "dev_gen" may be updated before agent_1 rebuilds the global
> > > > > > > > cache; then agent_2 may receive the wrong message and rebuild its
> > > > > > > > local cache in advance.
> > > > > > > >
> > > > > > > > To summarize, it is not important at which time other agents see the
> > > > > > > > changed global variable "dev_gen".
> > > > > > > > (Actually, a wmb after "dev_gen" cannot ensure the changed "dev_gen"
> > > > > > > > is committed to the global.)
> > > > > > > > It is more important that, if other agents see the changed "dev_gen",
> > > > > > > > they also know the global cache has been updated.
> > > > > > > >
> > > > > > > > > One more point, due to registering new/destroying existing MR
> > > > > > > > > involves FW (via kernel) calls, it takes so many CPU cycles that we
> > > > > > > > > could neglect wmb overhead at all.
> > > > > > > >
> > > > > > > > We just move the last wmb into the right place, we do not delete it
> > > > > > > > for performance.
> > > > > > > >
> > > > > > > > > Also, regarding this:
> > > > > > > > >
> > > > > > > > > > Another question suddenly occurred to me, in order to keep the
> > > > > > > > > > order that rebuilding global cache before updating "dev_gen", the
> > > > > > > > > > wmb should be before updating "dev_gen" rather than after it.
> > > > > > > > > > Otherwise, in the out-of-order platforms, current order cannot be
> > > > > > > > > > kept.
> > > > > > > > >
> > > > > > > > > it is not clear why ordering is important - global cache update and
> > > > > > > > > dev_gen change happen under spinlock protection, so only the last
> > > > > > > > > wmb is meaningful.
> > > > > > > >
> > > > > > > > 2. The second function of the wmb before "dev_gen" is updated is for
> > > > > > > > performance, according to our previous discussion.
> > > > > > > > According to Fig2, if there is no wmb between "global cache updated"
> > > > > > > > and "dev_gen updated", "dev_gen" may be updated before the global
> > > > > > > > cache is updated.
> > > > > > > >
> > > > > > > > Then agent_2 may see the changed "dev_gen" and flush the entire local
> > > > > > > > cache in advance.
> > > > > > > > > > > > > > > > This entire flush can degrade the performance: > > > > > > > > "the local cache is getting empty and can't provide > > > > > > > > translation for other valid (not being removed) MRs, and > > > > > > > > the translation has to look up in the global cache, that > > > > > > > > is locked now for rebuilding, this causes the delays in > > > > > > > > data path on acquiring global > > > > cache lock." > > > > > > > > > > > > > > > > Furthermore, spinlock is just for global cache, not for > > > > > > > > dev_gen and local cache. > > > > > > > > > > > > > > > > > To summarize, in my opinion: > > > > > > > > > - if you see some issue with ordering of global cache > > > > > > > > > update/dev_gen signalling, > > > > > > > > > could you, please, elaborate? I'm not sure we should > > > > > > > > > maintain an order (due to spinlock protection) > > > > > > > > > - the last rte_smp_wmb() after dev_gen incrementing > > > > > > > > > should be kept intact > > > > > > > > > > > > > > > > > > > > > > > > > At last, for my view, there are two functions that moving > > > > > > > > wmb before "dev_gen" > > > > > > > > for the write-write order: > > > > > > > > -------------------------------- > > > > > > > > a) rebuild global cache; > > > > > > > > b) rte_smp_wmb(); > > > > > > > > c) updating dev_gen > > > > > > > > -------------------------------- 1. Achieve > > > > > > > > synchronization between multiple threads in the right way 2. > > > > > > > > Prevent other agents from flushing local cache early to > > > > > > > > ensure performance > > > > > > > > > > > > > > > > Best Regards > > > > > > > > Feifei > > > > > > > > > > > > > > > > > With best regards, > > > > > > > > > Slava > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > > From: Feifei Wang <feifei.wa...@arm.com> > > > > > > > > > > Sent: Thursday, May 6, 2021 5:52 > > > > > > > > > > To: Slava Ovsiienko <viachesl...@nvidia.com>; Matan > > > > > > > > > > Azrad <ma...@nvidia.com>; Shahaf Shuler > > > > > > > > > > <shah...@nvidia.com> > > > > > > > > > > Cc: dev@dpdk.org; nd <n...@arm.com>; sta...@dpdk.org; > > > > > > > > > > Ruifeng > > > > > > > Wang > > > > > > > > > > <ruifeng.w...@arm.com>; nd <n...@arm.com> > > > > > > > > > > Subject: 回复: [PATCH v1 3/4] net/mlx5: fix rebuild bug > > > > > > > > > > for Memory Region cache > > > > > > > > > > > > > > > > > > > > Hi, Slava > > > > > > > > > > > > > > > > > > > > Would you have more comments about this patch? > > > > > > > > > > For my sight, only one wmb before "dev_gen" updating > > > > > > > > > > is enough to synchronize. > > > > > > > > > > > > > > > > > > > > Thanks very much for your attention. 
> > > > > > > > > >
> > > > > > > > > > Best Regards
> > > > > > > > > > Feifei
> > > > > > > > > >
> > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > From: Feifei Wang
> > > > > > > > > > > Sent: April 20, 2021 16:42
> > > > > > > > > > > To: Slava Ovsiienko <viachesl...@nvidia.com>; Matan Azrad <ma...@nvidia.com>; Shahaf Shuler <shah...@nvidia.com>
> > > > > > > > > > > Cc: dev@dpdk.org; nd <n...@arm.com>; sta...@dpdk.org; Ruifeng Wang <ruifeng.w...@arm.com>; nd <n...@arm.com>
> > > > > > > > > > > Subject: RE: [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory Region cache
> > > > > > > > > > >
> > > > > > > > > > > Hi, Slava
> > > > > > > > > > >
> > > > > > > > > > > I think the second wmb can be removed.
> > > > > > > > > > > As I know, a wmb is just a barrier to keep the order between
> > > > > > > > > > > write and write, and it cannot tell the CPU when it should
> > > > > > > > > > > commit the changes.
> > > > > > > > > > >
> > > > > > > > > > > It is usually used before a guard variable, to keep the order
> > > > > > > > > > > that the guard variable is updated after the changes which you
> > > > > > > > > > > want to release have been done.
> > > > > > > > > > >
> > > > > > > > > > > For example, for the wmb after the global cache update/before
> > > > > > > > > > > altering dev_gen, it can ensure the order that the global cache
> > > > > > > > > > > is updated before dev_gen is altered:
> > > > > > > > > > > 1) If other agents load the changed "dev_gen", they know the
> > > > > > > > > > > global cache has been updated.
> > > > > > > > > > > 2) If other agents load the unchanged "dev_gen", it means the
> > > > > > > > > > > global cache has not been updated, and the local cache will not
> > > > > > > > > > > be flushed.
> > > > > > > > > > >
> > > > > > > > > > > As a result, we use the wmb and the guard variable "dev_gen" to
> > > > > > > > > > > ensure the global cache update is "visible".
> > > > > > > > > > > Here "visible" means that when the updated guard variable
> > > > > > > > > > > "dev_gen" is seen by other agents, they can also confirm the
> > > > > > > > > > > global cache has been updated in the meanwhile.
> > > > > > > > > > > Thus, just one wmb before altering dev_gen can ensure this.
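To illustrate the guard-variable pattern described above, a minimal sketch (rte_smp_wmb() is the DPDK barrier being discussed; the structure and helper names are assumptions for illustration only):

#include <stdint.h>
#include <rte_atomic.h>

/* Illustrative only; stands in for the shared MR cache and its rebuild. */
struct share_cache { uint32_t dev_gen; /* guard variable */ };

static void rebuild_global_cache(struct share_cache *c) { (void)c; }

static void
update_with_guard(struct share_cache *c)
{
	/* a) publish the new global cache content first. */
	rebuild_global_cache(c);
	/* b) write barrier: order the cache stores before the guard update. */
	rte_smp_wmb();
	/*
	 * c) update the guard variable; an agent that sees the new dev_gen
	 * can rely on the global cache already being rebuilt.
	 */
	c->dev_gen++;
}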
> > > > > > > > > > > > > > > > > > > > > > Best Regards > > > > > > > > > > > Feifei > > > > > > > > > > > > > > > > > > > > > > > -----邮件原件----- > > > > > > > > > > > > 发件人: Slava Ovsiienko <viachesl...@nvidia.com> > > > > > > > > > > > > 发送时间: 2021年4月20日 15:54 > > > > > > > > > > > > 收件人: Feifei Wang <feifei.wa...@arm.com>; Matan > > > Azrad > > > > > > > > > > > > <ma...@nvidia.com>; Shahaf Shuler > > > > > > > > > > > > <shah...@nvidia.com> > > > > > > > > > > > > 抄送: dev@dpdk.org; nd <n...@arm.com>; > > sta...@dpdk.org; > > > > > > > Ruifeng > > > > > > > > > > Wang > > > > > > > > > > > > <ruifeng.w...@arm.com>; nd <n...@arm.com>; nd > > > > > > > <n...@arm.com>; > > > > > > > > > nd > > > > > > > > > > > > <n...@arm.com> > > > > > > > > > > > > 主题: RE: [PATCH v1 3/4] net/mlx5: fix rebuild bug > > > > > > > > > > > > for Memory Region cache > > > > > > > > > > > > > > > > > > > > > > > > Hi, Feifei > > > > > > > > > > > > > > > > > > > > > > > > In my opinion, there should be 2 barriers: > > > > > > > > > > > > - after global cache update/before altering > > > > > > > > > > > > dev_gen, to ensure the correct order > > > > > > > > > > > > - after altering dev_gen to make this change > > > > > > > > > > > > visible for other agents and to trigger local > > > > > > > > > > > > cache update > > > > > > > > > > > > > > > > > > > > > > > > With best regards, Slava > > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > > > > > From: Feifei Wang <feifei.wa...@arm.com> > > > > > > > > > > > > > Sent: Tuesday, April 20, 2021 10:30 > > > > > > > > > > > > > To: Slava Ovsiienko <viachesl...@nvidia.com>; > > > > > > > > > > > > > Matan Azrad <ma...@nvidia.com>; Shahaf Shuler > > > > > > > > > > > > > <shah...@nvidia.com> > > > > > > > > > > > > > Cc: dev@dpdk.org; nd <n...@arm.com>; > > > > > > > > > > > > > sta...@dpdk.org; Ruifeng > > > > > > > > > > Wang > > > > > > > > > > > > > <ruifeng.w...@arm.com>; nd <n...@arm.com>; nd > > > > > > > > <n...@arm.com>; > > > > > > > > > > nd > > > > > > > > > > > > > <n...@arm.com> > > > > > > > > > > > > > Subject: 回复: [PATCH v1 3/4] net/mlx5: fix > > > > > > > > > > > > > rebuild bug for Memory Region cache > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, Slava > > > > > > > > > > > > > > > > > > > > > > > > > > Another question suddenly occurred to me, in > > > > > > > > > > > > > order to keep the order that rebuilding global > > > > > > > > > > > > > cache before updating ”dev_gen“, the wmb should > > > > > > > > > > > > > be before updating > > > > > "dev_gen" > > > > > > > > > > > > > rather > > > > > > > than after it. > > > > > > > > > > > > > Otherwise, in the out-of-order platforms, > > > > > > > > > > > > > current order cannot be > > > > > > > > > kept. 
> > > > > > > > > > > > > > > > > > > > > > > > > > Thus, we should change the code as: > > > > > > > > > > > > > a) rebuild global cache; > > > > > > > > > > > > > b) rte_smp_wmb(); > > > > > > > > > > > > > c) updating dev_gen > > > > > > > > > > > > > > > > > > > > > > > > > > Best Regards > > > > > > > > > > > > > Feifei > > > > > > > > > > > > > > -----邮件原件----- > > > > > > > > > > > > > > 发件人: Feifei Wang > > > > > > > > > > > > > > 发送时间: 2021年4月20日 13:54 > > > > > > > > > > > > > > 收件人: Slava Ovsiienko <viachesl...@nvidia.com>; > > > > Matan > > > > > > > Azrad > > > > > > > > > > > > > > <ma...@nvidia.com>; Shahaf Shuler > > > > > > > > > > > > > > <shah...@nvidia.com> > > > > > > > > > > > > > > 抄送: dev@dpdk.org; nd <n...@arm.com>; > > > > sta...@dpdk.org; > > > > > > > > > Ruifeng > > > > > > > > > > > > Wang > > > > > > > > > > > > > > <ruifeng.w...@arm.com>; nd <n...@arm.com>; nd > > > > > > > > <n...@arm.com> > > > > > > > > > > > > > > 主题: 回复: [PATCH v1 3/4] net/mlx5: fix rebuild > > > > > > > > > > > > > > bug for Memory > > > > > > > > > > > Region > > > > > > > > > > > > > > cache > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, Slava > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks very much for your explanation. > > > > > > > > > > > > > > > > > > > > > > > > > > > > I can understand the app can wait all mbufs > > > > > > > > > > > > > > are returned to the memory pool, and then it > > > > > > > > > > > > > > can free this mbufs, I agree with > > > > > > > > this. > > > > > > > > > > > > > > > > > > > > > > > > > > > > As a result, I will remove the bug fix patch > > > > > > > > > > > > > > from this series and just replace the smp > > > > > > > > > > > > > > barrier with > > > > > > > > > > > > > > C11 thread fence. Thanks very much for your > > > > > > > > > > > > > > patient explanation > > > > > again. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best Regards > > > > > > > > > > > > > > Feifei > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -----邮件原件----- > > > > > > > > > > > > > > > 发件人: Slava Ovsiienko > > > > > > > > > > > > > > > <viachesl...@nvidia.com> > > > > > > > > > > > > > > > 发送时间: 2021年4月20日 2:51 > > > > > > > > > > > > > > > 收件人: Feifei Wang <feifei.wa...@arm.com>; > > Matan > > > > > > Azrad > > > > > > > > > > > > > > > <ma...@nvidia.com>; Shahaf Shuler > > > > > > > > > > > > > > > <shah...@nvidia.com> > > > > > > > > > > > > > > > 抄送: dev@dpdk.org; nd <n...@arm.com>; > > > > > sta...@dpdk.org; > > > > > > > > > > Ruifeng > > > > > > > > > > > > > Wang > > > > > > > > > > > > > > > <ruifeng.w...@arm.com>; nd <n...@arm.com> > > > > > > > > > > > > > > > 主题: RE: [PATCH v1 3/4] net/mlx5: fix rebuild > > > > > > > > > > > > > > > bug for Memory Region cache > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, Feifei > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Please, see below > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > .... > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, Feifei > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sorry, I do not follow what this patch fixes. > > > > > > > > > > > > > > > > > Do we have some issue/bug with MR cache > > > > > > > > > > > > > > > > > in > > > > practice? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This patch fixes the bug which is based on > > > > > > > > > > > > > > > > logical deduction, and it doesn't actually > > > > > > > > > > > > > > > > happen. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Each Tx queue has its own dedicated "local" > > > > > > > > > > > > > > > > > cache for MRs to convert buffer address > > > > > > > > > > > > > > > > > in mbufs being transmitted to LKeys > > > > > > > > > > > > > > > > > (HW-related entity > > > > > > > > > > > > > > > > > handle) and the "global" cache for all > > > > > > > > > > > > > > > > > MR registered on the > > > > > > > > > > device. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > AFAIK, how conversion happens in datapath: > > > > > > > > > > > > > > > > > - check the local queue cache flush > > > > > > > > > > > > > > > > > request > > > > > > > > > > > > > > > > > - lookup in local cache > > > > > > > > > > > > > > > > > - if not found: > > > > > > > > > > > > > > > > > - acquire lock for global cache read > > > > > > > > > > > > > > > > > access > > > > > > > > > > > > > > > > > - lookup in global cache > > > > > > > > > > > > > > > > > - release lock for global cache > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > How cache update on memory > > > > > > > > > > > > > > > > > freeing/unregistering > > > > > > > > happens: > > > > > > > > > > > > > > > > > - acquire lock for global cache write > > > > > > > > > > > > > > > > > access > > > > > > > > > > > > > > > > > - [a] remove relevant MRs from the > > > > > > > > > > > > > > > > > global cache > > > > > > > > > > > > > > > > > - [b] set local caches flush request > > > > > > > > > > > > > > > > > - free global cache lock > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If I understand correctly, your patch > > > > > > > > > > > > > > > > > swaps [a] and [b], and local caches > > > > > > > > > > > > > > > > > flush is requested > > > earlier. > > > > > > > > > > > > > > > > > What problem does it > > > > > > > > > > solve? > > > > > > > > > > > > > > > > > It is not supposed there are in datapath > > > > > > > > > > > > > > > > > some mbufs referencing to the memory > > > > > > > > > > > > > > > > > being > > > freed. > > > > > > > > > > > > > > > > > Application must ensure this and must > > > > > > > > > > > > > > > > > not allocate new mbufs from this memory > > > > > > > > > > > > > > > > > regions > > > > > > > > > > > > being freed. > > > > > > > > > > > > > > > > > Hence, the lookups for these MRs in > > > > > > > > > > > > > > > > > caches should not > > > > > > > > occur. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > For your first point that, application can > > > > > > > > > > > > > > > > take charge of preventing MR freed memory > > > > > > > > > > > > > > > > being allocated to > > > > > > data path. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Does it means that If there is an > > > > > > > > > > > > > > > > emergency of MR fragment, such as hotplug, > > > > > > > > > > > > > > > > the application must inform thedata path > > > > > > > > > > > > > > > > in advance, and this memory will not be > > > > > > > > > > > > > > > > allocated, and then the control path will > > > > > > > > > > > > > > > > free this memory? If application can do > > > > > > > > > > > > > > > > like this, I agree that this bug > > > > > > > > > > > > > > cannot happen. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Actually, this is the only correct way for > > > > > > > > > > > > > > > application to > > > > > > operate. 
> > > > > > > > > > > > > > > Let's suppose we have some memory area that > > > > > > > > > > > > > > > application wants to > > > > > > > > > > > > free. > > > > > > > > > > > > > > > ALL references to this area must be removed. > > > > > > > > > > > > > > > If we have some mbufs allocated from this > > > > > > > > > > > > > > > area, it means that we have memory pool > > > > > > > > > > > > > > > created > > > > > > > > > > > > > there. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > What application should do: > > > > > > > > > > > > > > > - notify all its components/agents the > > > > > > > > > > > > > > > memory area is going to be freed > > > > > > > > > > > > > > > - all components/agents free the mbufs they > > > > > > > > > > > > > > > might own > > > > > > > > > > > > > > > - PMD might not support freeing for some > > > > > > > > > > > > > > > mbufs (for example being sent and awaiting > > > > > > > > > > > > > > > for completion), so app should just wait > > > > > > > > > > > > > > > - wait till all mbufs are returned to the > > > > > > > > > > > > > > > memory pool (by monitoring available obj == > > > > > > > > > > > > > > > pool size) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Otherwise - it is dangerous to free the memory. > > > > > > > > > > > > > > > There are just some mbufs still allocated, > > > > > > > > > > > > > > > it is regardless to buf address to MR > > > > > > > > > > > > > > > translation. We just can't free the memory - > > > > > > > > > > > > > > > the mapping will be destroyed and might > > > > > > > > > > > > > > > cause the segmentation fault by SW or some > > > > > > > > > > > > > > > HW issues on DMA access to unmapped memory. > > > > > > > > > > > > > > > It is very generic safety approach - do not > > > > > > > > > > > > > > > free the memory that is still in > > > > > > > > > > use. > > > > > > > > > > > > > > > Hence, at the moment of freeing and > > > > > > > > > > > > > > > unregistering the MR, there MUST BE NO any > > > > > > > > > > > > > > mbufs in flight referencing to the addresses > > > > > > > > > > > > > > being > > freed. > > > > > > > > > > > > > > > No translation to MR being invalidated can happen. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > For other side, the cache flush has > > > > > > > > > > > > > > > > > negative effect > > > > > > > > > > > > > > > > > - the local cache is getting empty and > > > > > > > > > > > > > > > > > can't provide translation for other > > > > > > > > > > > > > > > > > valid (not being > > > > > > > > > > > > > > > > > removed) MRs, and the translation has to > > > > > > > > > > > > > > > > > look up in the global cache, that is > > > > > > > > > > > > > > > > > locked now for rebuilding, this causes > > > > > > > > > > > > > > > > > the delays in datapatch > > > > > > > > > > > > > > > > on acquiring global cache lock. > > > > > > > > > > > > > > > > > So, I see some potential performance impact. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If above assumption is true, we can go to > > > > > > > > > > > > > > > > your second > > > > > > point. > > > > > > > > > > > > > > > > I think this is a problem of the tradeoff > > > > > > > > > > > > > > > > between cache coherence and > > > > > > > > > > > > > > > performance. 
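As a small illustration of the rule quoted above, "wait till all mbufs are returned to the memory pool (by monitoring available obj == pool size)", a sketch using the standard mempool API; the helper name is an assumption:

#include <rte_mempool.h>
#include <rte_pause.h>

/*
 * Busy-wait until every object has been returned to the pool, i.e. the
 * number of available objects equals the pool size. Only after this point
 * is it safe to free and unregister the memory backing the pool.
 */
static void
wait_until_pool_full(struct rte_mempool *mp)
{
	while (rte_mempool_avail_count(mp) < mp->size)
		rte_pause();
}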
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I can understand your meaning that though > > > > > > > > > > > > > > > > global cache has been changed, we should > > > > > > > > > > > > > > > > keep the valid MR in local cache as long > > > > > > > > > > > > > > > > as possible to ensure the fast > > > > > > searching speed. > > > > > > > > > > > > > > > > In the meanwhile, the local cache can be > > > > > > > > > > > > > > > > rebuilt later to reduce its waiting time > > > > > > > > > > > > > > > > for acquiring the global > > > > > > cache lock. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > However, this mechanism just ensures the > > > > > > > > > > > > > > > > performance unchanged for the first few mbufs. > > > > > > > > > > > > > > > > During the next mbufs lkey searching after > > 'dev_gen' > > > > > > > > > > > > > > > > updated, it is still necessary to update > > > > > > > > > > > > > > > > the local > > cache. > > > > > > > > > > > > > > > > And the performance can firstly reduce and > > > > > > > > > > > > > > > > then > > > > returns. > > > > > > > > > > > > > > > > Thus, no matter whether there is this > > > > > > > > > > > > > > > > patch or not, the performance will jitter > > > > > > > > > > > > > > > > in a certain period of > > > > > > > > > > > > > time. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Local cache should be updated to remove MRs > > > > > > > > > > > > > > > no longer > > > > > > valid. > > > > > > > > > > > > > > > But we just flush the entire cache. > > > > > > > > > > > > > > > Let's suppose we have valid MR0, MR1, and > > > > > > > > > > > > > > > not valid MRX in local > > > > > > > > > > > cache. > > > > > > > > > > > > > > > And there are traffic in the datapath for > > > > > > > > > > > > > > > MR0 and MR1, and no traffic for MRX anymore. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1) If we do as you propose: > > > > > > > > > > > > > > > a) take a lock > > > > > > > > > > > > > > > b) request flush local cache first - all > > > > > > > > > > > > > > > MR0, MR1, MRX will be removed on translation > > > > > > > > > > > > > > > in datapath > > > > > > > > > > > > > > > c) update global cache, > > > > > > > > > > > > > > > d) free lock All the traffic for valid MR0, > > > > > > > > > > > > > > > MR1 ALWAYS will be blocked on lock taken for > > > > > > > > > > > > > > > cache update since point > > > > > > > > > > > > > > > b) till point > > > > > > > d). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2) If we do as it is implemented now: > > > > > > > > > > > > > > > a) take a lock > > > > > > > > > > > > > > > b) update global cache > > > > > > > > > > > > > > > c) request flush local cache > > > > > > > > > > > > > > > d) free lock The traffic MIGHT be locked > > > > > > > > > > > > > > > ONLY for MRs non-existing in local cache > > > > > > > > > > > > > > > (not happens for > > > > > > > > > > > > > > > MR0 and MR1, must not happen for MRX), and > > > > > > > > > > > > > > > probability should be minor. 
And lock might > > > > > > > > > > > > > > > happen since > > > > > > > > > > > > > > > c) till > > > > > > > > > > > > > > > d) > > > > > > > > > > > > > > > - quite short period of time > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Summary, the difference between 1) and 2) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Lock probability: > > > > > > > > > > > > > > > - 1) lock ALWAYS happen for ANY MR > > > > > > > > > > > > > > > translation after > > > b), > > > > > > > > > > > > > > > 2) lock MIGHT happen, for cache miss ONLY, > > > > > > > > > > > > > > > after > > > > > > > > > > > > > > > c) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Lock duration: > > > > > > > > > > > > > > > - 1) lock since b) till d), > > > > > > > > > > > > > > > 2) lock since c) till d), that seems to be > > > > > > > > > > > > > > > much > > shorter. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Finally, in conclusion, I tend to think > > > > > > > > > > > > > > > > that the bottom layer can do more things > > > > > > > > > > > > > > > > to ensure the correct execution of the > > > > > > > > > > > > > > > > program, which may have a negative impact > > > > > > > > > > > > > > > > on the performance in a short time, but in > > > > > > > > > > > > > > > > the long run, the performance will > > > > > > > > > > > > > > > > eventually > > > > > > > > > > > > come back. > > > > > > > > > > > > > > > > Furthermore, maybe we should pay attention > > > > > > > > > > > > > > > > to the performance in the stable period, > > > > > > > > > > > > > > > > and try our best to ensure the correctness > > > > > > > > > > > > > > > > of the program in case of > > > > > > > > > > > > > > > emergencies. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If we have some mbufs still allocated in > > > > > > > > > > > > > > > memory being freed > > > > > > > > > > > > > > > - there is nothing to say about correctness, > > > > > > > > > > > > > > > it is totally incorrect. In my opinion, we > > > > > > > > > > > > > > > should not think how to mitigate this > > > > > > > > > > > > > > > incorrect behavior, we should not encourage > > > > > > > > > > > > > > > application developers to follow the wrong > > > > > > > > > > > > > > approaches. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > With best regards, Slava > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best Regards Feifei > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > With best regards, Slava