Hi, everyone Sorry for that there may be a problem in the e-mail format. Please see the picture according to the following link: https://ibb.co/SQ7yGfW
Best Regards Feifei > -----邮件原件----- > 发件人: Feifei Wang > 发送时间: 2021年2月1日 16:49 > 收件人: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Stephen > Hemminger <step...@networkplumber.org> > 抄送: Ananyev, Konstantin <konstantin.anan...@intel.com>; > dev@dpdk.org; nd <n...@arm.com>; Ruifeng Wang > <ruifeng.w...@arm.com>; nd <n...@arm.com> > 主题: 回复: [dpdk-dev] [PATCH v1 1/2] app/test: remove unnecessary > barriers for ring stress test > > Sorry, a mistake happens in the picture, after Wrk_cmd == WRK_CMD_RUN, > it should be a rmb rather than wmb. > > > -----邮件原件----- > > 发件人: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> > > 发送时间: 2021年1月30日 9:24 > > 收件人: Stephen Hemminger <step...@networkplumber.org> > > 抄送: Ananyev, Konstantin <konstantin.anan...@intel.com>; Feifei Wang > > <feifei.wa...@arm.com>; dev@dpdk.org; nd <n...@arm.com>; Ruifeng > Wang > > <ruifeng.w...@arm.com>; Honnappa Nagarahalli > > <honnappa.nagaraha...@arm.com>; nd <n...@arm.com> > > 主题: RE: [dpdk-dev] [PATCH v1 1/2] app/test: remove unnecessary > > barriers for ring stress test > > > > <snip> > > > > > > > > > > > > > > > > > > > > > > > > > Hi Feifei, > > > > > > > > > > > > > > > > > > > > > > > The variable "wrk_cmd" is a signal to control threads from > > > > > > > > running and stopping. When worker lcores load "wrk_cmd == > > > > > WRK_CMD_RUN", > > > > > > > > they > > > > > > > start > > > > > > > > running and when worker lcores load "wrk_cmd == > > > > > > > > WRK_CMD_STOP", > > > > > > > they > > > > > > > > stop. > > > > > > > > > > > > > > > > For the wmb in test_mt1, no storing operations must keep > > > > > > > > the order after storing "wrk_cmd". Thus the wmb is > unnecessary. > > > > > > > > > > > > > > I think there is a bug in my original code, we should do > > > > > > > smp_wmb() > > > > > > > *before* setting wrk_cmd, not after: > > > > > > > > > > > > > > /* launch on all workers */ > > > > > > > RTE_LCORE_FOREACH_WORKER(lc) { > > > > > > > arg[lc].rng = r; > > > > > > > arg[lc].stats = init_stat; > > > > > > > rte_eal_remote_launch(test, &arg[lc], lc); > > > > > > > } > > > > > > > > > > > > > > /* signal worker to start test */ > > > > > > > + rte_smp_wmb(); > > > > > > > wrk_cmd = WRK_CMD_RUN; > > > > > > > - rte_smp_wmb(); > > > > > > > > > > > > > > usleep(run_time * US_PER_S); > > > > > > > > > > > > > > > > > > > > > I still think we'd better have some synchronisation here. > > > > > > > Otherwise what would prevent compiler and/or cpu to update > > > > > > > wrk_cmd out of order (before _init_ phase is completed)? > > > > > > > We probably can safely assume no reordering from the > > > > > > > compiler here, as we have function calls straight before and > > > > > > > after 'wrk_cmd = > > > > > WRK_CMD_RUN;' > > > > > > > But for consistency and easier maintenance, I still think it > > > > > > > is better to have something here, after all it is not > > > > > > > performance critical > > > pass. > > > > > > Agree that this is not performance critical. > > > > > > > > > > > > This is more about correctness (as usually people refer to > > > > > > code to understand the concepts). You can refer to video [1]. > > > > > > Essentially, the pthread_create has 'happens-before' behavior. > > > > > > i.e. all the memory operations before the pthread_create are > > > > > > visible to the new > > > thread. > > > > > > The > > > > > > rte_smp_rmb() barrier in the thread function is not required > > > > > > as it reads the > > > > > data that was set before the thread was launched. > > > > > > > > > > rte_eal_remote_launch() doesn't call pthread_create(). > > > > > All it does - updates global variable (lcore_config) and > > > > > writes/reads to/from the pipe. > > > > > > > > > Thanks for the reminder ☹ > > > > I think rte_eal_remote_launch and rte_eal_wait_lcore need to > > > > provide > > > behavior similar to pthread_launch and pthread_join respectively. > > > > > > > > There is use of rte_smp_*mb in those functions as well. Those need > > > > to be fixed > > > first and then look at these. > > > > > > Looks like you want __atomic_thread_fence() here. > > > > > In the rte_eal_remote_launch case, all the memory operations before > > the API call need to be visible to the worker. If this is the only > > requirement, we can use the function pointer as the guard variable and > > use store-release. In the eal_thread_loop function we could do > > load-acquire on the function pointer. > > > > I do not think that there is a requirement to ensure that the memory > > operations after the API call do not happen before the worker thread > > starts running the function (As there is no guarantee on when the > > worker thread will run. If the main thread needs to know if the worker > > thread is running explicit hand-shaking needs to happen). > > > > The rte_eal_wait_lcore API needs to ensure that the memory operations > > in the worker are visible to the main. rte_eal_wait_lcore and > > eal_thread_loop are synchronizing using lcore_config[worker_id].state. > > I need to understand what else 'state' is used for. If there are no > > issues, we can do a store-release on 'state' in eal_thread_loop and a load- > acquire in rte_eal_wait_lcore. > > > > So, we do not have to use the __atomic_thread_fence. > >