Hi Feifei, > > Hi, Honnappa, Konstantin and Stephen > > Thanks very much for your attention of this patch. Based on your opinion, > Ruifeng and I discuss about this and make a summary: > _________________________________________________________________________________________________________________ > ___ > main thread > worker thread > rte_eal_remote_launch: > [ Honnappa focus ] > > To ensure f can load correct arg, > > arg store should before f > lcore_config[worker_id].f = f; > lcore_config[worker_id].arg = > arg; > wmb()? or store-relase on f? > > > eal_thread_loop: > pipeline_communication > ----------------------> pipeline_communication > > if (lcore_config[lcore_id].f == > NULL) > > rte_panic("NULL function > pointer\n"); > > > fct_arg = > lcore_config[lcore_id].arg; > > ret = > lcore_config[lcore_id].f(fct_arg); > _________________________________________________________________________________________________________________ > __ > > test_ring_stress: wmb()? > [ Konstantin focus ] > > test_worker: > Main thread can use wrk_cmd to > Wrk_cmd =WRK_CMD_RUN; > ----------------------> Wrk_cmd == WRK_CMD_RUN; > control multiple threads to start running > > wmb()? > at the same time as much as possible > > ring_dequeue; > > ring_enqueue; > Wrk_cmd =WRK_CMD_STOP; > ----------------------> Wrk_cmd == WRK_CMD_STOP; > _________________________________________________________________________________________________________________ > ___ > > rte_eal_wait_lcore: > wmb() > [ Honnappa focus ] > lcore_config[lcore_id].state == FINISHED > <--------------------- lcore_config[lcore_id].state = > FINISHED Load-acquire and store-release > > are used on the variable “state” > rmb(); > _________________________________________________________________________________________________________________ > ___ > > From the picture above, > > 1.First, for the underlying function rte_eal_remote_launch, Honnappa focuses > on that, > pipeline_communication cannot ensure ‘arg’ parameters is loaded correctly by > the worker thread. > This is because in weak memory order framework, maybe the main thread and > worker > thread firstly finish pipeline communication, and then the worker thread > receive signal > and execute the function ‘ f ’. However, it maybe load a wrong value of ‘arg’ > due to that > the main thread stores ‘arg’ after pipeline communication. So wmb or > store_release is > necessary for ‘arg’. > > 2.Second, for the upper-layer test_ring_stress, Konstantin foucese on that, > Whether the main thread can use ‘wrk_cmd’ to control multiple threads to run > at the > same time as much as possible. > Because rte_eal_remote_launch only can communicates with one worker thread > at the same time. This means some worker thread maybe start working very > early but other > worker threads maybe need to wait a long time to start working if ‘wrk_cmd' > is stored 'RUN' flag > before rte_remote_launch. > At last, for unit test, this may cause that the test results are not stable. > > 3.Third, for rte_eal_wait_lcore, Honnappa focuses on that the ‘state’ as a > synchronous bariable, > we should add load-acquire and store-release on it. However, there have been > rmb and wmb > after and before ‘state’, So I’m not sure whether we should replace them. > > In summary, I think Honnappa and Konstantin have different concerns. > For Honnappa, we can add wmb or store-release to ensure the ‘arg’ can be > loaded correctly > in rte_eal_remote_launch. > For Konstantin, we can add wmb and rmb to ensure the main thread can control > the worker > Threads to run at the same time, and then make the test results more accurate > in the > ring_stress_test.
Agree with both. > > > Best Regards > Feifei > > > -----邮件原件----- > > 发件人: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> > > 发送时间: 2021年1月30日 9:24 > > 收件人: Stephen Hemminger <step...@networkplumber.org> > > 抄送: Ananyev, Konstantin <konstantin.anan...@intel.com>; Feifei Wang > > <feifei.wa...@arm.com>; dev@dpdk.org; nd <n...@arm.com>; Ruifeng > > Wang <ruifeng.w...@arm.com>; Honnappa Nagarahalli > > <honnappa.nagaraha...@arm.com>; nd <n...@arm.com> > > 主题: RE: [dpdk-dev] [PATCH v1 1/2] app/test: remove unnecessary barriers > > for ring stress test > > > > <snip> > > > > > > > > > > > > > > > > > > > > > > > > > Hi Feifei, > > > > > > > > > > > > > > > > > > > > > > > The variable "wrk_cmd" is a signal to control threads from > > > > > > > > running and stopping. When worker lcores load "wrk_cmd == > > > > > WRK_CMD_RUN", > > > > > > > > they > > > > > > > start > > > > > > > > running and when worker lcores load "wrk_cmd == > > > > > > > > WRK_CMD_STOP", > > > > > > > they > > > > > > > > stop. > > > > > > > > > > > > > > > > For the wmb in test_mt1, no storing operations must keep the > > > > > > > > order after storing "wrk_cmd". Thus the wmb is unnecessary. > > > > > > > > > > > > > > I think there is a bug in my original code, we should do > > > > > > > smp_wmb() > > > > > > > *before* setting wrk_cmd, not after: > > > > > > > > > > > > > > /* launch on all workers */ > > > > > > > RTE_LCORE_FOREACH_WORKER(lc) { > > > > > > > arg[lc].rng = r; > > > > > > > arg[lc].stats = init_stat; > > > > > > > rte_eal_remote_launch(test, &arg[lc], lc); > > > > > > > } > > > > > > > > > > > > > > /* signal worker to start test */ > > > > > > > + rte_smp_wmb(); > > > > > > > wrk_cmd = WRK_CMD_RUN; > > > > > > > - rte_smp_wmb(); > > > > > > > > > > > > > > usleep(run_time * US_PER_S); > > > > > > > > > > > > > > > > > > > > > I still think we'd better have some synchronisation here. > > > > > > > Otherwise what would prevent compiler and/or cpu to update > > > > > > > wrk_cmd out of order (before _init_ phase is completed)? > > > > > > > We probably can safely assume no reordering from the compiler > > > > > > > here, as we have function calls straight before and after > > > > > > > 'wrk_cmd = > > > > > WRK_CMD_RUN;' > > > > > > > But for consistency and easier maintenance, I still think it > > > > > > > is better to have something here, after all it is not > > > > > > > performance critical > > > pass. > > > > > > Agree that this is not performance critical. > > > > > > > > > > > > This is more about correctness (as usually people refer to code > > > > > > to understand the concepts). You can refer to video [1]. > > > > > > Essentially, the pthread_create has 'happens-before' behavior. > > > > > > i.e. all the memory operations before the pthread_create are > > > > > > visible to the new > > > thread. > > > > > > The > > > > > > rte_smp_rmb() barrier in the thread function is not required as > > > > > > it reads the > > > > > data that was set before the thread was launched. > > > > > > > > > > rte_eal_remote_launch() doesn't call pthread_create(). > > > > > All it does - updates global variable (lcore_config) and > > > > > writes/reads to/from the pipe. > > > > > > > > > Thanks for the reminder ☹ > > > > I think rte_eal_remote_launch and rte_eal_wait_lcore need to provide > > > behavior similar to pthread_launch and pthread_join respectively. > > > > > > > > There is use of rte_smp_*mb in those functions as well. Those need > > > > to be fixed > > > first and then look at these. > > > > > > Looks like you want __atomic_thread_fence() here. > > > > > In the rte_eal_remote_launch case, all the memory operations before the > > API call need to be visible to the worker. If this is the only requirement, > > we > > can use the function pointer as the guard variable and use store-release. In > > the eal_thread_loop function we could do load-acquire on the function > > pointer. > > > > I do not think that there is a requirement to ensure that the memory > > operations after the API call do not happen before the worker thread starts > > running the function (As there is no guarantee on when the worker thread > > will run. If the main thread needs to know if the worker thread is running > > explicit hand-shaking needs to happen). > > > > The rte_eal_wait_lcore API needs to ensure that the memory operations in > > the worker are visible to the main. rte_eal_wait_lcore and eal_thread_loop > > are synchronizing using lcore_config[worker_id].state. I need to understand > > what else 'state' is used for. If there are no issues, we can do a > > store-release > > on 'state' in eal_thread_loop and a load-acquire in rte_eal_wait_lcore. > > > > So, we do not have to use the __atomic_thread_fence. > >