<snip> > > Hi Feifei, > > > > > Hi, Honnappa, Konstantin and Stephen > > > > Thanks very much for your attention of this patch. Based on your > > opinion, Ruifeng and I discuss about this and make a summary: > > > __________________________________________________________ > ____________ > > ___________________________________________ > > ___ > > main thread > worker thread > > rte_eal_remote_launch: > > [ Honnappa focus ] > > > > To ensure f can load correct arg, > > > > arg store should before f > > lcore_config[worker_id].f = f; > > lcore_config[worker_id].arg = > arg; > > wmb()? or store-relase on f? > > > > > eal_thread_loop: > > pipeline_communication > ----------------------> pipeline_communication > > > if (lcore_config[lcore_id].f == > > NULL) > > > rte_panic("NULL function > > pointer\n"); > > > > > fct_arg = > > lcore_config[lcore_id].arg; > > > ret = > > lcore_config[lcore_id].f(fct_arg); > > > __________________________________________________________ > ____________ > > ___________________________________________ > > __ > > > > test_ring_stress: wmb()? > > [ Konstantin focus ] > > > test_worker: > > Main thread can use wrk_cmd to > > Wrk_cmd =WRK_CMD_RUN; > ----------------------> Wrk_cmd == WRK_CMD_RUN; > > control multiple threads to start running > > > wmb()? > > at the same time as much as possible > > > ring_dequeue; > > > ring_enqueue; > > Wrk_cmd =WRK_CMD_STOP; > ----------------------> Wrk_cmd == WRK_CMD_STOP; > > > __________________________________________________________ > ____________ > > ___________________________________________ > > ___ > > > > rte_eal_wait_lcore: > wmb() > > [ Honnappa focus ] > > lcore_config[lcore_id].state == FINISHED > <--------------------- lcore_config[lcore_id].state = > > FINISHED Load-acquire and store-release > > > > are used on the variable “state” > > rmb(); > > > __________________________________________________________ > ____________ > > ___________________________________________ > > ___ > > > > From the picture above, > > > > 1.First, for the underlying function rte_eal_remote_launch, Honnappa > > focuses on that, pipeline_communication cannot ensure ‘arg’ parameters > > is loaded correctly by the worker thread. > > This is because in weak memory order framework, maybe the main thread > > and worker thread firstly finish pipeline communication, and then the > > worker thread receive signal and execute the function ‘ f ’. However, > > it maybe load a wrong value of ‘arg’ due to that the main thread > > stores ‘arg’ after pipeline communication. So wmb or store_release is > necessary for ‘arg’. > > > > 2.Second, for the upper-layer test_ring_stress, Konstantin foucese on > > that, Whether the main thread can use ‘wrk_cmd’ to control multiple > > threads to run at the same time as much as possible. > > Because rte_eal_remote_launch only can communicates with one worker > > thread at the same time. This means some worker thread maybe start > > working very early but other worker threads maybe need to wait a long > > time to start working if ‘wrk_cmd' is stored 'RUN' flag before > rte_remote_launch. > > At last, for unit test, this may cause that the test results are not stable. > > > > 3.Third, for rte_eal_wait_lcore, Honnappa focuses on that the ‘state’ as a > synchronous bariable, > > we should add load-acquire and store-release on it. However, there > > have been rmb and wmb after and before ‘state’, So I’m not sure whether > we should replace them. > > > > In summary, I think Honnappa and Konstantin have different concerns. > > For Honnappa, we can add wmb or store-release to ensure the ‘arg’ can > > be loaded correctly in rte_eal_remote_launch. > > For Konstantin, we can add wmb and rmb to ensure the main thread can > > control the worker Threads to run at the same time, and then make the > > test results more accurate in the ring_stress_test. > > Agree with both. Thanks Feifei, understood. I am just trying to take a step back and see what kind of ordering guarantees rte_eal_remote_launch should provide so that we do not have to deal with adding additional barriers in the applications. For ex: if we can avoid the barriers around 'wrk_cmd' (kind of use cases) it will benefit all the applications.
> > > > > > > Best Regards > > Feifei > > > > > -----邮件原件----- > > > 发件人: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> > > > 发送时间: 2021年1月30日 9:24 > > > 收件人: Stephen Hemminger <step...@networkplumber.org> > > > 抄送: Ananyev, Konstantin <konstantin.anan...@intel.com>; Feifei > Wang > > > <feifei.wa...@arm.com>; dev@dpdk.org; nd <n...@arm.com>; Ruifeng > Wang > > > <ruifeng.w...@arm.com>; Honnappa Nagarahalli > > > <honnappa.nagaraha...@arm.com>; nd <n...@arm.com> > > > 主题: RE: [dpdk-dev] [PATCH v1 1/2] app/test: remove unnecessary > > > barriers for ring stress test > > > > > > <snip> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Feifei, > > > > > > > > > > > > > > > > > > > > > > > > > > The variable "wrk_cmd" is a signal to control threads > > > > > > > > > from running and stopping. When worker lcores load > > > > > > > > > "wrk_cmd == > > > > > > WRK_CMD_RUN", > > > > > > > > > they > > > > > > > > start > > > > > > > > > running and when worker lcores load "wrk_cmd == > > > > > > > > > WRK_CMD_STOP", > > > > > > > > they > > > > > > > > > stop. > > > > > > > > > > > > > > > > > > For the wmb in test_mt1, no storing operations must keep > > > > > > > > > the order after storing "wrk_cmd". Thus the wmb is > unnecessary. > > > > > > > > > > > > > > > > I think there is a bug in my original code, we should do > > > > > > > > smp_wmb() > > > > > > > > *before* setting wrk_cmd, not after: > > > > > > > > > > > > > > > > /* launch on all workers */ > > > > > > > > RTE_LCORE_FOREACH_WORKER(lc) { > > > > > > > > arg[lc].rng = r; > > > > > > > > arg[lc].stats = init_stat; > > > > > > > > rte_eal_remote_launch(test, &arg[lc], lc); > > > > > > > > } > > > > > > > > > > > > > > > > /* signal worker to start test */ > > > > > > > > + rte_smp_wmb(); > > > > > > > > wrk_cmd = WRK_CMD_RUN; > > > > > > > > - rte_smp_wmb(); > > > > > > > > > > > > > > > > usleep(run_time * US_PER_S); > > > > > > > > > > > > > > > > > > > > > > > > I still think we'd better have some synchronisation here. > > > > > > > > Otherwise what would prevent compiler and/or cpu to update > > > > > > > > wrk_cmd out of order (before _init_ phase is completed)? > > > > > > > > We probably can safely assume no reordering from the > > > > > > > > compiler here, as we have function calls straight before > > > > > > > > and after 'wrk_cmd = > > > > > > WRK_CMD_RUN;' > > > > > > > > But for consistency and easier maintenance, I still think > > > > > > > > it is better to have something here, after all it is not > > > > > > > > performance critical > > > > pass. > > > > > > > Agree that this is not performance critical. > > > > > > > > > > > > > > This is more about correctness (as usually people refer to > > > > > > > code to understand the concepts). You can refer to video [1]. > > > > > > > Essentially, the pthread_create has 'happens-before' behavior. > > > > > > > i.e. all the memory operations before the pthread_create are > > > > > > > visible to the new > > > > thread. > > > > > > > The > > > > > > > rte_smp_rmb() barrier in the thread function is not required > > > > > > > as it reads the > > > > > > data that was set before the thread was launched. > > > > > > > > > > > > rte_eal_remote_launch() doesn't call pthread_create(). > > > > > > All it does - updates global variable (lcore_config) and > > > > > > writes/reads to/from the pipe. > > > > > > > > > > > Thanks for the reminder ☹ > > > > > I think rte_eal_remote_launch and rte_eal_wait_lcore need to > > > > > provide > > > > behavior similar to pthread_launch and pthread_join respectively. > > > > > > > > > > There is use of rte_smp_*mb in those functions as well. Those > > > > > need to be fixed > > > > first and then look at these. > > > > > > > > Looks like you want __atomic_thread_fence() here. > > > > > > > In the rte_eal_remote_launch case, all the memory operations before > > > the API call need to be visible to the worker. If this is the only > > > requirement, we can use the function pointer as the guard variable > > > and use store-release. In the eal_thread_loop function we could do > > > load-acquire on the function pointer. > > > > > > I do not think that there is a requirement to ensure that the memory > > > operations after the API call do not happen before the worker thread > > > starts running the function (As there is no guarantee on when the > > > worker thread will run. If the main thread needs to know if the > > > worker thread is running explicit hand-shaking needs to happen). > > > > > > The rte_eal_wait_lcore API needs to ensure that the memory > > > operations in the worker are visible to the main. rte_eal_wait_lcore > > > and eal_thread_loop are synchronizing using > > > lcore_config[worker_id].state. I need to understand what else > > > 'state' is used for. If there are no issues, we can do a store-release on > 'state' in eal_thread_loop and a load-acquire in rte_eal_wait_lcore. > > > > > > So, we do not have to use the __atomic_thread_fence. > > >