<snip>

> >
> > >
> > > > >
> > > > > Hi Feifei,
> > > > >
> > > > > >
> > > > > > The variable "wrk_cmd" is a signal to control threads from
> > > > > > running and stopping. When worker lcores load "wrk_cmd ==
> > > WRK_CMD_RUN",
> > > > > > they
> > > > > start
> > > > > > running and when worker lcores load "wrk_cmd == WRK_CMD_STOP",
> > > > > they
> > > > > > stop.
> > > > > >
> > > > > > For the wmb in test_mt1, no storing operations must keep the
> > > > > > order after storing "wrk_cmd". Thus the wmb is unnecessary.
> > > > >
> > > > > I think there is a bug in my original code, we should do
> > > > > smp_wmb()
> > > > > *before* setting wrk_cmd, not after:
> > > > >
> > > > >         /* launch on all workers */
> > > > >         RTE_LCORE_FOREACH_WORKER(lc) {
> > > > >                 arg[lc].rng = r;
> > > > >                 arg[lc].stats = init_stat;
> > > > >                 rte_eal_remote_launch(test, &arg[lc], lc);
> > > > >         }
> > > > >
> > > > >         /* signal worker to start test */
> > > > > +      rte_smp_wmb();
> > > > >         wrk_cmd = WRK_CMD_RUN;
> > > > > -       rte_smp_wmb();
> > > > >
> > > > >         usleep(run_time * US_PER_S);
> > > > >
> > > > >
> > > > > I still think we'd better have some synchronisation here.
> > > > > Otherwise what would prevent compiler and/or cpu to update
> > > > > wrk_cmd out of order (before _init_ phase is completed)?
> > > > > We probably can safely assume no reordering from the compiler
> > > > > here, as we have function calls straight before and after
> > > > > 'wrk_cmd =
> > > WRK_CMD_RUN;'
> > > > > But for consistency and easier maintenance, I still think it is
> > > > > better to have something here, after all it is not performance 
> > > > > critical
> pass.
> > > > Agree that this is not performance critical.
> > > >
> > > > This is more about correctness (as usually people refer to code to
> > > > understand the concepts). You can refer to video [1]. Essentially,
> > > > the pthread_create has 'happens-before' behavior. i.e. all the
> > > > memory operations before the pthread_create are visible to the new
> thread.
> > > > The
> > > > rte_smp_rmb() barrier in the thread function is not required as it
> > > > reads the
> > > data that was set before the thread was launched.
> > >
> > > rte_eal_remote_launch() doesn't call pthread_create().
> > > All it does -  updates global variable (lcore_config) and
> > > writes/reads to/from the pipe.
> > >
> > Thanks for the reminder ☹
> > I think rte_eal_remote_launch and rte_eal_wait_lcore need to provide
> behavior similar to pthread_launch and pthread_join respectively.
> >
> > There is use of rte_smp_*mb in those functions as well. Those need to be 
> > fixed
> first and then look at these.
> 
> Looks like you want __atomic_thread_fence() here.
> 
In the rte_eal_remote_launch case, all the memory operations before the API 
call need to be visible to the worker. If this is the only requirement, we can 
use the function pointer as the guard variable and use store-release. In the 
eal_thread_loop function we could do load-acquire on the function pointer.

I do not think that there is a requirement to ensure that the memory operations 
after the API call do not happen before the worker thread starts running the 
function (As there is no guarantee on when the worker thread will run. If the 
main thread needs to know if the worker thread is running explicit hand-shaking 
needs to happen).

The rte_eal_wait_lcore API needs to ensure that the memory operations in the 
worker are visible to the main. rte_eal_wait_lcore and eal_thread_loop are 
synchronizing using lcore_config[worker_id].state. I need to understand what 
else 'state' is used for. If there are no issues, we can do a store-release on 
'state' in eal_thread_loop and a load-acquire in rte_eal_wait_lcore.

So, we do not have to use the __atomic_thread_fence.

Reply via email to