> -----Original Message-----
> From: Carrillo, Erik G <erik.g.carri...@intel.com>
> Sent: Friday, April 10, 2020 3:29 AM
> To: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Phil Yang
> <phil.y...@arm.com>; rsanf...@akamai.com; dev@dpdk.org
> Cc: david.march...@redhat.com; Burakov, Anatoly
> <anatoly.bura...@intel.com>; tho...@monjalon.net; jer...@marvell.com;
> hemant.agra...@nxp.com; Gavin Hu <gavin...@arm.com>; nd
> <n...@arm.com>; nd <n...@arm.com>
> Subject: RE: [PATCH 2/2] lib/timer: relax barrier for status update
>
> > -----Original Message-----
> > From: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > Sent: Wednesday, April 8, 2020 4:56 PM
> > To: Carrillo, Erik G <erik.g.carri...@intel.com>; Phil Yang
> > <phil.y...@arm.com>; rsanf...@akamai.com; dev@dpdk.org
> > Cc: david.march...@redhat.com; Burakov, Anatoly
> > <anatoly.bura...@intel.com>; tho...@monjalon.net; jer...@marvell.com;
> > hemant.agra...@nxp.com; Gavin Hu <gavin...@arm.com>; nd
> > <n...@arm.com>; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>;
> > nd <n...@arm.com>
> > Subject: RE: [PATCH 2/2] lib/timer: relax barrier for status update
> >
> > <snip>
> >
> > > > > >
> > > > > > Subject: [PATCH 2/2] lib/timer: relax barrier for status update
> > > > > >
> > > > > > Volatile has no ordering semantics. The rte_timer structure
> > > > > > defines timer status as a volatile variable and uses the
> > > > > > rte_r/wmb barrier to guarantee inter-thread visibility.
> > > > > >
> > > > > > This patch optimized the volatile operation with c11 atomic
> > > > > > operations and one-way barrier to save the performance penalty.
> > > > > > According to the timer_perf_autotest benchmarking results, this
> > > > > > patch can uplift 10%~16% timer appending performance, 3%~20%
> > > > > > timer resetting performance and 45% timer callbacks scheduling
> > > > > > performance on aarch64 and no loss in performance for x86.
> > > > > >
> > > > > > Suggested-by: Honnappa Nagarahalli
> > > > > > <honnappa.nagaraha...@arm.com>
> > > > > > Signed-off-by: Phil Yang <phil.y...@arm.com>
> > > > > > Reviewed-by: Gavin Hu <gavin...@arm.com>
> > > > >
> > > > > Hi Phil,
> > > > >
> > > > > It seems like the consensus is to generally avoid replacing rte_atomic_*
> > > > > interfaces with the GCC builtins directly. In other areas of DPDK that are
> > > > > being patched, are the <stdatomic.h> C11 APIs going to be investigated? It
> > > > > seems like that decision will apply here as well.
> > > > Agree. The new APIs are going to be 1 to 1 mapped with the built-in
> > > > intrinsics (the memory orderings used themselves will not change).
> > > > We should go ahead with the review and conclude any issues. Once the
> > > > decision is made on what APIs to use, we can submit the next version using
> > > > the APIs decided.
> > > >
> > > Thanks, Honnappa.
> > >
> > > I have reviewed the memory orderings and I see no issues with them. I do
> > > have a question regarding a comment - I'll pose it inline:
> > Fantastic, thank you.
> > I have an unrelated (to this patch) question for you below.
> >
> > > >
> > > > >
> > > > > Thanks,
> > > > > Erik
> > > > >
> > > > > > ---
> > > > > >  lib/librte_timer/rte_timer.c | 90 +++++++++++++++++++++++++++++++-------------
> > > > > >  lib/librte_timer/rte_timer.h |  2 +-
> > > > > >  2 files changed, 65 insertions(+), 27 deletions(-)
> > > > > >
> > > > > > diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
> > > > > > index 269e921..be0262d 100644
> > > > > > --- a/lib/librte_timer/rte_timer.c
> > > > > > +++ b/lib/librte_timer/rte_timer.c
> > > > > > @@ -10,7 +10,6 @@
> > > > > >  #include <assert.h>
> > > > > >  #include <sys/queue.h>
> > > > > >
> > > > > > -#include <rte_atomic.h>
> > > > > >  #include <rte_common.h>
> > > > > >  #include <rte_cycles.h>
> > > > > >  #include <rte_eal_memconfig.h>
> > > > > > @@ -218,7 +217,7 @@ rte_timer_init(struct rte_timer *tim)
> > > > > >
> > > > > >      status.state = RTE_TIMER_STOP;
> > > > > >      status.owner = RTE_TIMER_NO_OWNER;
> > > > > > -    tim->status.u32 = status.u32;
> > > > > > +    __atomic_store_n(&tim->status.u32, status.u32, __ATOMIC_RELAXED);
> > > > > >  }
> > > > > >
> > > > > >  /*
> > <... snipped ...>
> >
> > > > > > @@ -258,9 +257,20 @@ timer_set_config_state(struct rte_timer *tim,
> > > > > >           * mark it atomically as being configured */
> > > > > >          status.state = RTE_TIMER_CONFIG;
> > > > > >          status.owner = (int16_t)lcore_id;
> > > > > > -        success = rte_atomic32_cmpset(&tim->status.u32,
> > > > > > -                                      prev_status.u32,
> > > > > > -                                      status.u32);
> > > > > > +        /* If status is observed as RTE_TIMER_CONFIG earlier,
> > > > > > +         * that's not going to cause any issues because the
> > > > > > +         * pattern is read for status then read the other members.
> > >
> > > I don't follow the above comment. What is meant by "earlier"?
> > >
> > > Thanks,
> > > Erik
> > I would rather change this comment to something similar to what is
> > mentioned while changing to 'RUNNING' state.
> > 'CONFIG' is also a locking state.
I think it is much easier to understand.
> >
>
> Ok, thanks - that makes sense.

OK, thanks. I will modify the comments in V2 to:
"CONFIG states are acting as locked states. If the timer is in CONFIG state,
the state cannot be changed by other threads. So, we should use ACQUIRE here."

Thanks,
Phil

> > < ... snipped ...>
> >
> > > > > > @@ -748,8 +774,12 @@ __rte_timer_manage(struct rte_timer_data *timer_data)
> > > > > >              status.state = RTE_TIMER_PENDING;
> > Is it better to set this to STOPPED since it is out of the run list? I think
> > it is better for the understanding as well.
> >
>
> In this location, we are dealing with periodic timers, and we are about to
> restart the current timer after it just expired and its callback was executed.
> As I understand it, setting the state back to PENDING here will cause the
> timer_reset() call below to remove this timer from the list (run list) it's
> still in (and fix up the links from the previous to the next elements), update
> other bits of the data structure, and update stats. That behavior would change
> if we set the state to STOPPED. At least to me, it also seems like the PENDING
> state is still accurate conceptually since the periodic timer wasn't explicitly
> stopped by this processing.

Yes. +1 for this.

>
> Thanks,
> Erik
>
> > > > > >              __TIMER_STAT_ADD(priv_timer, pending, 1);
> > > > > >              status.owner = (int16_t)lcore_id;
> > > > > > -            rte_wmb();
> > > > > > -            tim->status.u32 = status.u32;
> > > > > > +            /* The "RELEASE" ordering guarantees the memory
> > > > > > +             * operations above the status update are observed
> > > > > > +             * before the update by all threads
> > > > > > +             */
> > > > > > +            __atomic_store_n(&tim->status.u32, status.u32,
> > > > > > +                __ATOMIC_RELEASE);
> > > > > >              __rte_timer_reset(tim, tim->expire + tim->period,
> > > > > >                  tim->period, lcore_id, tim->f, tim->arg, 1,
> > > > > >                  timer_data);
> > > > > > @@ -919,8 +949,12 @@ rte_timer_alt_manage(uint32_t timer_data_id,
> > > > > >              /* remove from done list and mark timer as stopped */
> > > > > >              status.state = RTE_TIMER_STOP;
> > > > > >              status.owner = RTE_TIMER_NO_OWNER;
> > > > > > -            rte_wmb();
> > > > > > -            tim->status.u32 = status.u32;
> > > > > > +            /* The "RELEASE" ordering guarantees the memory
> > > > > > +             * operations above the status update are observed
> > > > > > +             * before the update by all threads
> > > > > > +             */
> > > > > > +            __atomic_store_n(&tim->status.u32, status.u32,
> > > > > > +                __ATOMIC_RELEASE);
> > > > > >          } else {
> > > > > >              /* keep it in list and mark timer as pending */
> > > > > >              rte_spinlock_lock(
> > > > > > @@ -928,8 +962,12 @@ rte_timer_alt_manage(uint32_t timer_data_id,
> > > > > >              status.state = RTE_TIMER_PENDING;
> > > > > >              __TIMER_STAT_ADD(data->priv_timer, pending, 1);
> > > > > >              status.owner = (int16_t)this_lcore;
> > > > > > -            rte_wmb();
> > > > > > -            tim->status.u32 = status.u32;
> > > > > > +            /* The "RELEASE" ordering guarantees the memory
> > > > > > +             * operations above the status update are observed
> > > > > > +             * before the update by all threads
> > > > > > +             */
> > > > > > +            __atomic_store_n(&tim->status.u32, status.u32,
> > > > > > +                __ATOMIC_RELEASE);
> > > > > >              __rte_timer_reset(tim, tim->expire + tim->period,
> > > > > >                  tim->period, this_lcore, tim->f, tim->arg, 1,
> > > > > >                  data);
> > > > > > diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
> > > > > > index c6b3d45..df533fa 100644
> > > > > > --- a/lib/librte_timer/rte_timer.h
> > > > > > +++ b/lib/librte_timer/rte_timer.h
> > > > > > @@ -101,7 +101,7 @@ struct rte_timer {
> > > > > >      uint64_t expire;       /**< Time when timer expire. */
> > > > > >      struct rte_timer *sl_next[MAX_SKIPLIST_DEPTH];
> > > > > > -    volatile union rte_timer_status status; /**< Status of timer. */
> > > > > > +    union rte_timer_status status; /**< Status of timer. */
> > > > > >      uint64_t period;       /**< Period of timer (0 if not periodic). */
> > > > > >      rte_timer_cb_t f;      /**< Callback function. */
> > > > > >      void *arg;             /**< Argument to callback function. */
> > > > > > --
> > > > > > 2.7.4
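
For anyone reading the thread without the full source at hand, here is a rough standalone sketch of the ordering pattern discussed above: an ACQUIRE compare-and-swap to enter a CONFIG-like "locked" state, and a RELEASE store to publish the new status after the other fields are written. It is not the rte_timer code. Every demo_* name, the DEMO_* state values and the struct layout are made up for illustration, and it assumes the GCC/Clang __atomic builtins and anonymous struct support. The real code retries the compare-and-swap in a loop, whereas this sketch makes a single attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the timer states; not the DPDK definitions. */
enum demo_state { DEMO_STOP, DEMO_PENDING, DEMO_RUNNING, DEMO_CONFIG };
#define DEMO_NO_OWNER ((int16_t)-1)

/* State and owner packed into one 32-bit word so both change atomically. */
union demo_status {
	struct {
		uint16_t state;
		int16_t owner;
	};
	uint32_t u32;
};

struct demo_timer {
	uint64_t expire;          /* ordinary field, guarded by the status word */
	union demo_status status;
};

static void demo_init(struct demo_timer *t)
{
	union demo_status s;

	s.state = DEMO_STOP;
	s.owner = DEMO_NO_OWNER;
	t->expire = 0;
	/* No other thread can see the timer yet, so RELAXED is sufficient. */
	__atomic_store_n(&t->status.u32, s.u32, __ATOMIC_RELAXED);
}

/* Single attempt to take the timer into the CONFIG state for 'lcore'. */
static bool demo_try_config(struct demo_timer *t, int16_t lcore)
{
	union demo_status prev, next;

	prev.u32 = __atomic_load_n(&t->status.u32, __ATOMIC_RELAXED);
	if (prev.state == DEMO_CONFIG || prev.state == DEMO_RUNNING)
		return false;     /* held by someone else, treat as locked */

	next.state = DEMO_CONFIG;
	next.owner = lcore;
	/* ACQUIRE on success: later accesses to the timer fields cannot be
	 * reordered before the point where the "lock" is taken. */
	return __atomic_compare_exchange_n(&t->status.u32, &prev.u32, next.u32,
			false, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

/* Write the payload, then publish PENDING. Caller must own CONFIG. */
static void demo_publish(struct demo_timer *t, uint64_t expire, int16_t lcore)
{
	union demo_status cur, next;

	cur.u32 = __atomic_load_n(&t->status.u32, __ATOMIC_RELAXED);
	assert(cur.state == DEMO_CONFIG && cur.owner == lcore);

	t->expire = expire;
	next.state = DEMO_PENDING;
	next.owner = lcore;
	/* RELEASE: the write to 'expire' is visible to any thread that
	 * observes the new status with an ACQUIRE load. */
	__atomic_store_n(&t->status.u32, next.u32, __ATOMIC_RELEASE);
}

/* Reader side: the ACQUIRE load pairs with the RELEASE store above.
 * A real implementation would also have to cope with the timer being
 * reconfigured after this check; that is out of scope here. */
static bool demo_read_expire(struct demo_timer *t, uint64_t *expire)
{
	union demo_status cur;

	cur.u32 = __atomic_load_n(&t->status.u32, __ATOMIC_ACQUIRE);
	if (cur.state != DEMO_PENDING)
		return false;
	*expire = t->expire;
	return true;
}
```

The one-way orderings are what replace the full rte_wmb() plus plain store in the patch; on x86 they are expected to cost nothing extra, which matches the "no loss in performance for x86" note in the commit message.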