On Fri, Dec 07, 2012 at 12:22:52PM -0500, Mathieu Desnoyers wrote:
> * Lai Jiangshan (eag0...@gmail.com) wrote:
> > On Saturday, December 8, 2012, Mathieu Desnoyers wrote:
> > > * Lai Jiangshan (eag0...@gmail.com) wrote:
> > > > we can define rcu_gp_ctr and registry with the aligned attribute,
> > > > but it is not a reliable way.
> > > >
> > > > We need only this:
> > > > unsigned long rcu_gp_ctr __attribute((aligned and padded(don't put
> > > > other var next to it except the futex)))
> > >
> > > In which situation would this be unreliable ?
> >
> > int a;
> > int b __attribute__((aligned));
> > int c;
> >
> > b and c will be in the same cache line; even if we define c as aligned
> > too, the compiler/linker may put a next to b, so a and b end up sharing
> > a line.
>
> So if our goal is to have rcu_gp_ctr and rcu_gp_futex on the same cache
> line, which is different from that of the registry, we could do:
>
> typeA rcu_gp_ctr __attribute__((aligned(...)));
> typeB rcu_gp_futex;
> typeC registry __attribute__((aligned(...)));
>
> I would expect the compiler won't typically reorder rcu_gp_futex and
> registry, but I guess there is no guarantee that this will always hold
> under the C99 standard.
>
> I guess this is a case where we could bump the library version number
> and do things properly.
>
> Let's think a bit more about it; does anyone else have comments on this ?
If you really want the alignment and padding, they should go into a
structure.  ;-)

							Thanx, Paul

> Thanks,
>
> Mathieu
>
> > > Thanks,
> > >
> > > Mathieu
> > >
> > > > On Saturday, December 8, 2012, Mathieu Desnoyers wrote:
> > > >
> > > > > * Lai Jiangshan (la...@cn.fujitsu.com) wrote:
> > > > > > @rcu_gp_ctr and @registry share the same cache line, which
> > > > > > causes false sharing and slows down both the read side and the
> > > > > > update side.
> > > > > >
> > > > > > Fix: use different cache lines for them.
> > > > > >
> > > > > > Although rcu_gp_futex is updated less often than rcu_gp_ctr,
> > > > > > they are almost always accessed at the same time, so we also
> > > > > > move rcu_gp_futex to the cache line of rcu_gp_ctr to reduce the
> > > > > > cache-line usage and cache misses of the read side.
> > > > >
> > > > > Hi Lai,
> > > > >
> > > > > I agree on the goal: placing registry and rcu_gp_ctr on different
> > > > > cache-lines. And yes, it makes sense to put rcu_gp_ctr and
> > > > > rcu_gp_futex on the same cache-line. I agree that the next patch
> > > > > is fine too (keeping qsbr and other urcu similar). This is indeed
> > > > > what I try to ensure myself.
> > > > >
> > > > > I'm just concerned that this patch seems to break ABI
> > > > > compatibility for liburcu: the read-side, within applications,
> > > > > would have to be recompiled. So I guess we should decide if we do
> > > > > this change in a way that does not break the ABI (e.g. not
> > > > > introducing a structure), or if we choose to bump the library
> > > > > version number.
> > > > >
> > > > > Thoughts ?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Mathieu
> > > > >
> > > > > > test: (4X6=24 CPUs)
> > > > > >
> > > > > > Before patch:
> > > > > >
> > > > > > [root@localhost userspace-rcu]# ./tests/test_urcu_mb 20 1 20
> > > > > > SUMMARY ./tests/test_urcu_mb testdur 20 nr_readers 20 rdur 0 wdur 0 nr_writers 1 wdelay 0 nr_reads 2100285330 nr_writes 3390219 nr_ops 2103675549
> > > > > > [root@localhost userspace-rcu]# ./tests/test_urcu_mb 20 1 20
> > > > > > SUMMARY ./tests/test_urcu_mb testdur 20 nr_readers 20 rdur 0 wdur 0 nr_writers 1 wdelay 0 nr_reads 1619868562 nr_writes 3529478 nr_ops 1623398040
> > > > > > [root@localhost userspace-rcu]# ./tests/test_urcu_mb 20 1 20
> > > > > > SUMMARY ./tests/test_urcu_mb testdur 20 nr_readers 20 rdur 0 wdur 0 nr_writers 1 wdelay 0 nr_reads 1949067038 nr_writes 3469334 nr_ops 1952536372
> > > > > >
> > > > > > After patch:
> > > > > >
> > > > > > [root@localhost userspace-rcu]# ./tests/test_urcu_mb 20 1 20
> > > > > > SUMMARY ./tests/test_urcu_mb testdur 20 nr_readers 20 rdur 0 wdur 0 nr_writers 1 wdelay 0 nr_reads 3380191848 nr_writes 4903248 nr_ops 3385095096
> > > > > > [root@localhost userspace-rcu]# ./tests/test_urcu_mb 20 1 20
> > > > > > SUMMARY ./tests/test_urcu_mb testdur 20 nr_readers 20 rdur 0 wdur 0 nr_writers 1 wdelay 0 nr_reads 3397637486 nr_writes 4129809 nr_ops 3401767295
> > > > > >
> > > > > > Signed-off-by: Lai Jiangshan <la...@cn.fujitsu.com>
> > > > > > ---
> > > > > > diff --git a/urcu.c b/urcu.c
> > > > > > index 15def09..436d71c 100644
> > > > > > --- a/urcu.c
> > > > > > +++ b/urcu.c
> > > > > > @@ -83,16 +83,7 @@ void __attribute__((destructor)) rcu_exit(void);
> > > > > >  #endif
> > > > > >
> > > > > >  static pthread_mutex_t rcu_gp_lock = PTHREAD_MUTEX_INITIALIZER;
> > > > > > -
> > > > > > -int32_t rcu_gp_futex;
> > > > > > -
> > > > > > -/*
> > > > > > - * Global grace period counter.
> > > > > > - * Contains the current RCU_GP_CTR_PHASE.
> > > > > > - * Also has a RCU_GP_COUNT of 1, to accelerate the reader fast path.
> > > > > > - * Written to only by writer with mutex taken. Read by both writer and readers.
> > > > > > - */
> > > > > > -unsigned long rcu_gp_ctr = RCU_GP_COUNT;
> > > > > > +struct urcu_gp rcu_gp = { .ctr = RCU_GP_COUNT };
> > > > > >
> > > > > >  /*
> > > > > >   * Written to only by each individual reader. Read by both the reader and the
> > > > > > @@ -217,8 +208,8 @@ static void wait_gp(void)
> > > > > >  {
> > > > > >  	/* Read reader_gp before read futex */
> > > > > >  	smp_mb_master(RCU_MB_GROUP);
> > > > > > -	if (uatomic_read(&rcu_gp_futex) == -1)
> > > > > > -		futex_async(&rcu_gp_futex, FUTEX_WAIT, -1,
> > > > > > +	if (uatomic_read(&rcu_gp.futex) == -1)
> > > > > > +		futex_async(&rcu_gp.futex, FUTEX_WAIT, -1,
> > > > > >  			NULL, NULL, 0);
> > > > > >  }
> > > > > >
> > > > > > @@ -232,12 +223,12 @@ static void wait_for_readers(struct cds_list_head *input_readers,
> > > > > >  	/*
> > > > > >  	 * Wait for each thread URCU_TLS(rcu_reader).ctr to either
> > > > > >  	 * indicate quiescence (not nested), or observe the current
> > > > > > -	 * rcu_gp_ctr value.
> > > > > > +	 * rcu_gp.ctr value.
> > > > > >  	 */
> > > > > >  	for (;;) {
> > > > > >  		wait_loops++;
> > > > > >  		if (wait_loops == RCU_QS_ACTIVE_ATTEMPTS) {
> > > > > > -			uatomic_dec(&rcu_gp_futex);
> > > > > > +			uatomic_dec(&rcu_gp.futex);
> > > > > >  			/* Write futex before read reader_gp */
> > > > > >  			smp_mb_master(RCU_MB_GROUP);
> > > > > >  		}
> > > > > > @@ -270,7 +261,7 @@ static void wait_for_readers(struct cds_list_head *input_readers,
> > > > > >  			if (wait_loops == RCU_QS_ACTIVE_ATTEMPTS) {
> > > > > >  				/* Read reader_gp before write futex */
> > > > > >  				smp_mb_master(RCU_MB_GROUP);
> > > > > > -				uatomic_set(&rcu_gp_futex, 0);
> > > > > > +				uatomic_set(&rcu_gp.futex, 0);
> > > > > >  			}
> > > > > >  			break;
> > > > > >  		} else {
> > > > > > @@ -289,7 +280,7 @@ static void wait_for_readers(struct cds_list_head *input_readers,
> > > > > >  			if (wait_loops == RCU_QS_ACTIVE_ATTEMPTS) {
> > > > > >  				/* Read reader_gp before write futex */
> > >
> > > --
> > > Mathieu Desnoyers
> > > Operating System Efficiency R&D Consultant
> > > EfficiOS Inc.
> > > http://www.efficios.com
>
> --
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com
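
For readers following the discussion, here is a minimal self-contained
sketch of the per-variable-alignment layout Mathieu describes above
(typeA/typeB/typeC). The 64-byte cache-line size and the placeholder
registry type are assumptions made for the example, not the actual
liburcu declarations:

	#include <stdint.h>

	#define CACHE_LINE_SIZE 64  /* assumed; liburcu derives this per architecture */

	/* rcu_gp_ctr starts a fresh cache line; the intent is that
	 * rcu_gp_futex shares that line.  Nothing in C99 forbids the
	 * toolchain from placing some other global next to them, which is
	 * the concern Lai raises. */
	unsigned long rcu_gp_ctr __attribute__((aligned(CACHE_LINE_SIZE)));
	int32_t rcu_gp_futex;

	/* The reader registry gets its own cache line so that reader
	 * registration does not false-share with the grace-period state
	 * that writers update. */
	struct registry_placeholder { void *head; } registry
			__attribute__((aligned(CACHE_LINE_SIZE)));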
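
And a sketch of the structure-based layout Paul suggests, which is what
the patch's "struct urcu_gp rcu_gp = { .ctr = RCU_GP_COUNT };" line
implies. The member order, the RCU_GP_COUNT placeholder value and the
explicit alignment are assumptions for illustration only:

	#include <stdint.h>

	#define CACHE_LINE_SIZE 64   /* assumed cache-line size */
	#define RCU_GP_COUNT    1UL  /* placeholder for liburcu's real value */

	/* Grouping the counter and the futex in one cache-line-aligned
	 * structure guarantees that they share a line and that the structure
	 * occupies whole cache lines by itself, regardless of how the
	 * toolchain orders the surrounding globals. */
	struct urcu_gp {
		unsigned long ctr;   /* global grace-period counter */
		int32_t futex;       /* lets the grace-period waiter sleep until
				      * readers catch up (see wait_gp() above) */
	} __attribute__((aligned(CACHE_LINE_SIZE)));

	struct urcu_gp rcu_gp = { .ctr = RCU_GP_COUNT };

The trade-off discussed in the thread still applies: replacing the two
exported globals with a single structure changes the library ABI, hence
the suggestion to bump the library version number.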