On Sun, Jul 12, 2015 at 01:35:35AM +0200, Oleg Nesterov wrote:
> Hello,
>
> Let me make another attempt to push rcu_sync and add a _simple_
> improvement into percpu-rwsem. It already has another user (cgroups)
> and I think it can have more. Peter has some use-cases. sb->s_writers
> (which afaics is buggy btw) can be turned into percpu-rwsem too, I think.
>
> Linus, I am mostly trying to convince you. Nobody else has objected so far.
> Could you please comment?
>
> Peter, if you agree with 5-7, can I add your Signed-off-by's?
>
> To me, the most annoying problem with percpu_rw_semaphore is
> synchronize_sched_expedited(), which is called twice by every
> down_write/up_write. I think it would be really nice to avoid it.
>
> Let's start with a simple test case:
>
>	#!/bin/bash
>
>	perf probe -x /lib/libc.so.6 syscall
>
>	for i in {1..1000}; do
>		echo 1 >| /sys/kernel/debug/tracing/events/probe_libc/syscall/enable
>		echo 0 >| /sys/kernel/debug/tracing/events/probe_libc/syscall/enable
>	done
>
> It needs ~13.5 seconds (2 CPUs, KVM). If we simply replace
> synchronize_sched_expedited() with synchronize_sched() it takes
> ~67.5 seconds. This is not good.
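For anyone skimming the thread, those two grace periods come from the write
side forcing readers off their per-CPU fast path and then back onto it. The
sketch below shows only the shape of that pattern; all the names in it
(writer_pending, fast_ctr, the slow_path_*() and wait_for_readers_to_drain()
helpers) are placeholders for illustration, not the actual
kernel/locking/percpu-rwsem.c code:

	/*
	 * Simplified sketch of the current percpu-rwsem write-side pattern.
	 * All names are placeholders; this is not the real implementation.
	 */
	static bool writer_pending;
	static DEFINE_PER_CPU(unsigned int, fast_ctr);

	void reader_lock(void)
	{
		rcu_read_lock_sched();
		if (likely(!READ_ONCE(writer_pending))) {
			/* Fast path: per-CPU counter, no shared cacheline. */
			this_cpu_inc(fast_ctr);
			rcu_read_unlock_sched();
			return;
		}
		rcu_read_unlock_sched();
		slow_path_down_read();		/* fall back to the real rwsem */
	}

	void writer_lock(void)
	{
		WRITE_ONCE(writer_pending, true);
		/*
		 * First grace period: wait until every reader has either left
		 * its rcu_read_lock_sched() section or is guaranteed to see
		 * writer_pending, so the per-CPU counters can be trusted.
		 */
		synchronize_sched_expedited();
		wait_for_readers_to_drain();	/* sum of fast_ctr reaches zero */
		slow_path_down_write();
	}

	void writer_unlock(void)
	{
		slow_path_up_write();
		/*
		 * Second grace period: separate this critical section from the
		 * fast-path readers that resume once writer_pending clears.
		 */
		synchronize_sched_expedited();
		WRITE_ONCE(writer_pending, false);
	}

Replacing those two synchronize_sched_expedited() calls with plain
synchronize_sched() is exactly what pushes the test above from ~13.5 to
~67.5 seconds.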
Yep, even if you avoided the write-release grace period, you would still be
looking at something like 40 seconds, which is 3x. Some might consider that
to be a performance regression. ;-)

> With these patches it takes around 13.3 seconds again (a little
> bit faster), and it doesn't use _expedited. synchronize_sched()
> is called 1-2 (max 3) times on average. And now it does not
> disturb the whole system.
>
> And just in case, I also measured
>
>	for (i = 0; i < 1000000; ++i) {
>		percpu_down_write(&dup_mmap_sem);
>		percpu_up_write(&dup_mmap_sem);
>	}
>
> and it runs more than 1.5 times faster (to remind, only 2 CPUs),
> but this is not that interesting, I agree.

Your trick of avoiding the grace periods during a writer-to-writer handoff
is cute, and it is helping a lot here (see the sketch at the end of this
mail). Concurrent readers would have a tough time of it with this workload,
though: they would all be serialized.

> And note that the actual change in percpu-rwsem is really simple,
> and imo it even makes the code simpler. (the last patch is an
> off-topic cleanup).
>
> So the only complication is rcu_sync itself. But, rightly or not (I
> am obviously biased), I believe this new rcu infrastructure is natural
> and useful, and I think it can have more users too.

I don't have an objection to it, even in its current form (I did review it
long ago), but it does need to have a user!

> And. We can do more improvements in rcu_sync and percpu-rwsem, and
> I don't only mean other optimizations from Peter. In particular, we
> can extract the "wait for gp pass" from rcu_sync_enter() into another
> helper, we can teach percpu_down_write() to allow multiple writers,
> and more.

As in a percpu_down_write() that allows up to (say) five concurrent
write-holders? (Which can be useful, don't get me wrong.) Or do you mean an
internal optimization of some sort?

							Thanx, Paul

> Oleg.
>
>  include/linux/percpu-rwsem.h  |   3 +-
>  include/linux/rcusync.h       |  57 +++++++++++++++
>  kernel/locking/percpu-rwsem.c |  78 ++++++---------------
>  kernel/rcu/Makefile           |   2 +-
>  kernel/rcu/sync.c             | 152 +++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 235 insertions(+), 57 deletions(-)
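P.S. For the archives, my mental model of the writer-to-writer handoff is
roughly the following. The state names, fields, and the queue_deferred_idle()
helper are invented for this illustration (it is not the actual
kernel/rcu/sync.c from the series, and serialization among writers is
omitted), but it shows why back-to-back writers pay at most one grace period:

	/* Sketch of the rcu_sync handoff idea, illustration only. */
	enum { GP_IDLE, GP_ACTIVE, GP_EXITING };

	struct sync_sketch {
		int gp_state;	/* GP_IDLE: readers may use the fast path */
		int gp_count;	/* writers currently between enter() and exit() */
	};

	/* Readers check this to pick the per-CPU fast path vs. the slow path. */
	static bool sketch_is_idle(struct sync_sketch *s)
	{
		return READ_ONCE(s->gp_state) == GP_IDLE;
	}

	static void sketch_enter(struct sync_sketch *s)
	{
		s->gp_count++;
		if (s->gp_state == GP_IDLE) {
			s->gp_state = GP_ACTIVE;
			/* Only a writer that finds the structure idle pays this. */
			synchronize_sched();
		}
		/*
		 * Otherwise a previous writer (or its still-pending exit
		 * callback) has already pushed readers onto the slow path,
		 * so no grace period is needed: the writer-to-writer handoff.
		 */
	}

	static void sketch_exit(struct sync_sketch *s)
	{
		if (--s->gp_count == 0) {
			/*
			 * Do not wait synchronously.  Queue a callback (the real
			 * code uses call_rcu_sched()) that flips gp_state back to
			 * GP_IDLE after a grace period, unless another writer
			 * arrives first and keeps the structure busy.
			 */
			s->gp_state = GP_EXITING;
			queue_deferred_idle(s);
		}
	}

In the enable/disable loop above, the next writer almost always arrives
before the deferred transition back to GP_IDLE completes, which is why so
few synchronize_sched() calls are seen instead of two per iteration.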