[Added a whole bunch of ccs]

On Mon, Aug 13, 2018 at 6:17 PM, Matt Rickard <m...@softrans.com.au> wrote:
> Process clock_gettime(CLOCK_TAI) in VDSO. This makes the call about as fast as
> CLOCK_REALTIME instead of taking about four times as long.
>
> Signed-off-by: Matt Rickard <m...@softrans.com.au>
> ---
>  arch/x86/entry/vdso/vclock_gettime.c    | 30 ++++++++++++++++++++++++++++++
>  arch/x86/entry/vsyscall/vsyscall_gtod.c |  2 ++
>  arch/x86/include/asm/vgtod.h            |  1 +
>  3 files changed, 33 insertions(+)
>
> diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
> index f19856d95c60..bc8d8f086721 100644
> --- a/arch/x86/entry/vdso/vclock_gettime.c
> +++ b/arch/x86/entry/vdso/vclock_gettime.c
...
>  notrace static void do_realtime_coarse(struct timespec *ts)
>  {
>         unsigned long seq;
> @@ -284,8 +305,17 @@ notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
>                 do_monotonic_coarse(ts);
>                 break;
>         default:
> +               /* Doubled switch statement to work around kernel Makefile error */
> +               /* See: https://www.mail-archive.com/gcc-bugs@gcc.gnu.org/msg567499.html */

NAK.

The issue here (after reading that thread) is that, with our current
compile options, gcc generates a jump table once the switch statement
hits five entries. It uses retpolines for the jump table, and somehow
it generates the relocations in such a way that the vDSO build fails.

We need to address this so that the vDSO build is reliable, but
there's an important question here: Should the vDSO be built with
retpolines, or should it be built with indirect branches? Or should
we go out of our way to make sure that the vDSO contains neither
retpolines nor indirect branches?

We could accomplish the latter (sort of) by manually converting the
switch into the appropriate if statements, but that's rather ugly.

(Hmm. We should add exports to directly read each clock source.
They'll be noticeably faster, especially when cache-and-predictor-cold.)