On Sat, 26 Oct 2019, Christophe Leroy wrote:
> Le 26/10/2019 à 17:53, Thomas Gleixner a écrit :
> > > > > > gettimeofday: vdso: 750 nsec/call
> > > > > >
> > > > > > gettimeofday: vdso: 1533 nsec/call
> > > >
> > > > Small improvement (3%) with the proposed change:
> > > >
> > > > gettimeofday: vdso: 1485 nsec/call
> > >
> > > By inlining do_hres() I get the following:
> > >
> > > gettimeofday: vdso: 1072 nsec/call
> >
> > What's the effect for clock_gettime()?
> >
> > gettimeofday() is suboptimal vs. the PPC ASM variant due to an extra
> > division, but clock_gettime() should be 1:1 comparable.
> >
>
> Original PPC asm:
> clock-gettime-realtime: vdso: 928 nsec/call
>
> My original RFC:
> clock-gettime-realtime: vdso: 1570 nsec/call
>
> With your suggested vdso_calc_delta():
> clock-gettime-realtime: vdso: 1512 nsec/call
>
> With your vdso_calc_delta() and inlined do_hres():
> clock-gettime-realtime: vdso: 1302 nsec/call
That does not make any sense at all. gettimeofday() is basically the same
as clock_gettime(REALTIME) plus an extra division, so I would expect it to
be slower, not faster. Let's look at the code:

__cvdso_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz)
{
        const struct vdso_data *vd = __arch_get_vdso_data();

        if (likely(tv != NULL)) {
                struct __kernel_timespec ts;

                if (do_hres(&vd[CS_HRES_COARSE], CLOCK_REALTIME, &ts))
                        return gettimeofday_fallback(tv, tz);

                tv->tv_sec = ts.tv_sec;
                tv->tv_usec = (u32)ts.tv_nsec / NSEC_PER_USEC;

IIRC PPC did some magic math tricks to avoid that division. Could you, just
for the fun of it, replace this division with:

                tv->tv_usec = (u32)ts.tv_nsec >> 10;

That's obviously incorrect, but it would tell us how heavy the division
is. If that brings us close, we might do something special for
gettimeofday().

OTOH, last time I checked, clock_gettime() was used far more than
gettimeofday(), but that obviously depends on the use cases.

        }
        ...
}

and

__cvdso_clock_gettime_common(clockid_t clock, struct __kernel_timespec *ts)
{
        const struct vdso_data *vd = __arch_get_vdso_data();
        u32 msk;

        /* Check for negative values or invalid clocks */
        if (unlikely((u32) clock >= MAX_CLOCKS))
                return -1;

        /*
         * Convert the clockid to a bitmask and use it to check which
         * clocks are handled in the VDSO directly.
         */
        msk = 1U << clock;
        if (likely(msk & VDSO_HRES))
                return do_hres(&vd[CS_HRES_COARSE], clock, ts);

So this is the extra code which is executed for clock_gettime(REALTIME),
and it is pure logic, certainly not as heavyweight as the division in the
gettimeofday() code path.

}

static __maybe_unused int
__cvdso_clock_gettime(clockid_t clock, struct __kernel_timespec *ts)
{
        int ret = __cvdso_clock_gettime_common(clock, ts);

        if (unlikely(ret))
                return clock_gettime_fallback(clock, ts);
        return 0;
}

One other thing which might be worth trying is to mark all functions in
that file as inline. The speedup from the do_hres() inlining was
impressive on PPC.

Thanks,

        tglx