From: Kirill Tkhai <ktk...@parallels.com> So that CLOCK_MONOTONIC will be monotonic even if ve is migrated to another hw node.
Note, translating ve <-> abs time in clock_settime and timer_settime is not necessary because (1) clock_settime won't set monotonic clock and (2) timer_gettime always returns relative time. https://jira.sw.ru/browse/PSBM-13860 diff-posix_timers-reference-ct-monotonic-clock-from-ct-start Signed-off-by: Vladimir Davydov <vdavy...@parallels.com> Acked-by: Pavel Emelyanov <xe...@parallels.com> Signed-off-by: Kirill Tkhai <ktk...@parallels.com> +++ ve/posix-timers: reference ve monotonic clock from start in clock_nanosleep This is an addition to diff-posix_timers-reference-ct-monotonic-clock-from-ct-start Otherwise, apps that use sys_clock_nanosleep() to suspend their execution can hang after ve migration. diff-posix-timers-reference-ve-monotonic-clock-from-ve-start-in-clock_nanosleep Signed-off-by: Vladimir Davydov <vdavy...@parallels.com> Acked-by: Konstantin Khlebnikov <khlebni...@openvz.org> Acked-by: Pavel Emelyanov <xe...@parallels.com> Signed-off-by: Kirill Tkhai <ktk...@parallels.com> +++ timers: Port diff-ve-timers-convert-ve-monotonic-to-abs-time-when-setting-timerfd-2 Need this for docker, as sometimes systemd-tmpfiles-clean.timer inside a PCS7 CT is spamming dbus with requests to start corresponding service. And at the same time Docker tries to create cgroup for container and attach it to hierarchies like memory and blkio. That is because systemd timer was triggered with non-virtualized timerfd using plain host clock but check that timer is successfull uses virtualized clock_gettime and don't pass before proper(in-container) timer activation. And timers charges again and again starts service got in busy loop. https://jira.sw.ru/browse/PSBM-34017 v2: move the stubs to ve.h Port the following RH6 commit: Author: Vladimir Davydov Email: vdavy...@parallels.com Subject: fs: convert ve monotonic to abs time when setting timerfd Date: Fri, 15 Feb 2013 11:57:09 +0400 * [timers] corrected TFD_TIMER_ABSTIME timer handling, the issue led to high cpu usage inside a Fedora 18 CT by 'init' process (PSBM-18284) Monotonic time inside container, as it can be obtained using various system calls such as clock_gettime, is reported since start of the container, not since start of the whole system. This was made in order to avoid time issues while a container is migrated between different physical hosts, but this also introduced a lot of problems in time- related system calls because absolute monotonic time, which is in fact relative to container, passed to those system calls must be converted to system-wide monotonic time, which is used by kernel hrtimers. One of those buggy system calls is timerfd_settime which accepts as an argument absolute time if flag TFD_TIMER_ABSTIME is specified. The patch fixes it by converting container monotonic time to system- wide monotonic time using the monotonic_ve_to_abs() function, which was introduced earlier and is now exported for that reason. https://jira.sw.ru/browse/PSBM-18284 Signed-off-by: Vladimir Davydov <vdavy...@parallels.com> Signed-off-by: Pavel Tikhomirov <ptikhomi...@virtuozzo.com> Signed-off-by: Kirill Tkhai <ktk...@odin.com> Reviewed-by: Vladimir Davydov <vdavy...@parallels.com> (cherry picked from vz7 commit 869542c24c41c0578b47d2ef83cfa63427e0e5e1) Signed-off-by: Konstantin Khorenko <khore...@virtuozzo.com> +++ timers should not get negative argument This patch fixes 25-sec delay on login into systemd based containers. Userspace application can set timer for past and expect that the timer will be expired immediately. This can do not work as expected inside migrated containers. Translated argument provided to timer can become negative, and according timer will sleep a very long time. https://jira.sw.ru/browse/PSBM-48475 CC: Vladimir Davydov <vdavy...@virtuozzo.com> CC: Konstantin Khorenko <khore...@virtuozzo.com> Signed-off-by: Vasily Averin <v...@virtuozzo.com> Acked-by: Cyrill Gorcunov <gorcu...@openvz.org> (cherry picked from vz7 commit a71fa19facb00472e47760255ab2e6fa16885732) Signed-off-by: Konstantin Khorenko <khore...@virtuozzo.com> (cherry-picked from vz8 commit 4dad672974ab ("ve/posix-timers: reference ve monotonic clock from ve start (v2)")) Signed-off-by: Nikita Yushchenko <nikita.yushche...@virtuozzo.com> --- fs/timerfd.c | 8 ++++-- include/linux/ve.h | 8 ++++++ kernel/time/posix-timers.c | 54 ++++++++++++++++++++++++++++++++++++-- 3 files changed, 66 insertions(+), 4 deletions(-) diff --git a/fs/timerfd.c b/fs/timerfd.c index c5509d2448e3..4f782f3690d0 100644 --- a/fs/timerfd.c +++ b/fs/timerfd.c @@ -27,6 +27,7 @@ #include <linux/compat.h> #include <linux/rcupdate.h> #include <linux/time_namespace.h> +#include <linux/ve.h> struct timerfd_ctx { union { @@ -435,8 +436,8 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags) return ufd; } -static int do_timerfd_settime(int ufd, int flags, - const struct itimerspec64 *new, +static int do_timerfd_settime(int ufd, int flags, + struct itimerspec64 *new, struct itimerspec64 *old) { struct fd f; @@ -500,6 +501,9 @@ static int do_timerfd_settime(int ufd, int flags, /* * Re-program the timer to the new value ... */ + if ((flags & TFD_TIMER_ABSTIME) && + (new->it_value.tv_sec || new->it_value.tv_nsec)) + monotonic_ve_to_abs(ctx->clockid, &new->it_value); ret = timerfd_setup(ctx, flags, new); spin_unlock_irq(&ctx->wqh.lock); diff --git a/include/linux/ve.h b/include/linux/ve.h index 552fa577e2f9..19a590bc86d4 100644 --- a/include/linux/ve.h +++ b/include/linux/ve.h @@ -97,6 +97,9 @@ static inline struct ve_struct *css_to_ve(struct cgroup_subsys_state *css) extern struct cgroup_subsys_state *ve_get_init_css(struct ve_struct *ve, int subsys_id); +extern void monotonic_abs_to_ve(clockid_t which_clock, struct timespec64 *tp); +extern void monotonic_ve_to_abs(clockid_t which_clock, struct timespec64 *tp); + #define ve_feature_set(ve, f) \ !!((ve)->features & VE_FEATURE_##f) @@ -140,6 +143,11 @@ static inline struct cgroup *cgroup_get_ve_root1(struct cgroup *cgrp) static inline int vz_security_family_check(struct net *net, int family, int type) { return 0; } static inline int vz_security_protocol_check(struct net *net, int protocol) { return 0; } +static inline void monotonic_abs_to_ve(clockid_t which_clock, + struct timespec64 *tp) { } +static inline void monotonic_ve_to_abs(clockid_t which_clock, + struct timepsec64 *tp) { } + #endif /* CONFIG_VE */ #endif /* _LINUX_VE_H */ diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c index 7363f81dc31a..3f91756f5421 100644 --- a/kernel/time/posix-timers.c +++ b/kernel/time/posix-timers.c @@ -31,6 +31,7 @@ #include <linux/compat.h> #include <linux/nospec.h> #include <linux/time_namespace.h> +#include <linux/ve.h> #include "timekeeping.h" #include "posix-timers.h" @@ -56,6 +57,46 @@ static const struct k_clock * const posix_clocks[]; static const struct k_clock *clockid_to_kclock(const clockid_t id); static const struct k_clock clock_realtime, clock_monotonic; +#define clock_is_monotonic(which_clock) \ + ((which_clock) == CLOCK_MONOTONIC || \ + (which_clock) == CLOCK_MONOTONIC_RAW || \ + (which_clock) == CLOCK_MONOTONIC_COARSE) + +#ifdef CONFIG_VE +static struct timespec64 zero_time; + +void monotonic_abs_to_ve(clockid_t which_clock, struct timespec64 *tp) +{ + struct timespec64 offset; + + if (!clock_is_monotonic(which_clock)) + return; + + offset = ns_to_timespec64(get_exec_env()->start_time); + set_normalized_timespec64(tp, + tp->tv_sec - offset.tv_sec, + tp->tv_nsec - offset.tv_nsec); +} + +void monotonic_ve_to_abs(clockid_t which_clock, struct timespec64 *tp) +{ + struct timespec64 offset; + + if (!clock_is_monotonic(which_clock)) + return; + + offset = ns_to_timespec64(get_exec_env()->start_time); + set_normalized_timespec64(tp, + tp->tv_sec + offset.tv_sec, + tp->tv_nsec + offset.tv_nsec); + + if (timespec64_compare(tp, &zero_time) <= 0) { + tp->tv_sec = 0; + tp->tv_nsec = 1; + } +} +#endif + /* * we assume that the new SIGEV_THREAD_ID shares no bits with the other * SIGEV values. Here we put out an error if this assumption fails. @@ -916,6 +957,9 @@ static int do_timer_settime(timer_t timer_id, int tmr_flags, if (!timr) return -EINVAL; + if ((flags & TIMER_ABSTIME) && + (new_spec64->it_value.tv_sec || new_spec64->it_value.tv_nsec)) + monotonic_ve_to_abs(timr->it_clock, &new_spec64->it_value); kc = timr->kclock; if (WARN_ON_ONCE(!kc || !kc->timer_set)) error = -EINVAL; @@ -1091,6 +1135,7 @@ SYSCALL_DEFINE2(clock_gettime, const clockid_t, which_clock, error = kc->clock_get_timespec(which_clock, &kernel_tp); + monotonic_abs_to_ve(which_clock, &kernel_tp); if (!error && put_timespec64(&kernel_tp, tp)) error = -EFAULT; @@ -1173,6 +1218,7 @@ SYSCALL_DEFINE2(clock_gettime32, clockid_t, which_clock, err = kc->clock_get_timespec(which_clock, &ts); + monotonic_abs_to_ve(which_clock, &ts); if (!err && put_old_timespec32(&ts, tp)) err = -EFAULT; @@ -1259,8 +1305,10 @@ SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, which_clock, int, flags, if (!timespec64_valid(&t)) return -EINVAL; - if (flags & TIMER_ABSTIME) + if (flags & TIMER_ABSTIME) { + monotonic_ve_to_abs(which_clock, &t); rmtp = NULL; + } current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE; current->restart_block.nanosleep.rmtp = rmtp; @@ -1286,8 +1334,10 @@ SYSCALL_DEFINE4(clock_nanosleep_time32, clockid_t, which_clock, int, flags, if (!timespec64_valid(&t)) return -EINVAL; - if (flags & TIMER_ABSTIME) + if (flags & TIMER_ABSTIME) { + monotonic_ve_to_abs(which_clock, &t); rmtp = NULL; + } current->restart_block.nanosleep.type = rmtp ? TT_COMPAT : TT_NONE; current->restart_block.nanosleep.compat_rmtp = rmtp; -- 2.30.2 _______________________________________________ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel