On Sat, Aug 05, 2023 at 01:33:05AM -0400, A Tammy wrote:
>
> On 8/5/23 00:49, Scott Cheloha wrote:
> > On Sat, Aug 05, 2023 at 12:17:48AM -0400, aisha wrote:
> >> On 22/09/10 01:53PM, Visa Hankala wrote:
> >>> On Wed, Aug 31, 2022 at 04:48:37PM -0400, aisha wrote:
> >>>> I've added a patch which adds support for NOTE_{,U,M,N}SECONDS for
> >>>> EVFILT_TIMER in the kqueue interface.
> >>>
> >>> It sort of makes sense to add an option to specify timeouts in
> >>> sub-millisecond precision.  It feels like complete overengineering
> >>> to add multiple time units on the level of the kernel interface.
> >>> However, it looks like FreeBSD and NetBSD have already done this,
> >>> following macOS' lead...
> >>>
> >>>> I've also added NOTE_ABSTIME but haven't done any actual
> >>>> implementation there, as I am not sure how the `data` field should
> >>>> be interpreted (is it absolute time in seconds since the epoch?).
> >>>
> >>> I think FreeBSD and NetBSD take NOTE_ABSTIME as time since the
> >>> epoch.
> >>>
> >>> Below is a revised patch that takes into account some corner cases.
> >>> It tries to be API-compatible with FreeBSD and NetBSD.  I have
> >>> adjusted the NOTE_{,M,U,N}SECONDS flags so that they are enum-like.
> >>>
> >>> The manual page bits are from NetBSD.
> >>>
> >>> It is quite late to introduce a feature like this within this
> >>> release cycle.  Until now, the timer code has ignored the fflags
> >>> field.  There might be pieces of software that are careless with
> >>> struct kevent and that would break as a result of this patch.
> >>> Programs that are widely used on different BSDs are probably fine
> >>> already, though.
> >>
> >> Sorry, I had forgotten this patch for a long time!!!  I've been
> >> running with this for a while now and it's been working nicely.
> >
> > Where is this being used in ports?  I think having "one of each" for
> > seconds, milliseconds, microseconds, and nanoseconds is (as visa
> > noted) way, way over-the-top.
>
> I was using it with a port that I sent out a while ago but never got
> into the tree (it was before I joined the project) -
> https://marc.info/?l=openbsd-ports&m=165715874509440&w=2
If nothing in ports is using this, I am squeamish about adding it.
Once we add it, we're stuck maintaining it, warts and all.

If www/workflow were in the tree I could see the upside.  Is it in
ports?

It looks like workflow actually wants timerfd(2) from Linux and is
simulating timerfd(2) with EVFILT_TIMER and NOTE_NSECONDS:

https://github.com/sogou/workflow/blob/80b3dfbad2264bcd79ba37811c66421490e337d2/src/kernel/poller.c#L227

I think timerfd(2) is the superior interface here.  It keeps the POSIX
interval timer semantics without all the signal delivery baggage.  It
also supports multiple clocks and starting a periodic timeout from an
absolute starting time.

So, if the goal is "add www/workflow to ports", adding timerfd(2)
might be the right thing.

> I also agree with it being over the top, but that's the way it is in
> NetBSD and FreeBSD.  I'm also fine with breaking compatibility and
> only keeping nano; no preference either way.

Well, if we're going to add it (if), we should add all of it.  The
vast majority of the code is not conversion code: if we add support
for NOTE_NSECONDS, adding support for the other units is trivial, and
there is value in being fully compatible with other implementations.

> > The original EVFILT_TIMER supported only milliseconds, yes.  Given
> > that it debuted in the late 90s, I think that was a bad choice.
> > But when milliseconds were insufficiently precise, the obvious
> > thing would be to add support for nanoseconds... and then stop.
> >
> > The decision to use the UTC clock with no option to select a
> > different clockid_t for NOTE_ABSTIME is also unfortunate.
>
> Yes, and furthermore this was very unclear, as I couldn't find it in
> the man pages for either NetBSD or FreeBSD.
>
> > Grumble.
> >
> >> I had an unrelated question inlined.
> >>
> >> [...]
> >>>  static void
> >>> -filt_timer_timeout_add(struct knote *kn)
> >>> +filt_timeradd(struct knote *kn, struct timespec *ts)
> >>>  {
> >>> -	struct timeval tv;
> >>> +	struct timespec expiry, now;
> >>>  	struct timeout *to = kn->kn_hook;
> >>>  	int tticks;
> >>>
> >>> -	tv.tv_sec = kn->kn_sdata / 1000;
> >>> -	tv.tv_usec = (kn->kn_sdata % 1000) * 1000;
> >>> -	tticks = tvtohz(&tv);
> >>> -	/* Remove extra tick from tvtohz() if timeout has fired before. */
> >>> +	if (kn->kn_sfflags & NOTE_ABSTIME) {
> >>> +		nanotime(&now);
> >>> +		if (timespeccmp(ts, &now, >)) {
> >>> +			timespecsub(ts, &now, &expiry);
> >>> +			/* XXX timeout_at_ts */
> >>> +			timeout_add(to, tstohz(&expiry));
> >
> > visa:
> >
> > we should use timeout_abs_ts() here.  I need to adjust it, though.
> >
> >>> +		} else {
> >>> +			/* Expire immediately. */
> >>> +			filt_timerexpire(kn);
> >>> +		}
> >>> +		return;
> >>> +	}
> >>> +
> >>> +	tticks = tstohz(ts);
> >>> +	/* Remove extra tick from tstohz() if timeout has fired before. */
> >>>  	if (timeout_triggered(to))
> >>>  		tticks--;
> >>
> >> I always wondered why one tick was removed; is one tick really
> >> that important?  And does a timeout firing only cost one tick?
> >
> > When you convert a duration to a count of ticks with tstohz(), it
> > adds an extra tick to the result to keep you from undershooting
> > your timeout.  You start counting your timeout at the start of the
> > *next* tick, otherwise the timeout might fire early.  However,
> > after the timeout has expired once, you no longer need the extra
> > tick because you can (more or less) assume that the timeout is
> > running at the start of the new tick.
> >
> > I know that sounds a little fuzzy, but in practice it works.
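To make the rounding concrete, here is a simplified userland model of
the conversion.  This is illustrative only: HZ and duration_to_ticks()
are invented for the example, and the kernel's tvtohz()/tstohz() also
guard against overflow.

#include <stdint.h>

#define HZ		100			/* example: 10 ms per tick */
#define TICK_NSEC	(1000000000ULL / HZ)

/*
 * Convert a duration to a tick count.  Rounding up plus one extra
 * tick guarantees the timeout never fires early: it may be armed
 * midway through the current tick.  Once a periodic timeout has
 * fired, it is rearmed at (roughly) the start of a tick, so the
 * extra tick is dropped again.
 */
static int
duration_to_ticks(uint64_t nsecs, int already_fired)
{
	uint64_t tticks;

	tticks = (nsecs + TICK_NSEC - 1) / TICK_NSEC + 1;
	if (already_fired)
		tticks--;
	return (tticks > 0) ? (int)tticks : 1;
}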
>
> Haha, these are the kind of weird idiosyncrasies that are fun to know
> about.  Thank you very much for the explanation! :D
>
> So I went around looking at how large a tick really is, and it seems
> like we get it through kern.clockrate?? (from man tick)
>
> aisha@fwall ~ $ sysctl kern.clockrate
> kern.clockrate=tick = 10000, hz = 100, profhz = 1000, stathz = 100
>
> so presumably each tick is 1/10000 of a second (is this right?), [...]

kern.clockrate's "tick" member represents the number of microseconds
in a hardclock tick.  It's just 1,000,000 / hz.  With hz = 100 that
works out to 10,000 microseconds per tick, i.e. 1/100 of a second, not
1/10000.

> and things are getting scheduled in terms of ticks, so how is it even
> possible to get nanosecond level accuracy there?

We have a nanosecond resolution timeout API, but it isn't super useful
yet because the timeout layer doesn't use the clock interrupt API.  I
am hoping to add this in the next release cycle.

> From more looking around it seems like at least x86 has the TSC,
> which provides better resolution (presumably similar things exist for
> other archs), but I don't see it being used anywhere here in an
> obvious fashion.  man pctr doesn't mention it being used for time
> measurement.

Every practical OpenBSD platform has access to a nice clock:
fixed-frequency, high resolution (1us or better), and high precision
(reads are fast).

--

Here is a revised patch:

- Only validate inputs in filt_timervalidate().  Do the input
  conversion in a separate routine, filt_timer_sdata_to_nsecs().

- Schedule the timeout in filt_timerstart().  Return zero if the
  absolute time has already expired and the timeout was not scheduled.
  The caller can then call filt_timerexpire().  This duplicates some
  code across filt_timerattach() and filt_timermodify(), but I think
  it's a little less magical: filt_timerstart() does *one* thing and
  leaves error handling to the caller.

- If the input isn't an absolute timeout, we need to round sdata up
  from 0 to 1.  This is what FreeBSD does.  I think this is bad
  behavior: a periodic timeout of zero is meaningless, and the
  sensible thing would be to reject the input with EINVAL.  But I
  didn't design the API, so that ship has sailed.

- Use the high resolution timeout API instead of the tick-based API.
  In particular, we can use the UTC clock for absolute timeouts, just
  like FreeBSD does.

- In filt_timerexpire(), use timeout_advance() to count any
  expirations we missed due to processing delays.

The UTC timeout support in kern_timeout.c is a rough draft.  There's a
lot going on in there.  But if we included it we would be more
compatible with FreeBSD.
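For illustration, this is how the new flags are meant to be used from
userland.  A hypothetical sketch, not part of the patch; the semantics
shown (unit flags in fflags, NOTE_ABSTIME measured against the UTC
clock) follow the FreeBSD API this patch mirrors:

/* Hypothetical example, assuming this patch is applied. */
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

#include <err.h>
#include <stdio.h>
#include <time.h>

int
main(void)
{
	struct kevent kev[2], ev;
	struct timespec now;
	int kq, n;

	if ((kq = kqueue()) == -1)
		err(1, "kqueue");

	/* Periodic timer 1: fires every 1500 microseconds. */
	EV_SET(&kev[0], 1, EVFILT_TIMER, EV_ADD, NOTE_USECONDS, 1500, NULL);

	/* One-shot timer 2: fires when the UTC clock passes now + 2s. */
	clock_gettime(CLOCK_REALTIME, &now);
	EV_SET(&kev[1], 2, EVFILT_TIMER, EV_ADD | EV_ONESHOT,
	    NOTE_ABSTIME | NOTE_SECONDS, now.tv_sec + 2, NULL);

	if (kevent(kq, kev, 2, NULL, 0, NULL) == -1)
		err(1, "kevent");

	for (;;) {
		if ((n = kevent(kq, NULL, 0, &ev, 1, NULL)) == -1)
			err(1, "kevent");
		if (n > 0) {
			printf("timer %lu fired, %lld expiration(s)\n",
			    (unsigned long)ev.ident, (long long)ev.data);
		}
	}
}

Note that NOTE_MSECONDS is 0x00000000, so existing programs that leave
fflags zeroed keep the historical millisecond behavior.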
Index: sys/event.h
===================================================================
RCS file: /cvs/src/sys/sys/event.h,v
retrieving revision 1.69
diff -u -p -r1.69 event.h
--- sys/event.h	10 Feb 2023 14:34:17 -0000	1.69
+++ sys/event.h	8 Aug 2023 15:38:39 -0000
@@ -122,6 +122,13 @@ struct kevent {
 /* data/hint flags for EVFILT_DEVICE, shared with userspace */
 #define NOTE_CHANGE	0x00000001		/* device change event */
 
+/* additional flags for EVFILT_TIMER */
+#define NOTE_MSECONDS	0x00000000		/* data is milliseconds */
+#define NOTE_SECONDS	0x00000001		/* data is seconds */
+#define NOTE_USECONDS	0x00000002		/* data is microseconds */
+#define NOTE_NSECONDS	0x00000003		/* data is nanoseconds */
+#define NOTE_ABSTIME	0x00000010		/* timeout is absolute */
+
 /*
  * This is currently visible to userland to work around broken
  * programs which pull in <sys/proc.h> or <sys/selinfo.h>.
Index: kern/kern_event.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_event.c,v
retrieving revision 1.196
diff -u -p -r1.196 kern_event.c
--- kern/kern_event.c	11 Apr 2023 00:45:09 -0000	1.196
+++ kern/kern_event.c	8 Aug 2023 15:38:39 -0000
@@ -449,55 +449,127 @@ filt_proc(struct knote *kn, long hint)
 	return (kn->kn_fflags != 0);
 }
 
-static void
-filt_timer_timeout_add(struct knote *kn)
+#define NOTE_TIMER_UNITMASK \
+	(NOTE_SECONDS | NOTE_MSECONDS | NOTE_USECONDS | NOTE_NSECONDS)
+
+static int
+filt_timervalidate(int flags, int64_t sdata)
+{
+	if (flags & ~(NOTE_TIMER_UNITMASK | NOTE_ABSTIME))
+		return (EINVAL);
+
+	switch (flags & NOTE_TIMER_UNITMASK) {
+	case NOTE_SECONDS:
+	case NOTE_MSECONDS:
+	case NOTE_USECONDS:
+	case NOTE_NSECONDS:
+		break;
+	default:
+		return (EINVAL);
+	}
+
+	if (sdata < 0)
+		return (EINVAL);
+
+	return (0);
+}
+
+static uint64_t
+filt_timer_sdata_to_nsecs(const struct knote *kn)
+{
+	int unit = kn->kn_sfflags & NOTE_TIMER_UNITMASK;
+
+	switch (unit) {
+	case NOTE_SECONDS:
+		return SEC_TO_NSEC(kn->kn_sdata);
+	case NOTE_MSECONDS:
+		return MSEC_TO_NSEC(kn->kn_sdata);
+	case NOTE_USECONDS:
+		return USEC_TO_NSEC(kn->kn_sdata);
+	case NOTE_NSECONDS:
+		return kn->kn_sdata;
+	default:
+		panic("%s: invalid EVFILT_TIMER unit: %d", __func__, unit);
+	}
+}
+
+/*
+ * Attempt to schedule the timeout.  Returns zero if the timeout is
+ * not scheduled because the absolute time has already expired.
+ */
+static int
+filt_timerstart(struct knote *kn)
 {
-	struct timeval tv;
+	struct timespec expiry, now, timeout;
 	struct timeout *to = kn->kn_hook;
-	int tticks;
 
-	tv.tv_sec = kn->kn_sdata / 1000;
-	tv.tv_usec = (kn->kn_sdata % 1000) * 1000;
-	tticks = tvtohz(&tv);
-	/* Remove extra tick from tvtohz() if timeout has fired before. */
-	if (timeout_triggered(to))
-		tticks--;
-	timeout_add(to, (tticks > 0) ? tticks : 1);
+	NSEC_TO_TIMESPEC(filt_timer_sdata_to_nsecs(kn), &timeout);
+	if (kn->kn_sfflags & NOTE_ABSTIME) {
+		nanotime(&now);
+		if (timespeccmp(&timeout, &now, <=))
+			return 0;
+		expiry = timeout;
+		timeout_set_flags(to, filt_timerexpire, kn, KCLOCK_UTC, 0);
+	} else {
+		nanouptime(&now);
+		timespecadd(&now, &timeout, &expiry);
+		timeout_set_flags(to, filt_timerexpire, kn, KCLOCK_UPTIME, 0);
+	}
+	timeout_abs_ts(to, &expiry);
+	return 1;
 }
 
 void
 filt_timerexpire(void *knx)
 {
+	uint64_t count;
 	struct knote *kn = knx;
 	struct kqueue *kq = kn->kn_kq;
+	struct timeout *to = kn->kn_hook;
 
-	kn->kn_data++;
+	/*
+	 * One-shot timers and absolute timers expire only once.
+	 * Periodic timers, on the other hand, may expire faster
+	 * than we can service them.  timeout_advance() reschedules
+	 * a periodic timer while computing how many times the timer
+	 * expired.
+	 */
+	if ((kn->kn_flags & EV_ONESHOT) || (kn->kn_sfflags & NOTE_ABSTIME))
+		count = 1;
+	else
+		timeout_advance(to, filt_timer_sdata_to_nsecs(kn), &count);
+	kn->kn_data += count;
 
 	mtx_enter(&kq->kq_lock);
 	knote_activate(kn);
 	mtx_leave(&kq->kq_lock);
-
-	if ((kn->kn_flags & EV_ONESHOT) == 0)
-		filt_timer_timeout_add(kn);
 }
 
-
 /*
- * data contains amount of time to sleep, in milliseconds
+ * data contains a timeout.  fflags clarifies what the timeout means.
  */
 int
 filt_timerattach(struct knote *kn)
 {
 	struct timeout *to;
+	int error;
+
+	error = filt_timervalidate(kn->kn_sfflags, kn->kn_sdata);
+	if (error != 0)
+		return (error);
 
 	if (kq_ntimeouts > kq_timeoutmax)
 		return (ENOMEM);
 	kq_ntimeouts++;
 
-	kn->kn_flags |= EV_CLEAR;	/* automatically set */
-	to = malloc(sizeof(*to), M_KEVENT, M_WAITOK);
-	timeout_set(to, filt_timerexpire, kn);
+	if ((kn->kn_sfflags & NOTE_ABSTIME) == 0) {
+		kn->kn_flags |= EV_CLEAR;	/* automatically set */
+		if (kn->kn_sdata == 0)
+			kn->kn_sdata = 1;
+	}
+	to = malloc(sizeof(*to), M_KEVENT, M_WAITOK | M_ZERO);
 	kn->kn_hook = to;
-	filt_timer_timeout_add(kn);
+	if (!filt_timerstart(kn))
+		filt_timerexpire(kn);
 
 	return (0);
 }
@@ -505,11 +577,11 @@ filt_timerattach(struct knote *kn)
 void
 filt_timerdetach(struct knote *kn)
 {
-	struct timeout *to;
+	struct timeout *to = kn->kn_hook;
 
-	to = (struct timeout *)kn->kn_hook;
 	timeout_del_barrier(to);
 	free(to, M_KEVENT, sizeof(*to));
+	kn->kn_hook = NULL;
 	kq_ntimeouts--;
 }
 
@@ -518,6 +590,14 @@ filt_timermodify(struct kevent *kev, str
 {
 	struct kqueue *kq = kn->kn_kq;
 	struct timeout *to = kn->kn_hook;
+	int error;
+
+	error = filt_timervalidate(kev->fflags, kev->data);
+	if (error != 0) {
+		kev->flags |= EV_ERROR;
+		kev->data = error;
+		return (0);
+	}
 
 	/* Reset the timer.  Any pending events are discarded. */
 
@@ -531,9 +611,13 @@ filt_timermodify(struct kevent *kev, str
 	kn->kn_data = 0;
 	knote_assign(kev, kn);
 
-	/* Reinit timeout to invoke tick adjustment again. */
-	timeout_set(to, filt_timerexpire, kn);
-	filt_timer_timeout_add(kn);
+	if ((kn->kn_sfflags & NOTE_ABSTIME) == 0) {
+		kn->kn_flags |= EV_CLEAR;	/* automatically set */
+		if (kn->kn_sdata == 0)
+			kn->kn_sdata = 1;
+	}
+	if (!filt_timerstart(kn))
+		filt_timerexpire(kn);
 
 	return (0);
 }
@@ -551,7 +635,6 @@ filt_timerprocess(struct knote *kn, stru
 
 	return (active);
 }
-
 /*
  * filt_seltrue:
Index: sys/timeout.h
===================================================================
RCS file: /cvs/src/sys/sys/timeout.h,v
retrieving revision 1.47
diff -u -p -r1.47 timeout.h
--- sys/timeout.h	31 Dec 2022 16:06:24 -0000	1.47
+++ sys/timeout.h	8 Aug 2023 15:38:39 -0000
@@ -27,6 +27,7 @@
 #ifndef _SYS_TIMEOUT_H_
 #define _SYS_TIMEOUT_H_
 
+#include <sys/queue.h>
 #include <sys/time.h>
 
 struct circq {
@@ -36,6 +37,7 @@ struct circq {
 
 struct timeout {
 	struct circq to_list;			/* timeout queue, don't move */
+	TAILQ_ENTRY(timeout) to_utc_link;	/* UTC queue link */
 	struct timespec to_abstime;		/* absolute time to run at */
 	void (*to_func)(void *);		/* function to call */
 	void *to_arg;				/* function argument */
@@ -85,10 +87,12 @@ int timeout_sysctl(void *, size_t *, voi
 
 #define KCLOCK_NONE	(-1)	/* dummy clock for sanity checks */
 #define KCLOCK_UPTIME	0	/* uptime clock; time since boot */
-#define KCLOCK_MAX	1
+#define KCLOCK_UTC	1	/* UTC clock; time since unix epoch */
+#define KCLOCK_MAX	2
 
 #define TIMEOUT_INITIALIZER_FLAGS(_fn, _arg, _kclock, _flags) {	\
 	.to_list = { NULL, NULL },					\
+	.to_utc_link = { NULL, NULL },					\
 	.to_abstime = { .tv_sec = 0, .tv_nsec = 0 },			\
 	.to_func = (_fn),						\
 	.to_arg = (_arg),						\
@@ -112,6 +116,7 @@ int timeout_add_usec(struct timeout *, i
 int	timeout_add_nsec(struct timeout *, int);
 
 int	timeout_abs_ts(struct timeout *, const struct timespec *);
+int	timeout_advance(struct timeout *, uint64_t, uint64_t *);
 
 int	timeout_del(struct timeout *);
 int	timeout_del_barrier(struct timeout *);
@@ -119,6 +124,7 @@ void timeout_barrier(struct timeout *);
 
 void	timeout_adjust_ticks(int);
 void	timeout_hardclock_update(void);
+void	timeout_reset_kclock_offset(int, const struct timespec *);
 void	timeout_startup(void);
 
 #endif /* _KERNEL */
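An aside on KCLOCK_UTC: with the timeout.h changes above, kernel code
could arm a timeout against the wall clock directly.  A hypothetical
sketch (my_expire() and my_arm_utc() are invented for illustration),
mirroring what filt_timerstart() does for NOTE_ABSTIME:

/* Hypothetical: run my_expire() when the UTC clock passes *deadline. */
#include <sys/timeout.h>

struct timeout my_to;

void
my_expire(void *arg)
{
	/* handle the expiration; runs from the softclock */
}

void
my_arm_utc(const struct timespec *deadline)
{
	timeout_set_flags(&my_to, my_expire, NULL, KCLOCK_UTC, 0);
	timeout_abs_ts(&my_to, deadline);
}

When the UTC clock jumps forward, timeout_reset_kclock_offset() (in
the kern_timeout.c changes below) requeues every pending UTC timeout
so the jump is noticed promptly.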
Index: kern/kern_timeout.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_timeout.c,v
retrieving revision 1.95
diff -u -p -r1.95 kern_timeout.c
--- kern/kern_timeout.c	29 Jul 2023 06:52:08 -0000	1.95
+++ kern/kern_timeout.c	8 Aug 2023 15:38:39 -0000
@@ -75,6 +75,7 @@ struct circq timeout_wheel_kc[BUCKETS];
 struct circq timeout_new;		/* [T] New, unscheduled timeouts */
 struct circq timeout_todo;		/* [T] Due or needs rescheduling */
 struct circq timeout_proc;		/* [T] Due + needs process context */
+TAILQ_HEAD(, timeout) timeout_utc;	/* [T] UTC-based timeouts */
 time_t timeout_level_width[WHEELCOUNT];	/* [I] Wheel level width (seconds) */
 struct timespec tick_ts;		/* [I] Length of a tick (1/hz secs) */
 
@@ -166,15 +167,22 @@ struct lock_type timeout_spinlock_type =
 	((needsproc) ? &timeout_sleeplock_obj : &timeout_spinlock_obj)
 #endif
 
+void	kclock_nanotime(int, struct timespec *);
 void	softclock(void *);
 void	softclock_create_thread(void *);
 void	softclock_process_kclock_timeout(struct timeout *, int);
 void	softclock_process_tick_timeout(struct timeout *, int);
 void	softclock_thread(void *);
+int	timeout_abs_ts_locked(struct timeout *, const struct timespec *);
 void	timeout_barrier_timeout(void *);
 uint32_t timeout_bucket(const struct timeout *);
+void	timeout_dequeue(struct timeout *);
+void	timeout_enqueue(struct circq *, struct timeout *);
 uint32_t timeout_maskwheel(uint32_t, const struct timespec *);
 void	timeout_run(struct timeout *);
+uint64_t timespec_advance_nsec(struct timespec *, uint64_t,
+	    const struct timespec *);
+void	u64_sat_add(uint64_t *, uint64_t, uint64_t);
 
 /*
  * The first thing in a struct timeout is its struct circq, so we
@@ -228,6 +236,7 @@ timeout_startup(void)
 	CIRCQ_INIT(&timeout_new);
 	CIRCQ_INIT(&timeout_todo);
 	CIRCQ_INIT(&timeout_proc);
+	TAILQ_INIT(&timeout_utc);
 	for (b = 0; b < nitems(timeout_wheel); b++)
 		CIRCQ_INIT(&timeout_wheel[b]);
 	for (b = 0; b < nitems(timeout_wheel_kc); b++)
@@ -252,6 +261,25 @@ timeout_proc_init(void)
 }
 
 void
+timeout_reset_kclock_offset(int kclock, const struct timespec *offset)
+{
+	struct kclock *kc = &timeout_kclock[kclock];
+	struct timeout *to;
+
+	KASSERT(kclock == KCLOCK_UTC);
+
+	mtx_enter(&timeout_mutex);
+	if (kclock == KCLOCK_UTC && timespeccmp(&kc->kc_offset, offset, <)) {
+		TAILQ_FOREACH(to, &timeout_utc, to_utc_link) {
+			CIRCQ_REMOVE(&to->to_list);
+			CIRCQ_INSERT_TAIL(&timeout_todo, &to->to_list);
+		}
+	}
+	kc->kc_offset = *offset;
+	mtx_leave(&timeout_mutex);
+}
+
+void
 timeout_set(struct timeout *new, void (*fn)(void *), void *arg)
 {
 	timeout_set_flags(new, fn, arg, KCLOCK_NONE, 0);
@@ -273,6 +301,28 @@ timeout_set_proc(struct timeout *new, vo
 	timeout_set_flags(new, fn, arg, KCLOCK_NONE, TIMEOUT_PROC);
 }
 
+void
+timeout_dequeue(struct timeout *to)
+{
+	KASSERT(ISSET(to->to_flags, TIMEOUT_ONQUEUE));
+
+	CIRCQ_REMOVE(&to->to_list);
+	if (to->to_kclock == KCLOCK_UTC)
+		TAILQ_REMOVE(&timeout_utc, to, to_utc_link);
+	CLR(to->to_flags, TIMEOUT_ONQUEUE);
+}
+
+void
+timeout_enqueue(struct circq *queue, struct timeout *to)
+{
+	KASSERT(!ISSET(to->to_flags, TIMEOUT_ONQUEUE));
+
+	CIRCQ_INSERT_TAIL(queue, &to->to_list);
+	if (to->to_kclock == KCLOCK_UTC)
+		TAILQ_INSERT_TAIL(&timeout_utc, to, to_utc_link);
+	SET(to->to_flags, TIMEOUT_ONQUEUE);
+}
+
 int
 timeout_add(struct timeout *new, int to_ticks)
 {
@@ -297,14 +347,13 @@ timeout_add(struct timeout *new, int to_
 	 */
 	if (ISSET(new->to_flags, TIMEOUT_ONQUEUE)) {
 		if (new->to_time - ticks < old_time - ticks) {
-			CIRCQ_REMOVE(&new->to_list);
-			CIRCQ_INSERT_TAIL(&timeout_new, &new->to_list);
+			timeout_dequeue(new);
+			timeout_enqueue(&timeout_new, new);
 		}
 		tostat.tos_readded++;
 		ret = 0;
 	} else {
-		SET(new->to_flags, TIMEOUT_ONQUEUE);
-		CIRCQ_INSERT_TAIL(&timeout_new, &new->to_list);
+		timeout_enqueue(&timeout_new, new);
 	}
 #if NKCOV > 0
 	if (!kcov_cold)
@@ -383,13 +432,23 @@ timeout_add_nsec(struct timeout *to, int
 int
 timeout_abs_ts(struct timeout *to, const struct timespec *abstime)
 {
-	struct timespec old_abstime;
-	int ret = 1;
+	int status;
 
 	mtx_enter(&timeout_mutex);
+	status = timeout_abs_ts_locked(to, abstime);
+	mtx_leave(&timeout_mutex);
+	return status;
+}
+
+int
+timeout_abs_ts_locked(struct timeout *to, const struct timespec *abstime)
+{
+	struct timespec old_abstime;
+	int ret = 1;
 
+	MUTEX_ASSERT_LOCKED(&timeout_mutex);
 	KASSERT(ISSET(to->to_flags, TIMEOUT_INITIALIZED));
-	KASSERT(to->to_kclock != KCLOCK_NONE);
+	KASSERT(to->to_kclock > KCLOCK_NONE && to->to_kclock < KCLOCK_MAX);
 
 	old_abstime = to->to_abstime;
 	to->to_abstime = *abstime;
@@ -397,14 +456,13 @@ timeout_abs_ts(struct timeout *to, const
 
 	if (ISSET(to->to_flags, TIMEOUT_ONQUEUE)) {
 		if (timespeccmp(abstime, &old_abstime, <)) {
-			CIRCQ_REMOVE(&to->to_list);
-			CIRCQ_INSERT_TAIL(&timeout_new, &to->to_list);
+			timeout_dequeue(to);
+			timeout_enqueue(&timeout_new, to);
 		}
 		tostat.tos_readded++;
 		ret = 0;
 	} else {
-		SET(to->to_flags, TIMEOUT_ONQUEUE);
-		CIRCQ_INSERT_TAIL(&timeout_new, &to->to_list);
+		timeout_enqueue(&timeout_new, to);
 	}
 #if NKCOV > 0
 	if (!kcov_cold)
@@ -412,9 +470,26 @@ timeout_abs_ts(struct timeout *to, const
 #endif
 	tostat.tos_added++;
+	return ret;
+}
+
+int
+timeout_advance(struct timeout *to, uint64_t intvl, uint64_t *ocount)
+{
+	struct timespec next, now;
+	uint64_t count;
+	int status;
+
+	mtx_enter(&timeout_mutex);
+	kclock_nanotime(to->to_kclock, &now);
+	next = to->to_abstime;
+	count = timespec_advance_nsec(&next, intvl, &now);
+	status = timeout_abs_ts_locked(to, &next);
 	mtx_leave(&timeout_mutex);
 
-	return ret;
+	if (ocount != NULL)
+		*ocount = count;
+	return status;
 }
 
 int
@@ -424,8 +499,7 @@ timeout_del(struct timeout *to)
 
 	mtx_enter(&timeout_mutex);
 	if (ISSET(to->to_flags, TIMEOUT_ONQUEUE)) {
-		CIRCQ_REMOVE(&to->to_list);
-		CLR(to->to_flags, TIMEOUT_ONQUEUE);
+		timeout_dequeue(to);
 		tostat.tos_cancelled++;
 		ret = 1;
 	}
@@ -468,11 +542,10 @@ timeout_barrier(struct timeout *to)
 
 	mtx_enter(&timeout_mutex);
 
 	barrier.to_time = ticks;
-	SET(barrier.to_flags, TIMEOUT_ONQUEUE);
 	if (procflag)
-		CIRCQ_INSERT_TAIL(&timeout_proc, &barrier.to_list);
+		timeout_enqueue(&timeout_proc, &barrier);
 	else
-		CIRCQ_INSERT_TAIL(&timeout_todo, &barrier.to_list);
+		timeout_enqueue(&timeout_todo, &barrier);
 
 	mtx_leave(&timeout_mutex);
 
@@ -496,19 +569,18 @@ uint32_t
 timeout_bucket(const struct timeout *to)
 {
 	struct timespec diff, shifted_abstime;
-	struct kclock *kc;
+	struct kclock *kc = &timeout_kclock[to->to_kclock];
 	uint32_t level;
 
-	KASSERT(to->to_kclock == KCLOCK_UPTIME);
-	kc = &timeout_kclock[to->to_kclock];
-
+	KASSERT(to->to_kclock > KCLOCK_NONE && to->to_kclock < KCLOCK_MAX);
 	KASSERT(timespeccmp(&kc->kc_lastscan, &to->to_abstime, <));
+
 	timespecsub(&to->to_abstime, &kc->kc_lastscan, &diff);
 	for (level = 0; level < nitems(timeout_level_width) - 1; level++) {
 		if (diff.tv_sec < timeout_level_width[level])
 			break;
 	}
-	timespecadd(&to->to_abstime, &kc->kc_offset, &shifted_abstime);
+	timespecsub(&to->to_abstime, &kc->kc_offset, &shifted_abstime);
 	return level * WHEELSIZE + timeout_maskwheel(level, &shifted_abstime);
 }
 
@@ -620,7 +692,6 @@ timeout_run(struct timeout *to)
 
 	MUTEX_ASSERT_LOCKED(&timeout_mutex);
 
-	CLR(to->to_flags, TIMEOUT_ONQUEUE);
 	SET(to->to_flags, TIMEOUT_TRIGGERED);
 
 	fn = to->to_func;
@@ -652,14 +723,13 @@ softclock_process_kclock_timeout(struct
 		tostat.tos_scheduled++;
 		if (!new)
 			tostat.tos_rescheduled++;
-		CIRCQ_INSERT_TAIL(&timeout_wheel_kc[timeout_bucket(to)],
-		    &to->to_list);
+		timeout_enqueue(&timeout_wheel_kc[timeout_bucket(to)], to);
 		return;
 	}
 	if (!new && timespeccmp(&to->to_abstime, &kc->kc_late, <=))
 		tostat.tos_late++;
 	if (ISSET(to->to_flags, TIMEOUT_PROC)) {
-		CIRCQ_INSERT_TAIL(&timeout_proc, &to->to_list);
+		timeout_enqueue(&timeout_proc, to);
 		return;
 	}
 	timeout_run(to);
@@ -675,13 +745,13 @@ softclock_process_tick_timeout(struct ti
 		tostat.tos_scheduled++;
 		if (!new)
 			tostat.tos_rescheduled++;
-		CIRCQ_INSERT_TAIL(&BUCKET(delta, to->to_time), &to->to_list);
+		timeout_enqueue(&BUCKET(delta, to->to_time), to);
 		return;
 	}
 	if (!new && delta < 0)
 		tostat.tos_late++;
 	if (ISSET(to->to_flags, TIMEOUT_PROC)) {
-		CIRCQ_INSERT_TAIL(&timeout_proc, &to->to_list);
+		timeout_enqueue(&timeout_proc, to);
 		return;
 	}
 	timeout_run(to);
@@ -697,11 +767,8 @@ softclock_process_tick_timeout(struct ti
 void
 softclock(void *arg)
 {
-	struct timeout *first_new, *to;
-	int needsproc, new;
-
-	first_new = NULL;
-	new = 0;
+	struct timeout *first_new = NULL, *to;
+	int needsproc, new = 0;
 
 	mtx_enter(&timeout_mutex);
 	if (!CIRCQ_EMPTY(&timeout_new))
@@ -709,7 +776,7 @@ softclock(void *arg)
 	CIRCQ_CONCAT(&timeout_todo, &timeout_new);
 	while (!CIRCQ_EMPTY(&timeout_todo)) {
 		to = timeout_from_circq(CIRCQ_FIRST(&timeout_todo));
-		CIRCQ_REMOVE(&to->to_list);
+		timeout_dequeue(to);
 		if (to == first_new)
 			new = 1;
 		if (to->to_kclock != KCLOCK_NONE)
@@ -758,7 +825,7 @@ softclock_thread(void *arg)
 		mtx_enter(&timeout_mutex);
 		while (!CIRCQ_EMPTY(&timeout_proc)) {
 			to = timeout_from_circq(CIRCQ_FIRST(&timeout_proc));
-			CIRCQ_REMOVE(&to->to_list);
+			timeout_dequeue(to);
 			timeout_run(to);
 			tostat.tos_run_thread++;
 		}
@@ -768,6 +835,108 @@ softclock_thread(void *arg)
 	splx(s);
 }
 
+void
+kclock_nanotime(int kclock, struct timespec *now)
+{
+	switch (kclock) {
+	case KCLOCK_UPTIME:
+		nanouptime(now);
+		return;
+	case KCLOCK_UTC:
+		nanotime(now);
+		return;
+	default:
+		panic("%s: invalid kclock: %d", __func__, kclock);
+	}
+}
+
+void
+u64_sat_add(uint64_t *sum, uint64_t a, uint64_t b)
+{
+	if (a + b < a)
+		*sum = UINT64_MAX;
+	else
+		*sum = a + b;
+}
+
+/*
+ * Given an interval timer with a period of intvl that last expired
+ * at absolute time abs, find the timer's next expiration time and
+ * write it back to abs.  If abs has not yet expired, abs is not
+ * modified.
+ *
+ * Returns the number of intervals that have elapsed.  If the number
+ * of elapsed intervals would overflow a 64-bit integer, UINT64_MAX is
+ * returned.  Note that abs marks the end of the first interval: if
+ * abs has not expired, zero intervals have elapsed.
+ */
+uint64_t
+timespec_advance_nsec(struct timespec *abs, uint64_t intvl,
+    const struct timespec *now)
+{
+	struct timespec base, diff, minbase, next, intvl_product;
+	struct timespec intvl_product_max, intvl_ts;
+	uint64_t count = 0, quo;
+
+	/* Unusual case: abs has not expired, no intervals have elapsed. */
+	if (timespeccmp(now, abs, <)) {
+		if (intvl == 0)
+			panic("%s: intvl is zero", __func__);
+		return 0;
+	}
+
+	/* Typical case: abs has expired and only one interval has elapsed. */
+	NSEC_TO_TIMESPEC(intvl, &intvl_ts);
+	timespecadd(abs, &intvl_ts, &next);
+	if (timespeccmp(now, &next, <)) {
+		*abs = next;
+		return 1;
+	}
+
+	/*
+	 * Annoying case: two or more intervals have elapsed.
+	 *
+	 * Find a base within interval-product range of the current time.
+	 * Under normal circumstances abs will already be within range,
+	 * but for the sake of correctness we handle cases where enormous
+	 * expanses of time have passed between abs and now.
+	 */
+	quo = UINT64_MAX / intvl;
+	NSEC_TO_TIMESPEC(quo * intvl, &intvl_product_max);
+	timespecsub(now, &intvl_product_max, &minbase);
+	base = *abs;
+	if (__predict_false(timespeccmp(&base, &minbase, <))) {
+		while (timespeccmp(&base, &minbase, <)) {
+			timespecadd(&base, &intvl_product_max, &base);
+			u64_sat_add(&count, count, quo);
+		}
+	}
+
+	/*
+	 * We have a base within range.  Now find the interval-product
+	 * that, when added to the base, gets us just past the current
+	 * time to the most imminent expiration point.
+	 *
+	 * If the product would overflow a 64-bit integer we advance the
+	 * base by one interval and retry.  This can happen at most once.
+	 *
+	 * The next expiration is then the sum of the base and the
+	 * interval-product.
+	 */
+	for (;;) {
+		timespecsub(now, &base, &diff);
+		quo = TIMESPEC_TO_NSEC(&diff) / intvl;
+		if (__predict_true(intvl * quo <= UINT64_MAX - intvl))
+			break;
+		timespecadd(&base, &intvl_ts, &base);
+		u64_sat_add(&count, count, quo);
+	}
+	NSEC_TO_TIMESPEC(intvl * (quo + 1), &intvl_product);
+	timespecadd(&base, &intvl_product, abs);
+	u64_sat_add(&count, count, quo + 1);
+	return count;
+}
+
 #ifndef SMALL_KERNEL
 void
 timeout_adjust_ticks(int adj)
@@ -791,8 +960,8 @@ timeout_adjust_ticks(int adj)
 			/* when moving a timeout forward need to reinsert it */
 			if (to->to_time - ticks < adj)
 				to->to_time = new_ticks;
-			CIRCQ_REMOVE(&to->to_list);
-			CIRCQ_INSERT_TAIL(&timeout_todo, &to->to_list);
+			timeout_dequeue(to);
+			timeout_enqueue(&timeout_todo, to);
 		}
 	}
 	ticks = new_ticks;
@@ -824,6 +993,8 @@ db_kclock(int kclock)
 	switch (kclock) {
 	case KCLOCK_UPTIME:
 		return "uptime";
+	case KCLOCK_UTC:
+		return "utc";
 	default:
 		return "invalid";
 	}
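To illustrate the catch-up arithmetic in timespec_advance_nsec(), here
is a standalone model of the common path, with the timespec handling
and 64-bit overflow guards stripped out.  The advance() helper and the
numbers in main() are invented for the example:

#include <stdint.h>
#include <stdio.h>

/*
 * Model of timespec_advance_nsec(): abs is the timer's last
 * expiration (ns), intvl its period (ns), now the current time (ns).
 * Advance abs to the most imminent future expiration and return how
 * many expirations have elapsed.
 */
static uint64_t
advance(uint64_t *abs, uint64_t intvl, uint64_t now)
{
	uint64_t quo;

	if (now < *abs)			/* abs has not expired yet */
		return 0;
	quo = (now - *abs) / intvl;	/* whole intervals we fell behind */
	*abs += (quo + 1) * intvl;	/* skip to just past "now" */
	return quo + 1;
}

int
main(void)
{
	/* Period 10ms; last expiry at t=50ms; the clock now reads t=87ms. */
	uint64_t abs = 50000000, count;

	count = advance(&abs, 10000000, 87000000);

	/*
	 * Expirations at 50, 60, 70, and 80ms were due, so count is 4
	 * and the next expiration is scheduled for t=90ms.
	 */
	printf("count=%llu next=%llums\n", (unsigned long long)count,
	    (unsigned long long)(abs / 1000000));
	return 0;
}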