On Mon, Nov 19, 2018 at 9:05 AM Bruce Evans <b...@optusnet.com.au> wrote:
> On Tue, 20 Nov 2018, Bruce Evans wrote: > > > On Tue, 20 Nov 2018, Bruce Evans wrote: > > > >> On Mon, 19 Nov 2018, Warner Losh wrote: > >> > >>> On Mon, Nov 19, 2018 at 12:31 AM Andriy Gapon <a...@freebsd.org> wrote: > > ... > > I found my test program. > > > >>> But I think I understand the problem now. > >>> > >>> mstosbt(1000) is overflowing with my change, but not without because > we're > >>> adding 2^32 to a number that's ~900 away from overflowing changing a > very > >> > >> This value is just invalid. Negative values are obviously invalid. > >> Positive > >> values of more than 1 second are invalid. In the old version, values of > >> precisely 1 second don't overflow since the scale factor is rounded > down; > >> the result is just 1 unit lower than 1 second. Overflow (actually > >> truncation) > >> occurs at 1 unit above 1 second. E.g., sbttoms(mstosbt(1000)) was 999 > and > >> sbttoms(mstosbt(1001)) was 0. Now in my fixed version, > >> sbttoms(mstosbt(1000)) > >> is truncated to 0 and sbttoms(mstosbt(1001)) is truncated to 1. > >> > >> The test program also showed that in the old version, all valid args > except > >> 0 are reduced by precisely 1 by conversion to sbt and back. > >> > >>> large sleep of 1 second to a tiny sleep of 0 (which is 1ms at the > default > >>> Hz). I think we get this in the mlx code because there's a > msleep(1000) in > >>> at least one place. msleep(1001) would have failed before my change. > Now I > >>> think msleep(999) works, but msleep(1000) fails. Since the code is > waiting > >>> a second for hardware to initialize, a 1ms instead is likely to catch > the > >>> hardware before it's finished. I think this will cause problems no > matter > >>> cold or not, since it's all pause_sbt() under the covers and a delay > of 0 > >>> is still 0 either way. > >> > >> Bug in the mlx code. The seconds part must be converted separately, as > in > >> tstosbt(). > > > > mlx doesn't seem to use sbt directly, or even msleep(), so the bug does > > seem to be central. > > > > mlx actually uses mtx_sleep() with timo = hz(). mtx_sleep() is actually > > an obfuscated macro wrapping _sleep(). The conversion to sbt is done by > > the macro and is sloppy. It multiplies timo by tick_sbt. > > > > tick_sbt is SBT_1S / hz, so the sbt is SBT_1S / hz * hz which is SBT_1S > > reduced a little. This is not affected by the new code, and it still > isn't > > converted back to 1 second in ms, us or ns. Even if it is converted back > > and then forth to sbt using the new code, it remains less than 1 second > as > > an sbt so shouldn't cause any new problems. > > Here are all uses of these functions in kern outside of sys/time.h: > > XX arm/arm/mpcore_timer.c: sc->et.et_min_period = nstosbt(20); > > OK since 20 ns is less than 1 second. > > XX cddl/compat/opensolaris/sys/kcondvar.h: return > (cv_timedwait_sbt(cvp, mp, nstosbt(tim), nstosbt(res), 0)); > XX cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c: > pause_sbt("dmu_tx_delay", nstosbt(wakeup), > XX cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c: > nstosbt(zfs_delay_resolution_ns), C_ABSOLUTE); > XX cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c: sbintime_t sleep = > nstosbt((zilog->zl_last_lwb_latency * pct) / 100); > XX compat/linuxkpi/common/include/linux/delay.h: > pause_sbt("lnxsleep", mstosbt(ms), 0, C_HARDCLOCK); > XX compat/linuxkpi/common/src/linux_hrtimer.c: > nstosbt(hrtimer->expires), nstosbt(hrtimer->precision), 0); > XX compat/linuxkpi/common/src/linux_hrtimer.c: > callout_reset_sbt(&hrtimer->callout, nstosbt(ktime_to_ns(time)), > XX compat/linuxkpi/common/src/linux_hrtimer.c: nstosbt(nsec), > hrtimer_call_handler, hrtimer, 0); > XX compat/linuxkpi/common/src/linux_hrtimer.c: > callout_reset_sbt(&hrtimer->callout, nstosbt(ktime_to_ns(interval)), > XX compat/linuxkpi/common/src/linux_hrtimer.c: > nstosbt(hrtimer->precision), hrtimer_call_handler, hrtimer, 0); > XX compat/linuxkpi/common/src/linux_schedule.c: ret = > -pause_sbt("lnxsleep", mstosbt(ms), 0, C_HARDCLOCK | C_CATCH); > > All of the above might are broken unless their timeout arg is restricted to > less than 1 second. > > Also, at least the sleep and pause calls in the above probably have a > style bug in knowing about sbt's at all. More important functions > like msleep() hide the use of sbt's and use fuzzier scale factors like > tick_sbt. > Yes. It's for these users I'm fixing the >= 1s cases. > XX dev/iicbus/nxprtc.c: pause_sbt("nxpotp", mstosbt(100), > mstosbt(10), 0); > XX dev/iicbus/nxprtc.c: pause_sbt("nxpbat", mstosbt(100), 0, 0); > > OK since 100 ms is less than 1 second. > > XX kern/kern_sysctl.c: sb = ustosbt(tt); > XX kern/kern_sysctl.c: sb = mstosbt(tt); > > Recently added bugs. The args is supplied by applications so can be far > above > 1 second. > > Also, these functions have a bogus API that takes uint64_t args. sbt's > can't represent that high, and the API doesn't support failure, so callers > have the burden of passing valid values. It is easiest to only support > args > up to 1 second and require the caller to handle seconds. > It's just as easy to cope. > Lots of itimer and select() code handle the corresponding problem for > timevals and timespecs almost correctly. Timevals and timespecs already > have the seconds part separate and itimer and select() syscalls do validity > checking on the fractional seconds, so there is no problem converting the > fractional sections to an sbt. Howver, the seconds part has type time_t > and this can be int64_t, which sbt's cannot represent. Also, POSIX doesn't > permit failure for seconds values that are representable by time_t. The > almost correct handling is to split up large seconds values in some cases > and break POSIX conformance by silently reducing the seconds value in > other cases. The reduction is usually to INT32_MAX / 2. This allows > adding 2 limited values but it is not obvious that this is enough since > rounding up twice or just adding 2 more would give overflow. > Right, that's beyond the scope of what I'm fixing. > XX kern/subr_rtc.c: sbt = nstosbt(waitns); > > This is correct, since waitns is the nanoseconds part of a timespec. > Yup. https://reviews.freebsd.org/D18051 Contains the changes. They address all the actual problems you've raised, but I'm going to disagree that 'ull' is bogus. For FreeBSD, it's fine and it's a lot more readable than the alternatives. Warner _______________________________________________ svn-src-all@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"