On Wed, 4 Jun 2014, Bryan Drewery wrote:
On 6/3/2014 4:06 PM, Alexander Motin wrote:
Author: mav
Date: Tue Jun 3 21:06:03 2014
New Revision: 267029
URL: http://svnweb.freebsd.org/changeset/base/267029
Log:
Replace gethrtime() with cpu_ticks(), as source of random for the taskqueue
selection. gethrtime() in our port updated with HZ rate, so unusable for
this specific purpose, completely draining benefit of multiple taskqueues.
MFC after: 2 weeks
Modified:
head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c
Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c
==============================================================================
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c Tue Jun 3
21:02:19 2014 (r267028)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c Tue Jun 3
21:06:03 2014 (r267029)
@@ -953,7 +953,7 @@ spa_taskq_dispatch_ent(spa_t *spa, zio_t
if (tqs->stqs_count == 1) {
tq = tqs->stqs_taskq[0];
} else {
- tq = tqs->stqs_taskq[gethrtime() % tqs->stqs_count];
+ tq = tqs->stqs_taskq[cpu_ticks() % tqs->stqs_count];
}
taskq_dispatch_ent(tq, func, arg, flags, ent);
Bugs remaining in this include:
- the comment before this function still says that gethrtime() is used:
/*
* Dispatch a task to the appropriate taskq for the ZFS I/O type and priority.
* Note that a type may have multiple discrete taskqs to avoid lock contention
* on the taskq itself. In that case we choose which taskq at random by using
* the low bits of gethrtime().
*/
- timing functions are a bogus source of random bits. The get_cyclecount()
function should be used for this, though its existence is a bug and its
name and implementation are worse.
On some arches (actually just i386, but should be arm too), cputick()
is the same as get_cyclecount(). cputick() has to be correct for
timing, but get_cyclecount() doesn't, so get_cyclecount() can be more
optimized, but on some arches get_cyclecount() is correct for timing
too. The implementation of this varies gratuitously with the arch.
Only arm is very bad:
- arm: get_cyclecount() is binuptime() with bogus scaling. This shows
another bug in the above change -- depending on the details of the
ticker and its scaling, there may be no random low bits at all.
stqs_count is dynamically initialized and it is not clear if it is
a power of 2, or a prime... The bogus scaling is:
binuptime(&bt);
return ((uint64_t)bt.sec << 56 | bt.frac >> 8);
The 8 lower bits may or may not be random (or even nonzero) but they are
always discarded. The remaining lower bits may or may not be random
(or even zero). 56 top bits of the seconds part are always discarded.
Old versions were better. They xor'ed the seconds and fraction part.
This is inconsistent with the bad function name, but better for
randomness.
It is still an error to use only the low bits. Using all the bits modulo
a prime would work well:
tq = tqs->stqs_taskq[cpu_ticks() % BIGPRIME % tqs->stqs_count];
The magic BIGPRIME only needs to be relatively prime to the infinite-
precision maxof(typeof(get_cyclecount()) + 1), so it doesn't depend
on the details of get_cyclecount(). For general use, BIGPRIME should
be 64 bits, but 32 bits should be enough for anyone and the the
64-bit division for excessive generality would be slow. The existing
64-bit division is already excessively general and gratuitously slow.
Both cpu_ticks() and gethrtime() are 64 bits. The division should have
been reduced to 32 bits using (uint32_t)getmumbletime() % count.
When ntpd is active, the lower bits in the fraction normally contain
significant randomness from micro-adjustments, so they should not be
discarded. Otherwise, there is really no randomness in lower bits,
but they may change in step with high bits depending on the details
of the clock frequency and scaling.
- i386: get_cyclecount() is cputicks(). This is different from amd64
because rdtsc() is not always available and cputicks() handles the
details of sometimes using a timecounter. This is unnecessarily
slow when cputicks() doesn't use rdtsc(). cputicks() isn't inline,
and it uses lots of function pointers. It even uses a function
pointer for rdtsc(). cputicks() is much more important than
get_cyclecount(). It should be optimized. arm should use it even
if it is not optimized.
get_cyclecount() is abused as a cycle count in:
- netinet/sctp_os_bsd.h. microtime() with an x86 TSC timecounter is
plenty fast and accurate enough for timestamping 1 GBps ethernet.
(I sometimes use nanotime(), but find that actually using the
nanoseconds resolution usually just makes the output too verbose
with noise in the last 3 decimal digits, so the output formatting
routine needs to reduce to microseconds.)
- kern/ktr_time.c. The undocumented unsupported option KTR_TIME defaults
to being get_cyclecount(). KTR needs to timestamp low-level functions,
so the timestamp function needs to be lower-level and nanotime() might
not be low enough. However, i386 sometimes uses the timecounter,
and arm always does, to nanotime() must be low enough. nanotime() also
gives nicely biased and scaled timestamps. KTR has null support for
scaling the timestamps that it makes. In the kernel, it just prints
raw values in %10.10lld format. ktrdump(1) is further from knowing how
to scale them. It just prints them or differences of them in %16ju
format. 16 is excessively wide, yet not wide enough to print
UINT64_MAX (that takes 20 decimal digits. Only 16 hex digits). At
least it doesn't use the long long abomination.
- kern/subr_bus.c. get_cyclecount() is not abused, but its result is
assigned to a variable named attachtime. This is fed to random_harvest()
as the vatiable named *entropy, so it is clearly random and not a time.
The existence of get_cyclecount() is a bug since all uses are satisifed
with a timestamp and it is better to optimize the timestamp functions
than have special versions of them for just randomness. It was created
before cputick() existed. cputick() is a fast enough timestamp function
for all random uses (it basically uses a cycle counter if available,
else falls back to using the timecounter).
However, bugs pointed out above show that it is confusing to use
functions whose names are related to timers as a source of randomness.
A name with "random" in it would be better. Also, some arches may
have a random register (in the CPU so it is fast to read) but no
x86-style cycle counter or fast timecounter. These arches should use
the random register for the random function. The random register just
doesn't need to be very wide or crypto quality.
Some of the bugs are documented in get_cyclecount(9). It is documented
as monotonically increasing, although the xor version on arm was far
from that and the current version wraps every 256 seconds.
Bruce
_______________________________________________
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"