On Mon, 20 Jun 2011, Peter Jeremy wrote:
On 2011-Jun-18 22:05:06 +1000, Bruce Evans <b...@optusnet.com.au> wrote:
My clock measurement program (mostly an old program by Wollman) shows
the following histogram of times for a non-invariant TSC timecounter
on a 2GHz UP system:
% min 273, max 265102, mean 273.998217, std 79.069534
% 1th: 273 (1727219 observations)
% 2th: 274 (265607 observations)
% 3th: 275 (6984 observations)
% 4th: 280 (11 observations)
% 5th: 290 (8 observations)
The variance is small, and differences of a single ns can be seen clearly.
Unfortunately, Intel broke this in their P-state invariant TSC
implementation. Rather than incrementing the TSC by one at the
CPU core frequency, they increment by the core multiplier at the
FSB frequency. This gives a result like the following on my Atom
N270:
delta samples
24 49637124
36 50312540
48 44658
60 77
This makes it virtually impossible to measure short periods.
Luckily, AMD seem to have gotten this right.
I tested a FreeBSD cluster machine in userland, since it doesn't have a
usable TSC timecounter (iterating $(sysctl kern.timecounter...) is too
slow).
%%%
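#include <sys/types.h>
#include <machine/cpufunc.h>
#include <stdio.h>

/* Only the low 32 bits of each 64-bit rdtsc() result are kept. */
static unsigned buf[17];
static volatile unsigned v;

int
main(void)
{
	int i;

	/* Take 17 back-to-back readings, then print the 16 deltas. */
	for (i = 0; i < 17; i++)
		buf[i] = rdtsc();
	for (i = 0; i < 16; i++)
		printf("%u\n", buf[i + 1] - buf[i]);

	/* Average ticks per rdtsc() over 1000000 iterations. */
	buf[0] = rdtsc();
	for (i = 0; i < 1000000; i++)
		v = rdtsc();
	printf("%.1f\n", (v - buf[0]) / 1e6);
	return (0);
}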
%%%
Output:
%%%
77
63
63
70
63
63
63
70
63
63
70
63
63
63
70
63
65.2
%%%
It seems to always give a multiple of 7, so that might be the multiplier.
63 is also a lot, and limits the resolution to ~34 ns at 1.86GHz.
On an original Athlon64:
%%%
34
8
5
8
5
8
5
8
5
8
5
8
5
8
5
8
6.5
%%%
Phenom specs say 42 cycles instead of ~6.5, IIRC. That is only slightly
better than 63.
This is execution latency, but although rdtsc is non-serializing, there
is only 1 of it at least on old CPUs, so it can never deliver results
faster than its latency, on average. The 5's in the above seem to be
lower than the latency, due to the 8's being delivered late. I normally
write tests like the above in asm to get more control over the loop
overhead, but the above behaviour is interesting since it is what will
happen for normal unsynchronized use of rdtsc.
Bruce