On Mon, 25 Nov 2002, Luigi Rizzo wrote:
> I just got hit by a peculiar problem related to out-of-order
> execution of instructions.
> I was doing some low-level timing measurements using the rdtsc()
> around selected pieces of code (the rdtsc() is included in
> the TSTMP() functions that are in RELENG_4, source is in
> sys/i386/isa/clock.c), as follows:
>
>     TSTMP(3, ifp->if_unit, 1, 0);
>     tmp = CSR_READ_1(sc, FXP_CSR_SCB_STATACK);
>     TSTMP(3, ifp->if_unit, 2, 0);
>     TSTMP(3, ifp->if_unit, 3, 0);
>
> CSR_READ_1() goes to do a volatile read on memory across a 33MHz
> PCI bus, so it should take a very minimum of 100ns, plus arbitration
> and bridge crossing and whatnot. To my surprise, on a 750MHz Athlon
> box, the delta between the first two timestamps turned out to be
> in the order of 39 clock cycles, whereas the delta between 2 and 3
> is the 270-300 cycles range.
>
> The only explanation i can find is that the rdtsc() within TSTMP()
> is executed out of order.
>
> I wonder, is there on the high-end i386 processors any 'barrier'
> instruction of some kind that enforces in-order execution of some
> piece of code ?

The Intel processor manual has an explicit example for this and
recommends executing cpuid, which is a serializing instruction,
immediately before rdtsc. Basically, you run the cpuid + rdtsc pair a
bunch of times back to back to calibrate its average latency. Then do
your run with cpuid + rdtsc to get the beginning and end timestamps,
take the difference between the two, and subtract the latency you
calculated above. This gives a good value for the cycles spent in your
routine.

Other factors like ACPI can affect rdtsc, so beware of this.

-Nate
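
The cpuid + rdtsc procedure described above could be sketched roughly as follows. This is only an illustration, assuming an x86 CPU and GCC/Clang-style inline assembly; the names serialized_rdtsc(), tsc_overhead() and elapsed_cycles() are made up for this example and are not taken from the Intel manual or from the FreeBSD tree.

```c
#include <assert.h>
#include <stdint.h>

/* Execute cpuid (a serializing instruction) and then read the TSC. */
static inline uint64_t serialized_rdtsc(void)
{
    uint32_t eax = 0, ebx, ecx = 0, edx;
    uint32_t lo, hi;

    /* cpuid forces all earlier instructions to complete first. */
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    /* rdtsc returns the 64-bit counter split across edx:eax. */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    (void)ebx;
    return ((uint64_t)hi << 32) | lo;
}

/* Average cost, in cycles, of one back-to-back cpuid + rdtsc pair. */
static uint64_t tsc_overhead(unsigned iterations)
{
    uint64_t total = 0;

    assert(iterations > 0);
    for (unsigned i = 0; i < iterations; i++) {
        uint64_t start = serialized_rdtsc();
        uint64_t end = serialized_rdtsc();
        total += end - start;
    }
    return total / iterations;
}

/* Difference of the two timestamps minus the calibrated overhead,
 * clamped at zero so that measurement noise cannot underflow. */
static uint64_t elapsed_cycles(uint64_t start, uint64_t end,
                               uint64_t overhead)
{
    uint64_t delta = end - start;

    return delta > overhead ? delta - overhead : 0;
}
```

To use it, call tsc_overhead() once (say with 1000 iterations), take serialized_rdtsc() before and after the code under test, and pass both timestamps plus the overhead to elapsed_cycles(). Note that cpuid overwrites eax, ebx, ecx and edx, which is why all four appear as outputs in the asm statement.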