:IIRC, Intel is using a very different caching method on the P4 from
:what we are used to on just about every other x86 processor we've
:seen. Well, I can't remember if the data cache has changed much, but
:the instruction cache has. I doubt the difference in instruction
:cache behaviour would make a difference here though. Hmm.
:
:I wonder if it makes any difference that I'm using -march=pentium
:-mcpu=pentium for my CFLAGS? Actually, the kernel I tested on might
:even be using -march/-mcpu=pentiumpro, since I only recently changed
:it to =pentium to allow me to do buildworlds for another Pentium-class
:machine. I did wonder the same thing a while back and did the same
:test with and without the optimizations, and with pentiumpro opts the
:big block size transfer rate went _down_ a little bit, which was odd.
:I didn't compare with L2-cache-friendly blocks, though.
:
:-- Chris Dillon - [EMAIL PROTECTED] - [EMAIL PROTECTED]
I modified my original C program again, this time to simply read
the data from memory, given a block size in kilobytes as an argument.
I had to throw in a little __asm to do it right, but here are my results.
They show about 3.2 GBytes/sec from the L2 (well, insofar as my
3-instruction loop goes), and about 1.4 GBytes/sec from main memory.
The drop shows up between the 256 KB and 512 KB block sizes.
NOTE: cc x.c -O2 -o x
./x 4
3124.96 MBytes/sec (read)
./x 8
3242.45 MBytes/sec (read)
./x 16
3060.93 MBytes/sec (read)
./x 32
3359.97 MBytes/sec (read)
./x 64
3362.06 MBytes/sec (read)
./x 128
3365.53 MBytes/sec (read)
./x 240
3307.86 MBytes/sec (read)
./x 256
3232.33 MBytes/sec (read)
./x 512
1396.45 MBytes/sec (read)
./x 1024
1397.90 MBytes/sec (read)
In contrast I get 1052.50 MBytes/sec on the Dell 2400 from the L2,
and 444 MBytes/sec from main memory.
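To put the two sets of numbers side by side, here is a small sketch of the arithmetic. It is only an illustration: the struct and function names are made up, and it picks the 256 KB and 1024 KB runs above as the "L2" and "main memory" figures for the first machine. With those inputs the first machine comes out roughly 3.1 times faster than the Dell 2400 at both levels, and the L2-to-memory ratio is about 2.3 on both.

```c
#include <stdio.h>

/* Hypothetical holder for one machine's read rates, in MBytes/sec. */
struct rates {
    const char *name;
    double l2;      /* block fits in L2 */
    double mem;     /* block much larger than L2 */
};

/* Ratio of two rates given in the same unit. */
static double
ratio(double a, double b)
{
    return (a / b);
}

/* Print how machine a compares to machine b, and each one's L2/memory gap. */
static void
print_comparison(const struct rates *a, const struct rates *b)
{
    printf("%s vs %s: L2 %.2fx, memory %.2fx\n", a->name, b->name,
        ratio(a->l2, b->l2), ratio(a->mem, b->mem));
    printf("L2/memory: %s %.2f, %s %.2f\n", a->name,
        ratio(a->l2, a->mem), b->name, ratio(b->l2, b->mem));
}
```

Calling print_comparison() with { "test box", 3232.33, 1397.90 } and { "Dell 2400", 1052.50, 444.0 } prints the four ratios.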
-Matt
/*
 * NOTE: cc x.c -O2 -o x
 *
 * Reads roughly 1000000000 bytes from a buffer of the given size (in KB)
 * and prints the read rate.  The inline asm is i386-specific.
 */
#include <sys/types.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <strings.h>    /* bzero() */
#include <unistd.h>

int deltausecs(struct timeval *tv1, struct timeval *tv2);

int
main(int ac, char **av)
{
    int i;
    int bytes;
    double dtime;
    struct timeval tv1;
    struct timeval tv2;
    char *buf;

    if (ac == 1) {
        fprintf(stderr, "%s numKB\n", av[0]);
        exit(1);
    }
    /* Block size is given in KB; accept 4 KB through 256 MB. */
    bytes = strtol(av[1], NULL, 0) * 1024;
    if (bytes < 4 * 1024 || bytes > 256 * 1024 * 1024) {
        fprintf(stderr, "Oh please. Try a reasonable value\n");
        exit(1);
    }
    buf = malloc(bytes);
    if (buf == NULL) {
        perror("malloc");
        exit(1);
    }
    /* Touch every page so the buffer is really backed by memory. */
    bzero(buf, bytes);

    gettimeofday(&tv1, NULL);
    /* Re-read the block until about 1000000000 bytes have been read. */
    for (i = 0; i < 1000000000; i += bytes) {
        register int j;

        /* One 32-bit load per iteration, walking from the top down. */
        for (j = bytes - 4; j >= 0; j -= 4)
            __asm __volatile("movl (%0,%1),%%eax" :
                "=r" (buf), "=r" (j) :
                "0" (buf), "1" (j) : "ax" );
    }
    gettimeofday(&tv2, NULL);

    /* bytes per microsecond is the same number as MBytes/sec */
    dtime = (double)deltausecs(&tv1, &tv2);
    printf("%6.2f MBytes/sec (read)\n", (double)1000000000 / dtime);
    return(0);
}

/*
 * Elapsed time from tv1 to tv2 in microseconds.
 */
int
deltausecs(struct timeval *tv1, struct timeval *tv2)
{
    int usec;

    usec = (tv2->tv_usec + 1000000 - tv1->tv_usec);
    usec += (tv2->tv_sec - tv1->tv_sec - 1) * 1000000;
    return(usec);
}
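
The inline asm in the program above only builds for i386. As a rough sketch, not the method used for the numbers in this message, the same kind of read loop can be written portably with a volatile pointer, which stops the compiler from throwing the loads away under -O2. The sketch also counts the bytes actually touched, because the outer loop always reads a whole number of blocks rather than exactly 1000000000 bytes. The names read_pass, total_bytes_read and mbytes_per_sec are invented for this example, and the compiler is free to emit something other than a 3-instruction loop, so rates measured this way are not directly comparable with the ones above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Read every 32-bit word of buf once, from the top down.
 * The volatile qualifier forces one real load per word; the sum is
 * returned only so callers can check that the data was read.
 */
static uint32_t
read_pass(const void *buf, size_t bytes)
{
    const volatile uint32_t *p = buf;
    size_t n = bytes / sizeof(uint32_t);
    uint32_t sum = 0;

    while (n-- > 0)
        sum += p[n];
    return (sum);
}

/*
 * Bytes touched by a loop of the form
 *     for (i = 0; i < target; i += block) read one block;
 * which always reads a whole number of blocks.
 */
static uint64_t
total_bytes_read(uint64_t target, uint64_t block)
{
    uint64_t passes = (target + block - 1) / block;    /* round up */

    return (passes * block);
}

/* Bytes per microsecond is numerically the same as MBytes/sec. */
static double
mbytes_per_sec(uint64_t bytes, uint64_t usecs)
{
    return ((double)bytes / (double)usecs);
}
```

For the block sizes tested above the rounding is tiny: a 1024 KB block gives 954 passes and 1000341504 bytes, about 0.03% over the nominal figure. It only becomes noticeable for blocks of many megabytes.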
To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message