:IIRC, Intel is using a very different caching method on the P4 from
:what we are used to on just about every other x86 processor we've
:seen.  Well, I can't remember if the data cache has changed much, but
:the instruction cache has.  I doubt the difference in instruction
:cache behaviour would make a difference here though.  Hmm.
:
:I wonder if it makes any difference that I'm using -march=pentium
:-mcpu=pentium for my CFLAGS?  Actually, the kernel I tested on might
:even be using -march/-mcpu=pentiumpro, since I only recently changed
:it to =pentium to allow me to do buildworlds for another Pentium-class
:machine.  I did wonder the same thing a while back and did the same
:test with and without the optimizations, and with pentiumpro opts the
:big block size transfer rate went _down_ a little bit, which was odd.
:I didn't compare with L2-cache-friendly blocks, though.
:
:-- Chris Dillon - [EMAIL PROTECTED] - [EMAIL PROTECTED]

    I modified my original C program again, this time to simply read
    data from memory, taking the block size in kilobytes as an argument.
    I had to throw in a little __asm to do it right, but here are my results.
    They show about 3.2 GBytes/sec from the L2 (well, insofar as my
    3-instruction loop goes), and about 1.4 GBytes/sec from main memory.


NOTE:  cc x.c -O2 -o x

./x 4
3124.96 MBytes/sec (read)

./x 8
3242.45 MBytes/sec (read)

./x 16
3060.93 MBytes/sec (read)

./x 32
3359.97 MBytes/sec (read)

./x 64
3362.06 MBytes/sec (read)

./x 128
3365.53 MBytes/sec (read)

./x 240
3307.86 MBytes/sec (read)

./x 256
3232.33 MBytes/sec (read)

./x 512
1396.45 MBytes/sec (read)

./x 1024
1397.90 MBytes/sec (read)

    In contrast, I get 1052.50 MBytes/sec from the L2 on the Dell 2400,
    and 444 MBytes/sec from main memory.
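
    To put the two machines side by side, the ratios can be worked out
    with a few lines of C.  This is only a sketch of the arithmetic:
    the helper names speedup() and print_comparison() are made up for
    illustration, and using the 256KB and 1024KB runs as the "L2" and
    "main memory" figures for the P4 is an assumption, since the block
    sizes used on the Dell 2400 are not stated here.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of two bandwidth figures given in MBytes/sec (hypothetical helper). */
double
speedup(double fast_mbs, double slow_mbs)
{
    assert(slow_mbs > 0.0);
    return (fast_mbs / slow_mbs);
}

/* Print the comparisons using the figures quoted in this message. */
void
print_comparison(void)
{
    /* P4 figures: the 256KB and 1024KB runs (an assumed pairing). */
    printf("P4 vs Dell 2400, L2:          %.2fx\n",
        speedup(3232.33, 1052.50));
    printf("P4 vs Dell 2400, main memory: %.2fx\n",
        speedup(1397.90, 444.0));
    printf("P4 L2 vs P4 main memory:      %.2fx\n",
        speedup(3232.33, 1397.90));
}
```

    Both cross-machine ratios come out at a little over 3x, and the
    drop from L2 to main memory on the P4 is a bit over 2x.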

                                        -Matt

/*
 * NOTE:  cc x.c -O2 -o x
 */

#include <sys/types.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>	/* bzero() */
#include <stdarg.h>
#include <unistd.h>

int deltausecs(struct timeval *tv1, struct timeval *tv2);

int
main(int ac, char **av)
{
    int i;
    int bytes;
    double dtime;
    struct timeval tv1;
    struct timeval tv2;
    char *buf;

    if (ac == 1) {
        fprintf(stderr, "%s numKB\n", av[0]);
        exit(1);
    }
    bytes = strtol(av[1], NULL, 0) * 1024;
    if (bytes < 4 * 1024 || bytes > 256 * 1024 * 1024) {
        fprintf(stderr, "Oh please.  Try a reasonable value\n");
        exit(1);
    }
    buf = malloc(bytes);
    if (buf == NULL) {
        perror("malloc");
        exit(1);
    }
    bzero(buf, bytes);

    gettimeofday(&tv1, NULL);
    for (i = 0; i < 1000000000; i += bytes) {
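        register long j;    /* long, so the index matches pointer width */

        /*
         * Read the buffer backwards, 32 bits at a time.  The volatile asm
         * keeps the compiler from discarding the otherwise-unused loads.
         * buf and j are inputs only; %eax is the scratch destination.
         */
        for (j = bytes - 4; j >= 0; j -= 4)
            __asm __volatile("movl (%0,%1),%%eax" :
                /* no outputs */ :
                "r" (buf), "r" (j) : "eax");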
    }
    gettimeofday(&tv2, NULL);

    dtime = (double)deltausecs(&tv1, &tv2);
    printf("%6.2f MBytes/sec (read)\n", (double)1000000000 / dtime);
    return(0);
}

int
deltausecs(struct timeval *tv1, struct timeval *tv2)
{
    int usec;

    usec = (tv2->tv_usec + 1000000 - tv1->tv_usec);
    usec += (tv2->tv_sec - tv1->tv_sec - 1) * 1000000;
    return(usec);
}
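
For anyone who wants to try the same measurement without inline
assembly, here is a rough sketch of the read loop using a volatile
pointer instead.  It is not the program that produced the numbers
above: read_rate() is a made-up name, the compiler is free to generate
a different loop than the 3-instruction one, and the rate is computed
from the bytes actually read (whole passes over the buffer) rather than
from a fixed 1000000000, so results may differ somewhat.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

/*
 * Read at least `total` bytes from a `bytes`-sized buffer, 32 bits at a
 * time, and return the rate in MBytes/sec (10^6 bytes).  Returns -1.0
 * if the buffer cannot be allocated.
 */
double
read_rate(size_t bytes, double total)
{
    struct timeval tv1, tv2;
    volatile uint32_t *p;
    uint32_t *buf;
    double done = 0.0, usecs;
    size_t j, words;

    assert(bytes >= 4 && bytes % 4 == 0);
    words = bytes / 4;
    buf = malloc(bytes);
    if (buf == NULL)
        return (-1.0);
    memset(buf, 0, bytes);      /* touch every page before timing */
    p = buf;

    gettimeofday(&tv1, NULL);
    while (done < total) {
        /* Each volatile access must be performed, so the loads stay. */
        for (j = words; j-- > 0; )
            (void)p[j];
        done += (double)bytes;  /* count what was actually read */
    }
    gettimeofday(&tv2, NULL);
    free(buf);

    usecs = (tv2.tv_sec - tv1.tv_sec) * 1e6 +
        (tv2.tv_usec - tv1.tv_usec);
    if (usecs <= 0.0)
        usecs = 1.0;            /* guard against a zero-length interval */
    return (done / usecs);      /* bytes per microsecond == MBytes/sec */
}
```

Calling read_rate(256 * 1024, 1e9) corresponds roughly to "./x 256".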

