On Sat, May 05, 2001 at 06:26:30PM +0200, Rogier Wolff wrote:
> 
> As all this is trying to avoid bus turnarounds (i.e. switching from
> reading to writing), wouldn't it be fastest to just trust that the CPU
> has at least 4k worth of cache? (and hope for the best that we don't
> get interrupted in the meanwhile).
> 
> void copy_page (char *dest, char *source)
> {
>       long *dst = (long *)dest, 
>               *src=(long *)source, 
>               *end= (long *)(source+PAGE_SIZE);
> #if 1
>       register int  i;
>       long t=0;
>       static long tt;
> 
>       for (i=0;i<PAGE_SIZE/sizeof (long);i += cache_line_size()/sizeof(long))
>       /* Actually the innards of this loop should be:
>               (void) src[i];
>          however, the compiler will probably optimize that away. */ 
>               t += src[i];
> 
>       tt = t;
> #endif
>       while (src < end)
>               *dst++ = *src++;
> 
> }
> 
> So, this is 15 lines of C, and it'd be interesting to benchmark this
> against the assembly.
> 
> I'm assuming that the "loop variable handling" is not going to
> influence the overall performance: that would run at 500 - 1000MHz,
> and around 1 clock cycle (1-2ns) per loop. Set this against the stalls
> against the memory unit whose output buffer is full, and memory writes
> that take on the order of 30 ns per 64bits.

Can't you use volatile to prevent the compiler from optimizing
the read away?
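
Something along these lines, as an untested sketch rather than a
benchmarked replacement. PAGE_SIZE and CACHE_LINE_SIZE are placeholder
constants I am assuming here (the latter stands in for the
cache_line_size() call in your version). Reading through a
volatile-qualified pointer should oblige the compiler to emit each load
even though the value is thrown away, so the t/tt accumulator is no
longer needed:

```c
#include <stddef.h>

#define PAGE_SIZE       4096  /* assumption: 4k pages, as discussed above */
#define CACHE_LINE_SIZE 32    /* assumption: stand-in for cache_line_size() */

void copy_page(char *dest, const char *source)
{
	long *dst = (long *)dest;
	const long *src = (const long *)source;
	const long *end = (const long *)(source + PAGE_SIZE);
	/* Same memory as src, but every access through this pointer
	 * counts as a side effect the compiler may not discard. */
	const volatile long *touch = (const volatile long *)source;
	size_t i;

	/* Touch one long per cache line to pull the source page into
	 * the cache before any writes are issued. */
	for (i = 0; i < PAGE_SIZE / sizeof(long);
	     i += CACHE_LINE_SIZE / sizeof(long))
		(void)touch[i];

	/* Plain copy loop, now reading (hopefully) from cache. */
	while (src < end)
		*dst++ = *src++;
}
```

Whether the loads actually stay in cache until the copy loop runs is
still down to the CPU, so this only addresses the "compiler optimizes
it away" part.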


Kurt
