Glenn Fowler wrote:
> 
> I think the point is that if the common usage is to
> sum many different files, or one file at a time over
> long spans of time then the performance of getting
> the bytes from the filesystem to user space may
> outweigh any cache optimization gains
> 
> the ast apps are already at a disadvantage because they
> pull in extra .so's over the base case(s) they are measured against
> 
> what I need is a big view analysis of at least a few more variables
> so that reasonable decisions can be made of ifdef'ing up the code
[snip]
> what are the effects, if any, of timing apps repeatedly over the same file
> vs
> timing the apps over enough files to blow fs cache(s)

We use tmpfs (a ram disk) for benchmarking which AFAIK eliminates these
effects.

> what are the interactions between io/mmap block sizes and L? cache
> block sizes being controlled by the prefetch calls?

The size of the |mmap()| area is AFAIK completely independent of the
L1 cache size and the cache block sizes. The prefetch instructions we use
via |sun_prefetch_read_many()| simply reduce the time the CPU pipeline
stalls ("hangs") while waiting for the data to become available to it
(on Sun's Niagara machines the core switches to a different hardware
strand/thread instead of stalling).
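
A minimal sketch of the idea, not the actual libsum code: it uses the
GCC/Clang builtin |__builtin_prefetch()| as a stand-in for
|sun_prefetch_read_many()|. The function name, the 64-byte cache line
and the 256-byte prefetch distance are assumptions made up for
illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; real hardware and real code may differ. */
#define CACHE_LINE     64   /* assumed cache line size in bytes */
#define PREFETCH_AHEAD 256  /* assumed distance to prefetch ahead */

/*
 * Sum all bytes in buf. Once per assumed cache line we hint that data
 * PREFETCH_AHEAD bytes further on will be read soon, so the load can
 * start while the loop is still busy with the current bytes.
 */
uint32_t sum_bytes_prefetch(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i++) {
        /* Only prefetch addresses that are still inside the buffer. */
        if ((i % CACHE_LINE) == 0 && i + PREFETCH_AHEAD < len)
            __builtin_prefetch(buf + i + PREFETCH_AHEAD, 0, 3); /* 0 = read */
        sum += buf[i];
    }
    return sum;
}
```

The prefetch is only a hint: the result is the same with or without
it, only the time spent waiting for memory can change.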

> my suspicions are that tweaking the user io/mmap block sizes (which can be
> done in a general way for all apps, possibly with an ifdef in one place)
> may change the timings and diminish the effects of the explicit prefetch calls

I strongly doubt that since these things happen in completely different
layers. The |sun_prefetch_read_many()| prefetch instruction requests
data from main memory (if it is not cached yet) in _parallel_ to the
normal pipeline work, while the size of the |mmap()|'d area just defines
how often we have to call |mmap()|/|munmap()| to walk over the whole
file.
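
To illustrate the |mmap()| side, here is a rough sketch (again not the
libsum code) which walks a file one window at a time using plain POSIX
calls. The function name |sum_file_windows()| and its |window|
parameter are hypothetical. It returns the number of |mmap()| calls so
the effect of the window size is visible.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Sum the bytes of a file by mapping it one window at a time.
 * "window" is a hypothetical tuning knob; it is rounded up to a
 * multiple of the page size because mmap() offsets must be page-aligned.
 * Returns the number of mmap() calls made (sum stored in *out),
 * or -1 on error.
 */
long sum_file_windows(const char *path, size_t window, uint32_t *out)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || window == 0)
        return -1;
    window = (window + (size_t)page - 1) / (size_t)page * (size_t)page;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    uint32_t sum = 0;
    long maps = 0;
    for (off_t off = 0; off < st.st_size; off += (off_t)window) {
        size_t n = (size_t)(st.st_size - off);
        if (n > window)
            n = window;

        unsigned char *p = mmap(NULL, n, PROT_READ, MAP_PRIVATE, fd, off);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        maps++;

        for (size_t i = 0; i < n; i++)
            sum += p[i];

        munmap(p, n);
    }

    close(fd);
    *out = sum;
    return maps;
}
```

A larger window only lowers the number of |mmap()|/|munmap()| round
trips; the bytes summed and the inner loop (where any prefetch hints
would live) are the same.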

> would it be enough to make them not worth it?

AFAIK no - the usage of prefetch instructions will always give us a
performance benefit since they reduce the waiting time of the CPU
pipeline. Even on Niagara machines it's a benefit since the libsum code
is single-threaded.

> I don't know without more data
> 
> also
> are there performance results for the unhacked gnu sum vs the hacked gnu sum?

I don't have the Sun patches for GNU "cksum" anymore (i.e. I am no
longer at Sun) - that was the job of Tim Sparlin's team (but they
abandoned the work in favor of the AST utilities anyway).

> are there performance results for the hacked gnu sum vs the solaris sum?

I don't have these data anymore... ;-(

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.ma...@nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
