Glenn Fowler wrote:
>
> I think the point is that if the common usage is to
> sum many different files, or one file at a time over
> long spans of time then the performance of getting
> the bytes from the filesystem to user space may
> outweigh any cache optimization gains
>
> the ast apps are already at a disadvantage because they
> pull in extra .so's over the base case(s) they are measured against
>
> what I need is a big view analysis of at least a few more variables
> so that reasonable decisions can be made of ifdef'ing up the code
[snip]
> what are the effects, if any, of timing apps repeatedly over the same file
>     vs
> timing the apps over enough files to blow fs cache(s)

We use tmpfs (a RAM disk) for benchmarking, which AFAIK eliminates
these effects.

> what are the interactions between io/mmap block sizes and L? cache
> block sizes being controlled by the prefetch calls?

The size of the |mmap()| area is AFAIK completely independent of the
L1 cache size and the cache block sizes. The prefetch instructions we
use via |sun_prefetch_read_many()| simply reduce the time the CPU
pipeline spends stalled while waiting for the data to become available
(on Sun's Niagara machines the core switches to a different hardware
strand/thread instead of stalling).

> my suspicions are that tweaking the user io/mmap block sizes (which can be done
> in a general way for all apps, possibly with an ifdef in one place)
> may change the timings and diminish the effects of the explicit prefetch calls

I strongly doubt that, since these things happen in completely
different layers. The |sun_prefetch_read_many()| prefetch instruction
requests data from main memory (if it is not cached yet) in _parallel_
to the normal pipeline work, while the size of the |mmap()|'d area just
defines how often we have to call |mmap()|/|munmap()| to walk over the
whole file.

> would it be enough to make them not worth it?

AFAIK no - the use of prefetch instructions will always give us a
performance benefit since they reduce the waiting time of the CPU
pipeline. Even on Niagara machines it's a benefit since the libsum code
is single-threaded.

> I don't know without more data
>
> also
> are there performance results for the unhacked gnu sum vs the hacked gnu sum?

I don't have the Sun patches for GNU "cksum" anymore (I am no longer at
Sun) - that was the job of Tim Sparlin's team (but they abandoned the
work in favor of the AST utilities anyway).

> are there performance results for the hacked gnu sum vs the solaris sum?

I don't have this data anymore... ;-(

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.ma...@nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
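
To illustrate the point about the two layers, here is a minimal sketch
(NOT the libsum code) that walks a file in |mmap()| windows and issues a
read prefetch a fixed distance ahead inside each window. Everything in
it is an assumption made for illustration: |sum_file_windowed()|,
|PREFETCH_READ|, |PREFETCH_AHEAD| and |STRIDE| are hypothetical names,
the GCC/Clang builtin |__builtin_prefetch()| is only a rough stand-in
for |sun_prefetch_read_many()|, and the plain byte sum stands in for a
real checksum. The window size only decides how many
|mmap()|/|munmap()| calls are made; the prefetch distance is a separate
knob.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-in for sun_prefetch_read_many(); a no-op without GCC/Clang. */
#if defined(__GNUC__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH_READ(p) ((void)0)
#endif

#define STRIDE         64   /* assumed cache line size in bytes */
#define PREFETCH_AHEAD 256  /* assumed prefetch distance in bytes */

/*
 * Sum all bytes of `path`, mapping `window` bytes at a time.
 * `window` must be a non-zero multiple of the page size so that
 * every mmap() offset stays page-aligned.
 * Returns 0 on success, -1 on error.
 */
static int sum_file_windowed(const char *path, size_t window, uint64_t *out)
{
    struct stat st;
    uint64_t sum = 0;
    off_t off = 0;
    int fd;

    assert(window > 0);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    while (off < st.st_size) {
        off_t left = st.st_size - off;
        size_t len = left < (off_t)window ? (size_t)left : window;
        const unsigned char *p =
            mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);

        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (size_t i = 0; i < len; i++) {
            /* One prefetch hint per assumed cache line, inside the window. */
            if ((i % STRIDE) == 0 && i + PREFETCH_AHEAD < len)
                PREFETCH_READ(p + i + PREFETCH_AHEAD);
            sum += p[i];
        }
        munmap((void *)p, len);
        off += (off_t)len;
    }

    close(fd);
    *out = sum;
    return 0;
}
```

Calling it with a window of one page or of many pages gives the same
sum; only the number of |mmap()| calls differs. Whether the timings
differ would have to be measured, the sketch makes no claim about that.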