Re: Swapping performance

Terry Lambert Fri, 08 Mar 2002 01:45:50 -0800

David Xu wrote:
> I have done some tests on my machine,  the machine has both
> Linux and FreeBSD installed, the following is the data:
> 
> MALLOC_SIZE = 1024*1024*400
> has bzero
> 
> Red Linux 6.2(kernel 2.2.14)
>     5.09u 5.62s 1:15.33 14% 
>     4.70u 5.73s 1:17.13 13%
>     4.88u 5.68s 1:17.04 13%
> 
> FreeBSD 4.5-STABLE
>     5.489u 6.815s 1:25.96 14.2%   4+425738k 0+0io 12937pf+0w
>     5.342u 6.728s 1:24.40 14.2%   4+414152k 0+0io 12929pf+0w
>     5.073u 6.815s 1:28.58 13.4%   3+408011k 1+0io 12920pf+0w


OK.

> MALLOC_SIZE = 1024*1024*400
> no bzero
> 
> Red Linux 6.2(kernel 2.2.14)
>     2.01u 4.16s 0:24.79 24%
>     1.82u 4.31s 0:24.90 24%
>     1.76u 4.29s 0:24.51 24%
> 
> FreeBSD 4.5-STABLE
>     2.802u 3.604s 0:23.20 27.5%   4+415497k 0+0io 81pf+0w
>     2.975u 3.434s 0:23.58 27.1%   4+412937k 0+0io 83pf+0w
>     2.871u 3.480s 0:23.91 26.5%   4+413607k 0+0io 83pf+0w

I expected this.  The bzero() has two effects:

1)      It presets the LRU list to inverse sequential
        order
2)      It faults the pages in order, with a byte-by-byte
        write through

So when we take the bzero out, the FreeBSD version is faster.

I think if you were to mmap() and madvise() sequential before
the bzero, and madvise random after, FreeBSD would end up
faster even with the bzero present.

Even so, the earlier observations on the slopes of the
page selection curves and the domains mean that the
access pattern is pessimal.

Actually, given the LRU ordering as a result of the bzero,
I would expect that if you reversed the access pattern,
the FreeBSD case would be significantly faster, since it
would hit cache for all of the pages until it hit the first
swap backed page.

You could simulate this by doing a touch backwards, replacing
the single bzero:

        /* reverse LRU list; i must be signed or the loop never ends */
        i_count = MALLOC_SIZE/4096;     /* pages */
        for (i = i_count - 1; i >= 0; i--)
                bzero(&ptr[i << 12], 4096);
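
For anyone who wants to compile the reverse touch on its own, here
is a minimal sketch.  The function name and the PAGE_SIZE_ASSUMED
constant are made up for illustration; the 4096 byte page size is the
same assumption the fragment above makes.  It uses memset() rather
than bzero(), and counts down with an unsigned index in a way that
cannot wrap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_ASSUMED 4096  /* assumption: 4K pages, as in the fragment */

/*
 * Zero a buffer one page at a time, last page first, so the pages are
 * first touched in descending address order.  A trailing partial page
 * is left alone.  Returns the number of pages touched.
 */
static size_t
touch_pages_reverse(char *ptr, size_t len)
{
        size_t npages = len / PAGE_SIZE_ASSUMED;
        size_t i;

        assert(ptr != NULL);
        /* Test before decrementing, so the unsigned index never goes below 0. */
        for (i = npages; i-- > 0; )
                memset(ptr + i * PAGE_SIZE_ASSUMED, 0, PAGE_SIZE_ASSUMED);
        return npages;
}
```

In the test program this would replace the single bzero(), called as
touch_pages_reverse(ptr, MALLOC_SIZE).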

In other words, my observations on page access and Matt's
observations on what was being measured are overall correct.

The mmap() would also allow you to make FreeBSD do what Linux
does; I'm not certain that this would work with the malloc'ed
pages, though it should (PHK would be a better judge).

Basically, replace the bzero() with the following.  Note that
madvise() takes a single advice value per call, not a mask of
OR'ed flags, so each hint gets its own call:

        madvise(ptr, MALLOC_SIZE, MADV_SEQUENTIAL);
        madvise(ptr, MALLOC_SIZE, MADV_NOSYNC);
        bzero(ptr, MALLOC_SIZE);
        madvise(ptr, MALLOC_SIZE, MADV_RANDOM);

If you invert the LRU list, I would also suggest:

        madvise(ptr, MALLOC_SIZE, MADV_RANDOM);
        madvise(ptr, MALLOC_SIZE, MADV_NOSYNC);
        madvise(ptr, MALLOC_SIZE, MADV_WILLNEED);

Though WILLNEED and RANDOM may not get along completely.  8-(.
I would say WILLNEED is more important, since it keeps the PTEs
in place.

The NOSYNC turns off the explicit write-through of pages.
That write-through is most likely the slowdown relative to
Linux: Linux maintains explicit coherency between the VM and
buffer cache, instead of implicit, so "write through" is not
the default there.  This is probably the main thing that is
making the bzero case slower on FreeBSD vs. Linux.


> Unfortunately, I havn't Linux kernel 2.4.17 installed,  is
> Linux kernel 2.4.17 faster?

Higher version numbers usually mean more speed.  Unfortunately,
there are two main Linux VM systems these days, and which one
you get really depends on *both* the kernel version and which
distribution you get.

We need to be very clear here on which Linux is being tested,
since just saying "Linux" is no longer enough, now that they
have forked their kernel along several important axes.  VM
speed, in particular, is going to vary widely by load and
vendor.

Oh.  Ugh.  Almost forgot.  I think very large mappings in
Linux get 4M pages.  This takes them out of the TLB collision
domain with ordinary pages (most Pentium class CPUs have 16
4k data page TLBs, 16 4k code page TLBs, and 8 4M page TLBs
that are separate).
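
To put numbers on that, here is a small sketch of the arithmetic.
The helper names are made up; the entry counts are the ones quoted
above, and the 400M buffer is the MALLOC_SIZE from the test.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of address space a TLB with `entries` slots can cover at once. */
static size_t
tlb_reach(size_t entries, size_t page_size)
{
        assert(page_size > 0);
        return entries * page_size;
}

/* Number of pages needed to map len bytes, rounding up. */
static size_t
pages_needed(size_t len, size_t page_size)
{
        assert(page_size > 0);
        return (len + page_size - 1) / page_size;
}
```

With those figures, tlb_reach(16, 4096) is 64K while
tlb_reach(8, 4 * 1024 * 1024) is 32M, and the 400M buffer needs
102400 4k pages but only 100 4M pages.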

TLB thrashing can account for as much as 14% of performance,
which is one of the reasons we went to 4M pages for all the
mbufs at ClickArray last year.  Doing the mapping off of
/dev/zero *should* get the 4M page mappings on FreeBSD...
again, malloc may take care of this already, though I think
it doesn't necessarily mmap off of /dev/zero, but we *are*
talking some very large allocations here... PHK is the guy
to ask on that, again.
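
For reference, a minimal sketch of mapping off of /dev/zero.  The
function name is hypothetical, and whether the kernel backs such a
mapping with 4M pages is the open question above; nothing in the code
requests or verifies a page size.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map len bytes of zero-filled private memory backed by /dev/zero.
 * Returns the mapping, or NULL if the open() or mmap() fails.
 */
static void *
map_dev_zero(size_t len)
{
        void *p;
        int fd;

        assert(len > 0);
        fd = open("/dev/zero", O_RDWR);
        if (fd < 0)
                return NULL;
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        /* The mapping stays valid after the descriptor is closed. */
        (void)close(fd);
        return (p == MAP_FAILED) ? NULL : p;
}
```

The returned pointer can be handed to the test in place of the
malloc() result and released with munmap(p, len).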

-- Terry
