David Xu wrote:
> I have done some tests on my machine, the machine has both
> Linux and FreeBSD installed, the following is the data:
>
> MALLOC_SIZE = 1024*1024*400
> has bzero
>
> Red Linux 6.2(kernel 2.2.14)
> 5.09u 5.62s 1:15.33 14%
> 4.70u 5.73s 1:17.13 13%
> 4.88u 5.68s 1:17.04 13%
>
> FreeBSD 4.5-STABLE
> 5.489u 6.815s 1:25.96 14.2% 4+425738k 0+0io 12937pf+0w
> 5.342u 6.728s 1:24.40 14.2% 4+414152k 0+0io 12929pf+0w
> 5.073u 6.815s 1:28.58 13.4% 3+408011k 1+0io 12920pf+0w

OK.

> MALLOC_SIZE = 1024*1024*400
> no bzero
>
> Red Linux 6.2(kernel 2.2.14)
> 2.01u 4.16s 0:24.79 24%
> 1.82u 4.31s 0:24.90 24%
> 1.76u 4.29s 0:24.51 24%
>
> FreeBSD 4.5-STABLE
> 2.802u 3.604s 0:23.20 27.5% 4+415497k 0+0io 81pf+0w
> 2.975u 3.434s 0:23.58 27.1% 4+412937k 0+0io 83pf+0w
> 2.871u 3.480s 0:23.91 26.5% 4+413607k 0+0io 83pf+0w

I expected this.  The bzero() has two effects:

1)	It presets the LRU list to inverse sequential order
2)	It faults the pages in order, with a byte-by-byte write
	through

So when we take the bzero out, the FreeBSD version is faster.

I think if you were to mmap() and madvise() sequential before
the bzero, and madvise random after, the FreeBSD version would
end up faster even with the bzero present.

Even so, the earlier observations on the slopes of the page
selection curves and the domains mean that the access pattern
is pessimal.

Actually, given the LRU ordering that results from the bzero, I
would expect that if you reversed the access pattern, the FreeBSD
case would be significantly faster, since it would hit cache for
all of the pages until it hit the first swap backed page.

You could simulate this by doing a touch backwards, replacing
the single bzero:

	/* reverse LRU list: touch the pages last to first */
	i_count = MALLOC_SIZE / 4096;	/* pages */
	for (i = i_count - 1; i >= 0; i--)	/* i must be signed */
		bzero(&ptr[i << 12], 4096);

In other words, my observations on page access and Matt's
observations on what was being measured are overall correct.

The mmap() would also allow you to make FreeBSD do what Linux
does; I'm not certain that this would work with the malloc'ed
pages, though it should (PHK would be a better judge).

Basically, replace the bzero() with:

	/* advice values are not bit flags: one call per hint */
	madvise(ptr, MALLOC_SIZE, MADV_NOSYNC);
	madvise(ptr, MALLOC_SIZE, MADV_SEQUENTIAL);
	bzero(ptr, MALLOC_SIZE);
	madvise(ptr, MALLOC_SIZE, MADV_RANDOM);

If you invert the LRU list, I would also suggest:

	madvise(ptr, MALLOC_SIZE, MADV_NOSYNC);
	madvise(ptr, MALLOC_SIZE, MADV_RANDOM);
	madvise(ptr, MALLOC_SIZE, MADV_WILLNEED);

Though WILLNEED and RANDOM may not get along completely.  8-(.

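
A minimal sketch of the replacement described above, under stated
assumptions: the region comes from an anonymous mmap() rather than
malloc(), pages are 4096 bytes, and each hint is issued as its own
madvise() call because the advice argument is a single value, not a
mask.  The names zeroed_region and PAGE_BYTES are hypothetical, and
MADV_NOSYNC is guarded because it is FreeBSD-specific.  This is not
the original test program, only an illustration of the two zeroing
orders.

```c
#define _DEFAULT_SOURCE		/* glibc: expose MAP_ANON and madvise() */
#include <assert.h>
#include <stddef.h>
#include <strings.h>		/* bzero() */
#include <sys/mman.h>

#define PAGE_BYTES 4096		/* assumed page size, as in the loop above */

/*
 * Hypothetical helper: map len bytes of anonymous memory and zero it.
 * reverse == 0: one front-to-back bzero(), with a sequential hint first.
 * reverse != 0: touch the pages last to first instead.
 * Either way the region is hinted as randomly accessed afterwards.
 * Returns NULL if the mapping fails.  madvise() results are ignored
 * because the hints are advisory.
 */
static char *
zeroed_region(size_t len, int reverse)
{
	char *ptr;
	size_t i;

	assert(len % PAGE_BYTES == 0);	/* whole pages only */

	ptr = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE, -1, 0);
	if (ptr == MAP_FAILED)
		return (NULL);

#ifdef MADV_NOSYNC
	(void)madvise(ptr, len, MADV_NOSYNC);	/* FreeBSD only */
#endif
	if (reverse) {
		/* unsigned counter, so count down from len/PAGE_BYTES to 1 */
		for (i = len / PAGE_BYTES; i > 0; i--)
			bzero(ptr + (i - 1) * PAGE_BYTES, PAGE_BYTES);
	} else {
		(void)madvise(ptr, len, MADV_SEQUENTIAL);
		bzero(ptr, len);
	}
	(void)madvise(ptr, len, MADV_RANDOM);
	return (ptr);
}
```

In the benchmark, ptr = zeroed_region(MALLOC_SIZE, 0) would stand in
for the malloc() plus bzero() pair, and passing 1 as the second
argument gives the reversed touch order for comparison.
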
I would say WILLNEED is more important, since it keeps the PTEs
in place.

The NOSYNC turns off the explicit write-through of pages; this
is most likely the slowdown relative to Linux.  Linux maintains
explicit coherency between the VM and buffer cache, instead of
implicit, so the "write through" case is not the default on
Linux; this is probably the main thing that is making the bzero
case slower on FreeBSD vs. Linux.

> Unfortunately, I haven't Linux kernel 2.4.17 installed, is
> Linux kernel 2.4.17 faster?

Higher version numbers usually mean more speed.

Unfortunately, there are two main Linux VM systems these days,
and which one you get really depends on *both* the kernel version
and which distribution you get.

We need to be very clear here on which Linux is being tested,
since just saying "Linux" is no longer enough, now that they have
forked their kernel along several important axes.  VM speed, in
particular, is going to vary widely by load and vendor.

Oh.  Ugh.  Almost forgot.  I think very large mappings in Linux
get 4M pages.  This takes them out of the TLB collision domain
with ordinary pages (most Pentium class CPUs have 16 4k data page
TLBs, 16 4k code page TLBs, and 8 4M page TLBs that are
separate).  TLB thrashing can account for as much as 14% of
performance, which is one of the reasons we went to 4M pages for
all the mbufs at ClickArray last year.

Doing the mapping off of /dev/zero *should* get the 4M page
mappings on FreeBSD... again, malloc may take care of this
already, though I think it doesn't necessarily mmap off of
/dev/zero, but we *are* talking some very large allocations
here... PHK is the guy to ask on that, again.

-- Terry