> Maybe I should have explained the output more detailed. The percentages > are added. 35.8% in the second last column means that > 35.8% have a return value that is less than TARGET_PAGE_SIZE. > This was meant to illustrate at how many 64-bit chunks you have > to look to grab a certain percentage of non-zero pages.
Ok, I wrongly understood that many pages had 4088 zero bytes but the last 8 were not zero. Now it's clearer, and more logical too. :) > Looking e.g. at the third value it means that looking at the first > three 64-bit chunks it will catch 34.0% of all pages. > It turns out that the non-zeroness of a page can be detected looking > at the first 256 or so bits and only a low > percentage turns out to be non-zero at a later position. So after > having checked the first chunks one by one > there is no big penalty looking at the remaining chunks with the > vectorized loop. I think it makes most sense to unroll the first four non-vectorized iterations, i.e. not use SSE and use three or four ifs. Either: if (foo[0]) return 0; if (foo[1]) return 8; if (foo[2]) return 16; if (foo[3]) return 24; or if (foo[0]) return 0; if (foo[1] | foo[2] | foo[3]) return 8; and then proceed on the remaining 4096-4*sizeof(long) bytes with the vectorized loop. foo+4 is aligned for SIMD operations on both 32- and 64-bit machines, which makes this a nice choice. Paolo > Here is the distribution of return values for the Windows XP example: > > 25.62% 0.49% 7.86% 0.12% 0.15% 0.05% 0.05% 0.04% 0.05% 0.02% 0.03% > 0.02% 0.03% 0.02% 0.02% 0.01% 0.03% 0.02% 0.01% 0.02% 0.02% 0.01% > 0.02% 0.01% 0.01% 0.01% 0.01% 0.01% 0.02% 0.00% 0.01% 0.02% 0.03% > 0.01% 0.01% 0.01% 0.01% 0.01% 0.01% 0.00% 0.00% 0.01% 0.07% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.01% 0.01% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.01% 0.00% 0.00% 0.00% 0.00% 0.01% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.01% 0.00% 0.01% 0.02% 0.00% 0.00% 0.00% > 0.00% 0.01% 0.00% 0.01% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.02% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.01% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.01% 0.00% 0.00% 0.00% 0.00% 0.01% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.01% 0.00% 0.03% 0.00% 0.00% 0.00% > 0.01% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.01% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.01% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.01% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.01% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.01% 0.00% 0.00% 0.00% 0.02% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.01% 0.00% 0.00% 0.00% 0.01% > 0.00% 0.00% 0.00% 0.02% 0.00% 0.00% 0.02% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.01% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.01% 0.00% 0.01% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.01% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.01% > 0.00% 0.00% 0.00% 0.01% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.01% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.01% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.01% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.01% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% > 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 64.23% > > The last value is the percentage of return value of TARGET_PAGE_SIZE > meaning the page is all zero. > > Peter > >