On 22/03/2013 20:20, Peter Lieven wrote:
>> I think patch 4 is a bit overengineered. I would prefer the simple
>> patch you had using three/four non-vectorized accesses. The setup cost
>> of the vectorized buffer_is_zero is quite high, and 64 bits are just
>> 256k RAM; if the host doesn't touch 256k RAM, it will incur the overhead.
> I think you are right. I was a little too eager to utilize
> buffer_find_nonzero_offset() as much as possible. The performance gain
> by unrolling was impressive enough. The gain by the vector functions is
> not so big that it would justify a possible slowdown from the high setup
> costs. My tests revealed that in most cases buffer_find_nonzero_offset()
> returns 0 or a big offset. All the 0 return values would have increased
> setup costs with the vectorized version of patch 4.
>
>> I would prefer some more benchmarking for patch 5, but it looks ok.
> What would you like to see? Statistics on how many pages of a real system
> are not zero, but zero in the first sizeof(long) bytes?

Yeah, more or less. Running the system for a while, migrating, and
plotting a histogram of the return values of buffer_find_nonzero_offset
(hmm, perhaps using a nonvectorized version is better for this experiment).

Paolo
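
For illustration only, here is a minimal sketch of how such a histogram could be collected. It is not code from the patch series: find_nonzero_offset_scalar(), bucket_for_offset(), record_page(), PAGE_SIZE and NUM_BUCKETS are all hypothetical names, a 4096-byte page is assumed, and the scan is a plain word-by-word loop standing in for a non-vectorized buffer_find_nonzero_offset().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE   4096  /* assumed page size, not taken from the thread */
#define NUM_BUCKETS 14    /* 0: offset 0; 1..12: [2^(k-1), 2^k); 13: all zero */

/* Hypothetical scalar scan: byte offset of the first nonzero long,
 * or len if the whole buffer is zero. */
size_t find_nonzero_offset_scalar(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t off;

    assert(len % sizeof(long) == 0);
    for (off = 0; off < len; off += sizeof(long)) {
        long word;
        memcpy(&word, p + off, sizeof(word)); /* avoids alignment assumptions */
        if (word != 0) {
            return off;
        }
    }
    return len;
}

/* Map an offset to a power-of-two bucket so the histogram stays small. */
unsigned bucket_for_offset(size_t off)
{
    unsigned b = 0;

    if (off >= PAGE_SIZE) {
        return NUM_BUCKETS - 1;   /* page was entirely zero */
    }
    while (off) {                 /* counts significant bits of off */
        b++;
        off >>= 1;
    }
    return b;
}

unsigned long histogram[NUM_BUCKETS];

/* Would be called once per page examined during migration. */
void record_page(const void *page)
{
    size_t off = find_nonzero_offset_scalar(page, PAGE_SIZE);
    histogram[bucket_for_offset(off)]++;
}
```

Bucket 0 then counts pages that are nonzero in their first sizeof(long) bytes, the last bucket counts all-zero pages, and everything in between is the population the question above asks about: pages that start with zeros but are not zero overall.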