On 22.03.2013 at 22:24, Paolo Bonzini <pbonz...@redhat.com> wrote:

> On 22/03/2013 20:20, Peter Lieven wrote:
>>> I think patch 4 is a bit overengineered. I would prefer the simple
>>> patch you had using three/four non-vectorized accesses. The setup cost
>>> of the vectorized buffer_is_zero is quite high, and 64 bits are just
>>> 256k RAM; if the host doesn't touch 256k RAM, it will incur the overhead.
>> I think you are right. I was a little too eager to utilize
>> buffer_find_nonzero_offset() as much as possible. The performance gain
>> from unrolling was impressive enough. The gain from the vector functions
>> is not big enough to justify a possible slowdown caused by the high
>> setup costs. My tests revealed that in most cases
>> buffer_find_nonzero_offset() returns 0 or a big offset. All the 0 return
>> values would have incurred increased setup costs with the vectorized
>> version of patch 4.
>>
>>>
>>> I would prefer some more benchmarking for patch 5, but it looks ok.
>> What would you like to see? Statistics on how many pages of a real system
>> are not zero, but zero in the first sizeof(long) bytes?
>
> Yeah, more or less. Running the system for a while, migrating, and
> plotting a histogram of the return values of buffer_find_nonzero_offset
> (hmm, perhaps using a nonvectorized version is better for this experiment).
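
For the histogram experiment, a rough sketch of what a non-vectorized
measurement could look like is below. This is not the QEMU code:
find_nonzero_offset_scalar(), bucket_for_offset(), record_page() and
print_histogram() are hypothetical names, and the 4096-byte page size and the
512-byte bucket width are assumptions chosen only for illustration. The scalar
scan returns the offset of the first non-zero long, or the buffer length if
the page is entirely zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, not taken from QEMU */
#define SKETCH_BUCKETS   10     /* offset 0 | eight 512-byte ranges | all zero */

/* Hypothetical scalar stand-in for buffer_find_nonzero_offset():
 * byte offset of the first non-zero long, or len if the buffer is zero. */
static size_t find_nonzero_offset_scalar(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t off;

    assert(len % sizeof(long) == 0);
    for (off = 0; off < len; off += sizeof(long)) {
        long word;
        memcpy(&word, p + off, sizeof(word)); /* avoids alignment assumptions */
        if (word != 0) {
            return off;
        }
    }
    return len;
}

/* Map a returned offset to a histogram bucket. */
static unsigned bucket_for_offset(size_t off)
{
    if (off == 0) {
        return 0;                    /* non-zero in the first long */
    }
    if (off >= SKETCH_PAGE_SIZE) {
        return SKETCH_BUCKETS - 1;   /* page is completely zero */
    }
    return 1 + (unsigned)(off / 512);
}

/* Count one page of SKETCH_PAGE_SIZE bytes in the histogram. */
static void record_page(unsigned long hist[SKETCH_BUCKETS], const void *page)
{
    size_t off = find_nonzero_offset_scalar(page, SKETCH_PAGE_SIZE);
    hist[bucket_for_offset(off)]++;
}

/* Dump the bucket counts, one per line. */
static void print_histogram(const unsigned long hist[SKETCH_BUCKETS])
{
    unsigned i;
    for (i = 0; i < SKETCH_BUCKETS; i++) {
        printf("bucket %u: %lu\n", i, hist[i]);
    }
}
```

Calling record_page() for every page examined during a migration and dumping
the counts with print_histogram() afterwards would show how often the first
sizeof(long) bytes alone decide the result.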

I will follow up on this on Monday. Have you seen my concern that the whole
page is read anyway if it is non-zero?

Peter

>
> Paolo