On Thu, 7 Nov 2024 at 00:40, Bertrand Drouvot
<bertranddrouvot...@gmail.com> wrote:
> Do you mean add:
>
> "
> for (; p < aligned_end; p += sizeof(size_t))
> {
>     if (*(size_t *)p != 0)
>         return false;
> }
> "
>
> just before the last loop?
>
> If so, I did a few tests and did not see any major improvements. So, I thought
> it's simpler to not add more code in this inline function in v7 shared
> up-thread.

Did you try with a size where there's a decent remainder, say 124 bytes?
FWIW, one of the cases has 112 bytes, and I think that is aligned memory,
meaning we'll do the first 64 bytes in the SIMD loop and have to do the
remaining 48 bytes in the byte-at-a-time loop. If you had the loop Michael
mentioned, that would instead be 6 iterations of the size_t-at-a-time loop
(assuming an 8-byte size_t).

David
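
To make the iteration counts above concrete, here is a rough standalone sketch of the three-phase structure being discussed. It is not the code from the v7 patch: the function name, the counters struct, and the use of memcpy() for the word loads (to sidestep alignment handling) are all assumptions made for illustration, and the "chunk" loop is only a stand-in for the 64-byte SIMD-friendly loop on a target with an 8-byte size_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counters, only here to make the loop trip counts visible. */
typedef struct
{
    size_t chunk_iters;     /* iterations of the 8-word block loop */
    size_t word_iters;      /* iterations of the size_t-at-a-time loop */
    size_t byte_iters;      /* iterations of the byte-at-a-time loop */
} zero_check_counts;

/*
 * Hypothetical sketch, not the patch's function: returns true if the first
 * len bytes at ptr are all zero, recording how often each loop body ran.
 */
static bool
is_all_zeros_sketch(const void *ptr, size_t len, zero_check_counts *counts)
{
    const unsigned char *p = ptr;
    const unsigned char *end = p + len;
    const size_t word = sizeof(size_t);
    const size_t chunk = 8 * word;  /* 64 bytes when size_t is 8 bytes */

    assert(counts != NULL);
    memset(counts, 0, sizeof(*counts));

    /* Phase 1: whole blocks of 8 words, OR-ed together and tested once. */
    for (; (size_t) (end - p) >= chunk; p += chunk)
    {
        size_t acc = 0;

        for (size_t i = 0; i < 8; i++)
        {
            size_t w;

            memcpy(&w, p + i * word, word); /* alignment-safe load */
            acc |= w;
        }
        counts->chunk_iters++;
        if (acc != 0)
            return false;
    }

    /* Phase 2: the extra loop under discussion, one size_t at a time. */
    for (; (size_t) (end - p) >= word; p += word)
    {
        size_t w;

        memcpy(&w, p, word);
        counts->word_iters++;
        if (w != 0)
            return false;
    }

    /* Phase 3: whatever is left, one byte at a time. */
    for (; p < end; p++)
    {
        counts->byte_iters++;
        if (*p != 0)
            return false;
    }

    return true;
}
```

With an 8-byte size_t and an all-zero buffer, 112 bytes comes out as 1 block iteration, 6 word iterations and 0 byte iterations, and 124 bytes as 1, 7 and 4. Without phase 2, the byte loop would run 48 and 60 times respectively.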