On Mon, 2020-02-24 at 08:13 -0800, Richard Henderson wrote:
> On 2/23/20 11:07 PM, Robert Hoo wrote:
> > Inspired by your suggestion, I'm thinking go further: use immediate
> > rather than a global variable, so that saves 1 memory(/cache)
> > access.
> >
> > #ifdef CONFIG_AVX512F_OPT
> > #define OPTIMIZE_LEN 256
> > #else
> > #define OPTIMIZE_LEN 64
> > #endif
>
> With that, the testing in tests/test-bufferiszero.c, looping through the
> implementations, is invalidated. Because once you start compiling for avx512,
> you're no longer testing sse2 et al with the same inputs.
>
Right. Thanks for pointing that out; I hadn't noticed it.
More precisely, sse2 et al. would no longer be tested with lengths < 256.
> IF we want to change the length to suit avx512, we would want to change it
> unconditionally. And then you could also tidy up avx2 to avoid the extra
> comparisons there.

Considering how the length depends on sse2/sse4/avx2/avx512 and their
algorithms, as well as possible future changes and additions, I'd rather
roll back to your original suggestion: use a companion variable with each
accel_fn(). What do you think?
>
>
> r~
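
To make the companion-variable idea concrete, here is a rough sketch only, not the actual QEMU code. The names accel_entry, min_len, accel_table, buffer_is_zero_with and the *_standin routines are hypothetical; the stand-ins just call a byte-wise loop where real SIMD code would go. The 64 and 256 thresholds are the OPTIMIZE_LEN values quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*accel_fn)(const void *buf, size_t len);

/* Hypothetical pairing: each implementation carries its own minimum length. */
struct accel_entry {
    accel_fn fn;
    size_t min_len;   /* companion variable: smallest length fn may be given */
};

/* Portable byte-by-byte check, used for buffers shorter than min_len. */
static bool is_zero_bytewise(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Stand-ins for the SIMD routines (assumption: real ones would use intrinsics). */
static bool is_zero_sse2_standin(const void *buf, size_t len)
{
    assert(len >= 64);    /* the dispatcher must respect the threshold */
    return is_zero_bytewise(buf, len);
}

static bool is_zero_avx512_standin(const void *buf, size_t len)
{
    assert(len >= 256);
    return is_zero_bytewise(buf, len);
}

/* One table entry per implementation, each with its own threshold. */
static const struct accel_entry accel_table[] = {
    { is_zero_sse2_standin,   64  },
    { is_zero_avx512_standin, 256 },
};

/* Dispatch through one entry, falling back below that entry's threshold. */
static bool buffer_is_zero_with(const struct accel_entry *e,
                                const void *buf, size_t len)
{
    if (len >= e->min_len) {
        return e->fn(buf, len);
    }
    return is_zero_bytewise(buf, len);
}
```

With the threshold stored next to each function, a test that loops over the table can feed every implementation the same inputs, and each entry decides for itself when to fall back, independent of compile-time options.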