On Wed, Nov 2, 2016 at 10:26 PM, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> What I'm interested in is whether the new code is sufficiently
> close in performance to the old code, particularly on x86.
>
> I'd much rather only have a single set of code for all architectures.
> After all, this is meant to be a generic implementation.

Just tested. I get a 6% slowdown on my Skylake. No good. I think it's probably best to have the two paths in there, and not reduce it to one.