On Tue, Jun 20, 2017 at 2:20 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
> On Tue, Jun 20, 2017 at 2:17 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>> On Tue, Jun 20, 2017 at 2:13 PM, Florian Weimer <fwei...@redhat.com> wrote:
>>> On 06/20/2017 01:10 PM, Uros Bizjak wrote:
>>>
>>>> 74,99%  a.out  a.out  [.] test_or
>>>> 12,50%  a.out  a.out  [.] test_movb
>>>> 12,50%  a.out  a.out  [.] test_movl
>>>
>>> Could you try notl/notb/negl/negb as well, please?
>>
>> These all have the same (long) runtime as test_or.
>
> Perhaps we can use "testb $0, %0"? It doesn't write to the memory, but
> otherwise has the same runtime as movb/movl.
That sounds good. OTOH, it's a matter of whether we put strain on the
memory fetch side or the store side... We'll get cacheline allocations
in any case (but the memory will be used eventually).

Instead of testb, a mere movb into a scratch register (i.e., a load
instead of a store) would work as well, apart from needing a scratch
register.

We could also vectorize with scatters ;) (just kidding)

Richard.

> Uros.
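To make the load-versus-store distinction concrete, here is a minimal C sketch of a page-by-page probe loop that only reads the memory it touches. It is an illustration, not the code under discussion: it assumes GCC-style extended inline asm on x86 (using the "testb $0, mem" form suggested above), a hypothetical 4096-byte probe interval, and the hypothetical helper names probe_load and probe_region. On other targets it falls back to a plain volatile load.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe interval (one 4 KiB page); real code would query it. */
#define PROBE_INTERVAL 4096

/* Hypothetical helper: touch the byte at p without writing to it. */
static inline void probe_load(const unsigned char *p)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* testb $0, mem reads the byte and sets flags; it stores nothing. */
    __asm__ volatile ("testb $0, %0" : : "m" (*p) : "cc");
#else
    /* Portable fallback: a volatile load the compiler cannot drop. */
    (void) *(const volatile unsigned char *) p;
#endif
}

/* Hypothetical helper: probe [base, base + len) every PROBE_INTERVAL
   bytes, from the top down as for a downward-growing stack.
   Returns the number of probes issued. */
static size_t probe_region(const unsigned char *base, size_t len)
{
    size_t n = 0;
    assert(len == 0 || base != NULL);
    for (size_t off = len; off > 0; ) {
        off = off > PROBE_INTERVAL ? off - PROBE_INTERVAL : 0;
        probe_load(base + off);
        n++;
    }
    return n;
}
```

Swapping the asm for a movb or orb store would turn this into a store probe; which side of the memory pipeline the probes load up is the trade-off raised above.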