On 06/20/2017 06:27 AM, Richard Biener wrote:
> On Tue, Jun 20, 2017 at 2:20 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>> On Tue, Jun 20, 2017 at 2:17 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>>> On Tue, Jun 20, 2017 at 2:13 PM, Florian Weimer <fwei...@redhat.com> wrote:
>>>> On 06/20/2017 01:10 PM, Uros Bizjak wrote:
>>>>
>>>>>   74,99%  a.out  a.out  [.] test_or
>>>>>   12,50%  a.out  a.out  [.] test_movb
>>>>>   12,50%  a.out  a.out  [.] test_movl
>>>>
>>>> Could you try notl/notb/negl/negb as well, please?
>>>
>>> These all have the same (long) runtime as test_or.
>>
>> Perhaps we can use "testb $0, %0"? It doesn't write to the memory, but
>> otherwise has the same runtime as movb/movl.
>
> That sounds good, OTOH it's a matter of putting strain on the
> memory fetch or store side...  We'll get cacheline allocations in
> any case (but the memory will be used eventually).  Instead
> of test a mere movb into a scratch register (aka, load instead of
> store) would work as well apart from the need of a scratch register.
It was never clear to me why we always implement probes via stores --
though from a development standpoint a destructive store is useful.

I'd expect a tst to generate the desired SEGV.  How does that compare
to the partial-allocation + push approach?

>
> We can also vectorize with scatters ;) (just kidding)
:-)

jeff
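
For readers following the thread, here is a rough sketch of the probe variants being compared, written with GCC/Clang extended asm. The function names (probe_or, probe_movb, probe_testb, probe_load, probe_self_check) are made up for illustration, and this is not the code from the patch under discussion. It only shows which variants write to the probed location and which merely read it. It makes no attempt to reproduce the timings quoted above. On non-x86 targets it falls back to plain volatile accesses, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
/* Read-modify-write probe: ORs zero into the word, so the value is
   unchanged but the location is both loaded and stored. */
static inline void probe_or(volatile uint32_t *p)
{
    __asm__ volatile ("orl $0, %0" : "+m" (*p) : : "cc");
}

/* Destructive store probe: overwrites the byte with zero. */
static inline void probe_movb(volatile uint8_t *p)
{
    __asm__ volatile ("movb $0, %0" : "=m" (*p));
}

/* Read-only probe: testb only sets flags, memory is not written. */
static inline void probe_testb(const volatile uint8_t *p)
{
    __asm__ volatile ("testb $0, %0" : : "m" (*p) : "cc");
}

/* Load probe: reads the byte into a scratch register and discards it. */
static inline void probe_load(const volatile uint8_t *p)
{
    uint8_t scratch;
    __asm__ volatile ("movb %1, %0" : "=q" (scratch) : "m" (*p));
    (void) scratch;
}
#else
/* Hypothetical portable stand-ins using volatile accesses. */
static inline void probe_or(volatile uint32_t *p)          { *p |= 0; }
static inline void probe_movb(volatile uint8_t *p)         { *p = 0; }
static inline void probe_testb(const volatile uint8_t *p)  { (void) *p; }
static inline void probe_load(const volatile uint8_t *p)   { (void) *p; }
#endif

/* Checks which probes preserve the probed value; returns 1 on success. */
static int probe_self_check(void)
{
    volatile uint32_t word = 0xdeadbeefu;
    volatile uint8_t byte = 0x5a;

    probe_or(&word);      /* value-preserving, but still a store */
    probe_testb(&byte);   /* read-only */
    probe_load(&byte);    /* read-only, needs a scratch register */
    assert(word == 0xdeadbeefu);
    assert(byte == 0x5a);

    probe_movb(&byte);    /* destructive: the old value is lost */
    assert(byte == 0);
    return 1;
}
```

In a real stack probe the pointer would refer to a not-yet-touched page below the stack pointer, so that any of these accesses faults on the guard page. The self-check above only touches live locals, so it cannot demonstrate the SEGV itself.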