Re: [PATCH][AArch64] Improve popcount expansion

2020-02-12 Thread Richard Sandiford
Wilco Dijkstra writes:
> The popcount expansion uses umov to extend the result and move it back
> to the integer register file. If we model ADDV as a zero-extending
> operation, fmov can be used to move back to the integer side. This
> results in a ~0.5% speedup on deepsjeng on Cortex-A57.
>
> A

Re: [PATCH][AArch64] Improve popcount expansion

2020-02-04 Thread Wilco Dijkstra
Hi Andrew,

> You might want to add a testcase that the autovectorizers too.
>
> Currently we get also:
>
>    ldr     q0, [x0]
>    addv    b0, v0.16b
>    umov    w0, v0.b[0]
>    ret

My patch doesn't change this case on purpose - there are also many intrinsics which generate re

Re: [PATCH][AArch64] Improve popcount expansion

2020-02-03 Thread Andrew Pinski
On Mon, Feb 3, 2020 at 7:02 AM Wilco Dijkstra wrote:
>
> The popcount expansion uses umov to extend the result and move it back
> to the integer register file. If we model ADDV as a zero-extending
> operation, fmov can be used to move back to the integer side. This
> results in a ~0.5% speedup on

[PATCH][AArch64] Improve popcount expansion

2020-02-03 Thread Wilco Dijkstra
The popcount expansion uses umov to extend the result and move it back to the integer register file. If we model ADDV as a zero-extending operation, fmov can be used to move back to the integer side. This results in a ~0.5% speedup on deepsjeng on Cortex-A57. A typical __builtin_popcount expansio
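
For readers following the thread, here is a minimal C sketch of the kind of source that exercises the popcount expansion under discussion. It is not taken from the patch or its testsuite: the function names popcount_builtin and popcount_reference are hypothetical, and the code assumes a GCC- or Clang-compatible compiler that provides __builtin_popcountll. The second function is a portable bit-clearing loop, included only so the builtin's result can be cross-checked.

```c
#include <stdint.h>

/* Hypothetical helper: wraps the compiler builtin whose AArch64
   expansion the thread discusses.  Assumes GCC or Clang. */
int popcount_builtin(uint64_t x)
{
    return __builtin_popcountll(x);
}

/* Portable reference for comparison: each iteration clears the
   lowest set bit, so the loop runs once per set bit. */
int popcount_reference(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}
```

Compiling popcount_builtin for AArch64 at -O2 and reading the generated assembly is one way to see whether the move back to the integer register file uses umov or fmov on a given compiler version.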