Wilco Dijkstra writes:
> The popcount expansion uses umov to extend the result and move it back
> to the integer register file. If we model ADDV as a zero-extending
> operation, fmov can be used to move back to the integer side. This
> results in a ~0.5% speedup on deepsjeng on Cortex-A57.
>
> A
Hi Andrew,
> You might want to add a testcase for the autovectorizer too.
>
> Currently we also get:
>
> ldr q0, [x0]
> addv b0, v0.16b
> umov w0, v0.b[0]
> ret
My patch doesn't change this case on purpose - there are also many intrinsics
which generate re
On Mon, Feb 3, 2020 at 7:02 AM Wilco Dijkstra wrote:
>
> The popcount expansion uses umov to extend the result and move it back
> to the integer register file. If we model ADDV as a zero-extending
> operation, fmov can be used to move back to the integer side. This
> results in a ~0.5% speedup on
The popcount expansion uses umov to extend the result and move it back
to the integer register file. If we model ADDV as a zero-extending
operation, fmov can be used to move back to the integer side. This
results in a ~0.5% speedup on deepsjeng on Cortex-A57.
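
For readers unfamiliar with the shape of this expansion, here is a rough
portable C model of the idea, not the actual GCC implementation. It assumes
the expansion counts bits per byte and then sums the eight byte counts into a
single byte (the step ADDV performs). The names byte_popcount,
popcount64_model and self_check are made up for illustration. Because the sum
never exceeds 64, it fits in one byte, so the widening at the end is a plain
zero-extension, which is why treating the reduction as zero-extending loses
nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bit count of one byte, standing in for what a
   per-lane vector bit count would produce. */
static uint8_t byte_popcount(uint8_t b)
{
    uint8_t n = 0;
    while (b) {
        b &= (uint8_t)(b - 1);   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical model of the expansion: count bits in each of the 8 bytes,
   then add the counts into a single 8-bit value (the across-lanes add).
   The maximum is 8 * 8 = 64, so the 8-bit sum cannot overflow. */
static uint64_t popcount64_model(uint64_t x)
{
    uint8_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum = (uint8_t)(sum + byte_popcount((uint8_t)(x >> (8 * i))));
    return (uint64_t)sum;        /* zero-extension of the byte result */
}

/* Compare the model against the compiler builtin (GCC/Clang only). */
static void self_check(void)
{
    const uint64_t samples[] = { 0, 1, 0xFFu, 0x8000000000000000u,
                                 0x0123456789ABCDEFu, UINT64_MAX };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(popcount64_model(samples[i]) ==
               (uint64_t)__builtin_popcountll(samples[i]));
}
```

Calling self_check() from a test harness is enough to confirm the model
agrees with __builtin_popcountll on the sample values.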
A typical __builtin_popcount expansio