Hi Andrew,
> Yes, I agree a better cost model for CTZ/CLZ is the right solution, but
> I disagree with 2 ALU instructions as the cost. It should either be
> the same cost as a multiply or have its own cost entry.
> For example, on OcteonTX (and ThunderX1), the cost of CLS/CLZ is 4
> cycles, the sa
Hi Richard,
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93565#c8 - the problem is
more generic, as I suspected, and it's easy to create similar examples. So while
this turned out to be an easy workaround for ctz, the general case is harder
to avoid since you still want to allow beneficial
On Wed, Feb 12, 2020 at 9:56 AM Richard Sandiford wrote:
>
> Wilco Dijkstra writes:
> > Hi Richard,
> >
> > Right, so this is an alternative approach using costs - Combine won't try to
> > duplicate instructions if it increases costs, so increasing the ctz cost to 2
> > instructions (which i
Wilco Dijkstra writes:
> Hi Richard,
>
> Right, so this is an alternative approach using costs - Combine won't try to
> duplicate instructions if it increases costs, so increasing the ctz cost to 2
> instructions (which is the correct cost for ctz anyway)
...agreed...
> ensures we still get effi
Hi Richard,
Right, so this is an alternative approach using costs - Combine won't try to
duplicate instructions if it increases costs, so increasing the ctz cost to 2
instructions (which is the correct cost for ctz anyway) ensures we still get
efficient code for this example:
[AArch64] Set ctz rt
On Fri, Feb 07, 2020 at 06:01:44PM +, Richard Sandiford wrote:
> Wilco Dijkstra writes:
> > Although GCC should understand the limited range of clz/ctz/cls results,
> > Combine sometimes behaves oddly and duplicates ctz to remove an unnecessary
> > sign extension. Avoid this by adding an expl
Wilco Dijkstra writes:
> Hi Richard,
>
>> Could you go into more detail about what the before and after code
>> looks like, and what combine is doing? Like you say, this sounds
>> like a target-independent thing on face value.
>
> It is indeed, but it seems specific to instructions where we have
Hi Richard,
> Could you go into more detail about what the before and after code
> looks like, and what combine is doing? Like you say, this sounds
> like a target-independent thing on face value.
It is indeed, but it seems specific to instructions where we have range
information which allows it
Wilco Dijkstra writes:
> Although GCC should understand the limited range of clz/ctz/cls results,
> Combine sometimes behaves oddly and duplicates ctz to remove a
> sign extension. Avoid this by adding an explicit AND with 127 in the
> patterns. Deepsjeng performance improves by ~0.6%.
Could you
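
For readers following the thread, here is a minimal source-level sketch of the
range argument behind the "AND with 127" idea. It is an illustration only, not
the RTL pattern from the patch: the helper names ctz_widened and ctz_masked are
hypothetical, and it assumes a GCC-compatible compiler that provides
__builtin_ctzll. Because ctz of a nonzero 64-bit value is always in 0..63, the
result fits in 7 bits, so masking with 127 cannot change it and widening it
never needs a real sign extension.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: count trailing zeros and widen the result to long.
   __builtin_ctzll is a GCC/Clang builtin; its result is undefined for 0. */
static inline long ctz_widened(uint64_t x)
{
    assert(x != 0);
    int n = __builtin_ctzll(x);   /* always 0..63 for nonzero x */
    return (long)n;               /* int -> long: nominally a sign extension */
}

/* Hypothetical helper: the same value with an explicit mask. Since the
   count is at most 63, "& 127" leaves it unchanged but makes the
   non-negative range explicit. */
static inline long ctz_masked(uint64_t x)
{
    assert(x != 0);
    return (long)(__builtin_ctzll(x) & 127);
}
```

Both helpers return the same value for every nonzero input; the masked form
only states the limited range explicitly, which is the property the thread
expects the compiler to exploit.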