Re: [PATCH][AArch64] Improve clz patterns

2020-02-12 Thread Wilco Dijkstra
Hi Andrew, > Yes I agree a better cost model for CTZ/CLZ is the right solution but > I disagree with 2 ALU instruction as the cost.  It should either be > the same cost as a multiply or have its own cost entry. > For an example on OcteonTX (and ThunderX1), the cost of CLS/CLZ is 4 > cycles, the sa

Re: [PATCH][AArch64] Improve clz patterns

2020-02-12 Thread Wilco Dijkstra
Hi Richard, See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93565#c8 - the problem is more generic like I suspected and it's easy to create similar examples. So while this turned out to be an easy workaround for ctz, the general case is harder to avoid since you still want to allow beneficial

Re: [PATCH][AArch64] Improve clz patterns

2020-02-12 Thread Andrew Pinski
On Wed, Feb 12, 2020 at 9:56 AM Richard Sandiford wrote: > > Wilco Dijkstra writes: > > Hi Richard, > > > > Right, so this is an alternative approach using costs - Combine won't try to > > duplicate instructions if it increases costs, so increasing the ctz cost to > > 2 > > instructions (which i

Re: [PATCH][AArch64] Improve clz patterns

2020-02-12 Thread Richard Sandiford
Wilco Dijkstra writes: > Hi Richard, > > Right, so this is an alternative approach using costs - Combine won't try to > duplicate instructions if it increases costs, so increasing the ctz cost to 2 > instructions (which is the correct cost for ctz anyway) ...agreed... > ensures we still get effi

Re: [PATCH][AArch64] Improve clz patterns

2020-02-12 Thread Wilco Dijkstra
Hi Richard, Right, so this is an alternative approach using costs - Combine won't try to duplicate instructions if it increases costs, so increasing the ctz cost to 2 instructions (which is the correct cost for ctz anyway) ensures we still get efficient code for this example: [AArch64] Set ctz rt
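
For readers following the cost argument: base AArch64 has no single count-trailing-zeros instruction, so ctz is normally emitted as RBIT followed by CLZ, which is where the "2 instructions" figure comes from. A minimal portable sketch of that decomposition is below. The helper names rbit64, clz64 and ctz64 are made up for illustration and are not taken from the patch; the loops only model what the two instructions compute, they are not how GCC implements anything.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the bit order of a 64-bit value,
   modelling what the RBIT instruction computes. */
static uint64_t rbit64(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

/* Hypothetical helper: count leading zero bits, modelling CLZ.
   Returns 64 for x == 0. */
static unsigned clz64(uint64_t x)
{
    unsigned n = 0;
    for (uint64_t bit = UINT64_C(1) << 63; bit != 0 && (x & bit) == 0; bit >>= 1)
        n++;
    return n;
}

/* ctz expressed as two operations: bit-reverse, then count leading zeros. */
static unsigned ctz64(uint64_t x)
{
    return clz64(rbit64(x));
}
```

If Combine duplicates a ctz, both of these operations are duplicated, so costing ctz as a single cheap instruction understates what the duplication really costs.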

Re: [PATCH][AArch64] Improve clz patterns

2020-02-07 Thread Segher Boessenkool
On Fri, Feb 07, 2020 at 06:01:44PM +, Richard Sandiford wrote: > Wilco Dijkstra writes: > > Although GCC should understand the limited range of clz/ctz/cls results, > > Combine sometimes behaves oddly and duplicates ctz to remove an unnecessary > > sign extension. Avoid this by adding an expl

Re: [PATCH][AArch64] Improve clz patterns

2020-02-07 Thread Richard Sandiford
Wilco Dijkstra writes: > Hi Richard, > >> Could you go into more detail about what the before and after code >> looks like, and what combine is doing? Like you say, this sounds >> like a target-independent thing on face value. > > It is indeed, but it seems specific to instructions where we have

Re: [PATCH][AArch64] Improve clz patterns

2020-02-04 Thread Wilco Dijkstra
Hi Richard, > Could you go into more detail about what the before and after code > looks like, and what combine is doing?  Like you say, this sounds > like a target-independent thing on face value. It is indeed, but it seems specific to instructions where we have range information which allows it

Re: [PATCH][AArch64] Improve clz patterns

2020-02-04 Thread Richard Sandiford
Wilco Dijkstra writes: > Although GCC should understand the limited range of clz/ctz/cls results, > Combine sometimes behaves oddly and duplicates ctz to remove a > sign extension. Avoid this by adding an explicit AND with 127 in the > patterns. Deepsjeng performance improves by ~0.6%. Could you
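
To illustrate the range argument in the original patch description: a 64-bit clz/ctz result is always in 0..64, so it fits in 7 bits, an AND with 127 never changes it, and sign-extending it gives the same value as zero-extending it. The sketch below checks this in plain C. The names clz64 and extension_is_redundant are hypothetical and only model the arithmetic; this is not the RTL pattern from the patch.

```c
#include <stdint.h>

/* Hypothetical helper: count leading zero bits, modelling CLZ.
   Returns 64 for x == 0, so the result is always in 0..64. */
static unsigned clz64(uint64_t x)
{
    unsigned n = 0;
    for (uint64_t bit = UINT64_C(1) << 63; bit != 0 && (x & bit) == 0; bit >>= 1)
        n++;
    return n;
}

/* Returns 1 when masking the count with 127 leaves it unchanged and
   sign-extension of the 32-bit count equals zero-extension. */
static int extension_is_redundant(uint64_t x)
{
    int32_t count = (int32_t)clz64(x);
    int64_t sign_extended = (int64_t)count;
    uint64_t zero_extended = (uint64_t)(uint32_t)count;
    return (count & 127) == count && (uint64_t)sign_extended == zero_extended;
}
```

Since the count is never negative, the sign extension carries no information, which is why making the 7-bit range explicit in the pattern is value-preserving.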