Hi Andrew,

> Yes I agree a better cost model for CTZ/CLZ is the right solution but
> I disagree with 2 ALU instructions as the cost. It should either be
> the same cost as a multiply or have its own cost entry.
> For example, on OcteonTX (and ThunderX1) the cost of CLS/CLZ is 4
> cycles, the same as the cost of a multiply; on OcteonTX2 it is 5
> cycles (again the same cost as a multiply).
+  if (speed)
+    *cost += extra_cost->alu.clz + extra_cost->alu.rev;
+  return false;

So if the costs of clz and ctz are similar enough, this will use the
defined per-cpu costs.

Cheers,
Wilco