Hi Andrew,

> Yes, I agree a better cost model for CTZ/CLZ is the right solution, but
> I disagree with 2 ALU instructions as the cost.  It should either be
> the same cost as a multiply or have its own cost entry.
> For example, on OcteonTX (and ThunderX1), the cost of CLS/CLZ is 4
> cycles, the same as the cost of a multiply; on OcteonTX2 it is 5
> cycles (again the same cost as a multiply).

+      if (speed)
+          *cost += extra_cost->alu.clz + extra_cost->alu.rev;
+      return false;

So if the costs of clz and ctz are similar enough, this will use the defined
per-CPU costs.
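
As a rough illustration of what the hunk above computes, here is a minimal
standalone sketch. The struct definitions are simplified, hypothetical
stand-ins that model only the two fields the hunk reads (alu.clz and
alu.rev); ctz_extra_cost is an illustrative helper name, not a function in
the patch.

```c
/* Hypothetical, simplified stand-ins for the per-CPU cost tables.
   Only the two fields read by the hunk are modelled here. */
struct alu_cost_table { int clz; int rev; };
struct cpu_cost_table { struct alu_cost_table alu; };

/* Extra cost attributed to CTZ: when optimizing for speed, add the
   per-CPU CLZ cost and the per-CPU bit-reverse cost; otherwise add
   nothing.  Mirrors the arithmetic of the hunk, not its full context. */
static int ctz_extra_cost(const struct cpu_cost_table *extra_cost, int speed)
{
    int cost = 0;
    if (speed)
        cost += extra_cost->alu.clz + extra_cost->alu.rev;
    return cost;
}
```

With made-up table entries of clz = 4 and rev = 1, this yields an extra cost
of 5 when tuning for speed and 0 otherwise, so a CPU whose table gives clz a
higher cost gets a correspondingly higher ctz cost.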

Cheers,
Wilco
