On 10/27/23 01:49, Robin Dapp wrote:
@@ -346,7 +346,7 @@ static const struct riscv_tune_param rocket_tune_info = {
    {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},     /* fp_mul */
    {COSTS_N_INSNS (20), COSTS_N_INSNS (20)},   /* fp_div */
    {COSTS_N_INSNS (4), COSTS_N_INSNS (4)},     /* int_mul */
-  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},      /* int_div */
+  {COSTS_N_INSNS (33), COSTS_N_INSNS (65)},    /* int_div */
    1,                                          /* issue_rate */
    3,                                          /* branch_cost */
    5,                                          /* memory_cost */
@@ -361,7 +361,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
    {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},     /* fp_mul */
    {COSTS_N_INSNS (20), COSTS_N_INSNS (20)},   /* fp_div */
    {COSTS_N_INSNS (4), COSTS_N_INSNS (4)},     /* int_mul */
-  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},      /* int_div */
+  {COSTS_N_INSNS (33), COSTS_N_INSNS (65)},    /* int_div */
    2,                                          /* issue_rate */
    4,                                          /* branch_cost */
    3,                                          /* memory_cost */
@@ -376,7 +376,7 @@ static const struct riscv_tune_param thead_c906_tune_info = {
    {COSTS_N_INSNS (4), COSTS_N_INSNS (5)}, /* fp_mul */
    {COSTS_N_INSNS (20), COSTS_N_INSNS (20)}, /* fp_div */
    {COSTS_N_INSNS (4), COSTS_N_INSNS (4)}, /* int_mul */
-  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)}, /* int_div */
+  {COSTS_N_INSNS (18), COSTS_N_INSNS (34)}, /* int_div */
    1,            /* issue_rate */
    3,            /* branch_cost */
    5,            /* memory_cost */

Instruction costs don't really correspond to latencies even though
sometimes they are used as if they were.  I'm a bit wary of using
e.g. 65 which would disparage each use of an integer division inside
a sequence.

Could you check which costs we need in order to still emit your wanted
sequence?  Maybe we can use values a bit lower than yours and still
get the proper code.  Where is the decision being made actually?

The main use of costing a div/mod instruction is to guide the reciprocal division code when dividing by a constant. In that context we're comparing the division's cost against a sequence of multiplies, shifts, and add/sub insns, which are almost always costed by their latency. So using latency for division is a reasonable place to start.
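
To make that comparison concrete, here is a minimal sketch in C. It is not the code GCC uses: the real decision is more involved, and `div10_by_reciprocal` and `prefer_reciprocal` are hypothetical names chosen for illustration. `COSTS_N_INSNS` is defined the same way as in GCC's rtl.h, and the instruction counts in the usage note are assumptions.

```c
#include <stdint.h>

/* Same definition as in GCC's rtl.h: the cost of N "fast" instructions.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* x / 10 for a 32-bit unsigned x, computed without a divide:
   multiply by ceil(2^35 / 10) = 0xCCCCCCCD and shift right by 35.  */
static uint32_t div10_by_reciprocal(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Hypothetical decision helper: use the multiply/shift/add sequence
   only when its summed cost is below the cost of one divide.  */
static int prefer_reciprocal(int seq_cost, int div_cost)
{
    return seq_cost < div_cost;
}
```

Assume a sequence of one multiply (costed at 4) plus three single-cycle shift/add insns, so `COSTS_N_INSNS (4) + 3 * COSTS_N_INSNS (1)`. Against the old `COSTS_N_INSNS (6)` divide cost the helper picks the divide; against `COSTS_N_INSNS (33)` it picks the reciprocal sequence.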

The other thing that might be worth investigating for those processors would be to set "use_divmod_expansion" in the cost structure. I've heard talk of fusing div/mod into divmod, though I'm not aware of any part implementing that fusion (from a prior life, that would seem to require a 2nd output port on the integer unit which could be highly undesirable). Anyway, this could be a followup item for Yangyu if it looks profitable.
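
As a rough illustration of why a divmod expansion could pay off when division is expensive, the sketch below gets quotient and remainder from a single divide and derives the remainder with a multiply and a subtract. That this is the shape of the expansion selected by "use_divmod_expansion" is an assumption here, and `divmod_one_div` and `struct divmod_result` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical result pair for a combined divide/modulo.  */
struct divmod_result {
    int64_t quot;
    int64_t rem;
};

/* One divide, then rem = n - quot * d (a multiply and a subtract)
   instead of a second div/rem instruction.
   Caller must ensure d != 0 and avoid INT64_MIN / -1.  */
static struct divmod_result divmod_one_div(int64_t n, int64_t d)
{
    struct divmod_result r;
    r.quot = n / d;
    r.rem = n - r.quot * d;
    return r;
}
```

With a divide costed at 33 or 65 insns and a multiply at 4, trading the second divide for a multiply and subtract looks attractive on paper. Whether it is profitable on these cores would still need measuring.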

jeff
