> -----Original Message-----
> From: "Jeff Law" <j...@ventanamicro.com>
> Sent: 2024-03-19 10:54:09 (Tuesday)
> To: Jiawei <jia...@iscas.ac.cn>, gcc-patches@gcc.gnu.org
> Cc: kito.ch...@sifive.com, pal...@dabbelt.com, christoph.muell...@vrull.eu, wuwei2...@iscas.ac.cn, shi...@iscas.ac.cn, shiyul...@iscas.ac.cn, chenyix...@iscas.ac.cn
> Subject: Re: [PATCH] RISC-V: Add XiangShan Nanhu microarchitecture.
>
>
>
> On 2/27/24 1:52 AM, Jiawei wrote:
> > From: Chen Jiawei <jia...@iscas.ac.cn>
> >
> > Co-Authored by: Lin Jiawei <jiawei....@epfl.ch>
> >
> > This patch adds the XiangShan Nanhu CPU microarchitecture.
> > Nanhu is a 6-issue, superscalar, out-of-order processor.
> > For more details, see: https://xiangshan-doc.readthedocs.io/zh-cn/latest/arch
> >
> > gcc/ChangeLog:
> >
> >          * config/riscv/riscv-cores.def (RISCV_TUNE): New def.
> >          (RISCV_CORE): Ditto.
> >          * config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
> >          New option.
> >          * config/riscv/riscv.cc: New def.
> >          * config/riscv/riscv.md: New include.
> >          * config/riscv/xiangshan.md: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >          * gcc.target/riscv/mcpu-xiangshan-nanhu.c: New test.
> As was discussed last Tuesday, this should be safe, even at this late
> stage in the gcc-14 cycle.
>
> >
> > +/* Costs to use when optimizing for xiangshan nanhu.  */
> > +static const struct riscv_tune_param xiangshan_nanhu_tune_info = {
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* fp_add */
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* fp_mul */
> > +  {COSTS_N_INSNS (10), COSTS_N_INSNS (20)},  /* fp_div */
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* int_mul */
> > +  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},    /* int_div */
> > +  6,                                         /* issue_rate */
> > +  3,                                         /* branch_cost */
> > +  3,                                         /* memory_cost */
> > +  3,                                         /* fmv_cost */
> > +  true,                                      /* slow_unaligned_access */
> > +  false,                                     /* use_divmod_expansion */
> > +  RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH,       /* fusible_ops */
> > +  NULL,                                      /* vector cost */
> Is your integer division really that fast?  The table above essentially
> says that your cpu can do integer division in 6 cycles.
>
> > +
> > +(define_insn_reservation "xiangshan_mul" 3
> > +  (and (eq_attr "tune" "xiangshan")
> > +       (eq_attr "type" "imul"))
> > +  "xs_mdu_rs")
> > +
> > +(define_insn_reservation "xiangshan_div" 21
> > +  (and (eq_attr "tune" "xiangshan")
> > +       (eq_attr "type" "idiv"))
> > +  "xs_mdu_rs")
> Whereas your pipeline description says it's 21c.
>
> I strongly suspect you want to increase the cost of the int_div in the
> tuning table.  And with the higher cost you probably want to turn on
> use_divmod_expansion.
>
> I'll also note that your scheduler description indicates your
> division is fully pipelined.  Is that correct?  If not, you'll want to
> adjust that reservation.
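The practical difference between the two cases can be sketched in C as follows. This is a simplified model under stated assumptions, not the scheduler's actual algorithm: it considers `count` independent divides issued back to back to a single unit, and both function names are hypothetical. A reservation that names the unit for one cycle (as `"xs_mdu_rs"` does here) corresponds to the pipelined case; a unit that stays busy for the whole operation corresponds to the blocking case.

```c
/* Cycles until the last of `count` independent operations completes when
   the unit accepts a new operation every cycle (fully pipelined).  */
int
pipelined_cycles (int count, int latency)
{
  return count > 0 ? latency + (count - 1) : 0;
}

/* Same, when the unit stays busy for the full latency of each operation
   (not pipelined), so operations run strictly one after another.  */
int
blocking_cycles (int count, int latency)
{
  return count > 0 ? count * latency : 0;
}
```

For four divides at the 21-cycle latency from the reservation, `pipelined_cycles (4, 21)` gives 24 and `blocking_cycles (4, 21)` gives 84, which is the kind of gap the scheduler would misjudge if the unit is not really pipelined.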
>
>
>
> > +
> > +(define_insn_reservation "xiangshan_sfdiv" 11
> > +  (and (eq_attr "tune" "xiangshan")
> > +       (eq_attr "type" "fdiv")
> > +       (eq_attr "mode" "SF"))
> > +  "xs_fmisc_rs")
> > +
> > +(define_insn_reservation "xiangshan_sfsqrt" 17
> > +  (and (eq_attr "tune" "xiangshan")
> > +       (eq_attr "type" "fsqrt")
> > +       (eq_attr "mode" "SF"))
> > +  "xs_fmisc_rs")
> > +
> > +(define_insn_reservation "xiangshan_dfdiv" 21
> > +  (and (eq_attr "tune" "xiangshan")
> > +       (eq_attr "type" "fdiv")
> > +       (eq_attr "mode" "DF"))
> > +  "xs_fmisc_rs")
> > +
> > +(define_insn_reservation "xiangshan_dfsqrt" 37
> > +  (and (eq_attr "tune" "xiangshan")
> > +       (eq_attr "type" "fsqrt")
> > +       (eq_attr "mode" "DF"))
> > +  "xs_fmisc_rs")
> Similarly these say your fpdiv and fpsqrt are fully pipelined.  It's
> certainly possible, but I suspect it's really just an oversight.  Given
> these values you may also want to adjust the cost of an fp division in
> the cost table.
>
>
> Finally, with such high values for the div/sqrt units, we find that
> the DFA "blows up", causing genattrtab to run for a very long time. We'll
> have to keep an eye on that.
>
> And just to be clear, I think these can be done as a followup patch. I'm
> going to push this patch as-is rather than make any adjustments -- you
> almost certainly know the processor's capabilities better than myself or
> anyone else on this list :-)
>
>
> Jeff

Thank you for the comments. Some of the pipeline costs still need to be
confirmed, and I will correct them in the next patch.

BR,
Jiawei
