Re: Re: [PATCH v1] LoongArch: Merge constant vector permuatation implementations.

李威 Thu, 28 Dec 2023 04:34:58 -0800

I've wondered the same thing about that vector instruction. 😂
Sorry, I can't prove that, so I switched to simplify_gen_subreg to make
sure there won't be any problems (I have submitted v2). My oversight.


> -----Original Message-----
> From: "Xi Ruoyao" <xry...@xry111.site>
> Sent: 2023-12-28 18:55:01 (Thursday)
> To: "Li Wei" <li...@loongson.cn>, gcc-patches@gcc.gnu.org
> Cc: i...@xen0n.name, xucheng...@loongson.cn, chengl...@loongson.cn
> Subject: Re: [PATCH v1] LoongArch: Merge constant vector permuatation implementations.
> 
> On Thu, 2023-12-28 at 14:59 +0800, Li Wei wrote:
> > There are currently two implementations of constant vector
> > permutation, loongarch_expand_vec_perm_const_1 and
> > loongarch_expand_vec_perm_const_2, and they differ.  Currently, only
> > loongarch_expand_vec_perm_const_1 is used for 256-bit vectors.  We
> > want to streamline the code as much as possible while retaining the
> > better-performing implementation of the two.  After repeatedly testing
> > with SPEC2006 and SPEC2017, we arrived at the merged version below.
> > Compared with the pre-merge version, loongarch.cc is 888 lines
> > shorter.  At the same time, SPECint2006 performance at -Ofast improves
> > by 0.97%, and SPEC2017 fprate performance improves by 0.27%.
> 
> /* snip */
> 
> > - * 3. What LASX permutation instruction does:
> > - * In short, it just execute two independent 128bit vector permuatation, and
> > - * it's the reason that we need to do the jobs below.  We will explain it.
> > - * op0, op1, target, and selector will be separate into high 128bit and low
> > - * 128bit, and do permutation as the description below:
> > - *
> > - *  a) op0's low 128bit and op1's low 128bit "combines" into a 256bit temp
> > - * vector storage (TVS1), elements are indexed as below:
> > - *     0 ~ nelt / 2 - 1      nelt / 2 ~ nelt - 1
> > - * |---------------------|---------------------| TVS1
> > - *     op0's low 128bit      op1's low 128bit
> > - *    op0's high 128bit and op1's high 128bit are "combined" into TVS2 in the
> > - *    same way.
> > - *     0 ~ nelt / 2 - 1      nelt / 2 ~ nelt - 1
> > - * |---------------------|---------------------| TVS2
> > - *     op0's high 128bit   op1's high 128bit
> > - *  b) Selector's low 128bit describes which elements from TVS1 will fit into
> > - *  target vector's low 128bit.  No TVS2 elements are allowed.
> > - *  c) Selector's high 128bit describes which elements from TVS2 will fit into
> > - *  target vector's high 128bit.  No TVS1 elements are allowed.
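To make the lane-split behaviour described in the removed comment concrete, here is a small C model of it. It is a hypothetical sketch, not the LASX instruction definition and not GCC code: `lasx_perm_model`, `NELT` and `HALF` are illustrative names, the element type is fixed to 64-bit integers (a V4DI-like vector), and selector values are assumed to already lie in [0, NELT).

```c
#include <stddef.h>

/* Hypothetical model of the comment's description: a 256-bit permutation
   that is really two independent 128-bit permutations.  NELT is the
   element count of the 256-bit vector; each 128-bit half holds HALF.
   Assumes every selector value is in [0, NELT); how real hardware
   treats out-of-range selector values is not modeled.  */
#define NELT 4            /* e.g. V4DI: four 64-bit elements */
#define HALF (NELT / 2)

void
lasx_perm_model (long long target[NELT], const long long op0[NELT],
                 const long long op1[NELT], const int sel[NELT])
{
  for (size_t lane = 0; lane < 2; lane++)
    {
      /* Temp vector storage for this lane (TVS1 for the low half, TVS2
         for the high half): op0's half at [0, HALF), op1's half at
         [HALF, NELT).  */
      long long tvs[NELT];
      for (size_t k = 0; k < HALF; k++)
        {
          tvs[k] = op0[lane * HALF + k];
          tvs[HALF + k] = op1[lane * HALF + k];
        }

      /* This lane's half of the selector can only index this lane's
         TVS, so no element ever crosses the 128-bit boundary.  */
      for (size_t k = 0; k < HALF; k++)
        target[lane * HALF + k] = tvs[sel[lane * HALF + k]];
    }
}
```

With op0 = {10, 11, 12, 13}, op1 = {20, 21, 22, 23} and sel = {0, 1, 2, 3}, the model yields {10, 11, 22, 23}, not a copy of op0: in the high lane, index 2 names the first element of op1's high half rather than op0[2]. That mismatch with a flat 256-bit view is why a generic selector has to be rewritten before it can be used.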
> 
> Just curious: why did the hardware engineers create such a bizarre
> instruction? :)
> 
> /* snip */
> 
> > +     rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, d->op1, 0);
> > +     rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, d->op0, 0);
> 
> Can we prove that d->op0, d->op1, and d->target are never SUBREGs?
> Otherwise I'd use lowpart_subreg (E_V4DImode, d->xxx, d->vmode) here to
> avoid creating a nested SUBREG (a nested SUBREG causes an ICE, and that
> has happened several times before).
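The nested-SUBREG concern can be illustrated with a toy model. The types and functions below are hypothetical and are not GCC's rtx API: `toy_naive_subreg` stands in for wrapping an operand unconditionally (as a bare gen_rtx_SUBREG call does), and `toy_folding_subreg` only mimics the idea behind simplify_gen_subreg/lowpart_subreg of looking through an existing SUBREG. It assumes byte offsets simply add and ignores endianness, mode sizes and paradoxical subregs.

```c
#include <stddef.h>

/* Toy RTL-like values (hypothetical; not GCC's rtx).  A value is either
   a register or a SUBREG viewing an inner value in another mode at a
   byte offset.  */
enum toy_code { TOY_REG, TOY_SUBREG };

struct toy_rtx
{
  enum toy_code code;
  int mode;                     /* opaque mode id */
  unsigned byte;                /* SUBREG byte offset (0 for a REG) */
  const struct toy_rtx *inner;  /* SUBREG operand (NULL for a REG) */
};

/* Always wraps OP; if OP is already a SUBREG, the result is nested.  */
struct toy_rtx
toy_naive_subreg (int mode, const struct toy_rtx *op, unsigned byte)
{
  struct toy_rtx r = { TOY_SUBREG, mode, byte, op };
  return r;
}

/* Looks through existing SUBREGs so the result always wraps a REG.
   Simplifying assumption: offsets add, with no endianness adjustment.  */
struct toy_rtx
toy_folding_subreg (int mode, const struct toy_rtx *op, unsigned byte)
{
  if (op->code == TOY_SUBREG)
    return toy_folding_subreg (mode, op->inner, op->byte + byte);
  struct toy_rtx r = { TOY_SUBREG, mode, byte, op };
  return r;
}
```

Wrapping an existing SUBREG with `toy_naive_subreg` leaves a SUBREG inside a SUBREG, whereas `toy_folding_subreg` returns a single SUBREG whose operand is the original register, with the offsets combined.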
> 
> /* snip */
> 
> > +     switch (d->vmode)
> >         {
> > -         remapped[i] = d->perm[i];
> > +       case E_V4DFmode:
> > +         sel = gen_rtx_CONST_VECTOR (E_V4DImode, gen_rtvec_v (d->nelt,
> > +                                                              rperm));
> > +         tmp = gen_rtx_SUBREG (E_V4DImode, d->target, 0);
> 
> Likewise.
> 
> > +         emit_move_insn (tmp, sel);
> > +         break;
> > +       case E_V8SFmode:
> > +         sel = gen_rtx_CONST_VECTOR (E_V8SImode, gen_rtvec_v (d->nelt,
> > +                                                              rperm));
> > +         tmp = gen_rtx_SUBREG (E_V8SImode, d->target, 0);
> 
> Likewise.
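The quoted switch builds the selector as an integer CONST_VECTOR whose element count and width match the floating-point mode being permuted (V4DF uses V4DI, V8SF uses V8SI) and moves it into the target viewed in that integer mode. A minimal sketch of just that mode mapping follows; the `toy_*` enum names are hypothetical stand-ins for E_V4DFmode and friends, and the handling of integer modes is an assumption, since that part of the patch is snipped.

```c
/* Hypothetical stand-ins for the vector modes in the quoted switch.  */
enum toy_vmode { TOY_V4DF, TOY_V8SF, TOY_V4DI, TOY_V8SI, TOY_VNONE };

/* Return the integer vector mode used for the permutation selector:
   same element count and element width as VMODE.  */
enum toy_vmode
toy_selector_mode (enum toy_vmode vmode)
{
  switch (vmode)
    {
    case TOY_V4DF:      /* 4 x 64-bit float -> 4 x 64-bit int */
      return TOY_V4DI;
    case TOY_V8SF:      /* 8 x 32-bit float -> 8 x 32-bit int */
      return TOY_V8SI;
    case TOY_V4DI:
    case TOY_V8SI:
      return vmode;     /* assumption: integer modes are used as-is */
    default:
      return TOY_VNONE;
    }
}
```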
> 
> -- 
> Xi Ruoyao <xry...@xry111.site>
> School of Aerospace Science and Technology, Xidian University


This email and its attachments contain confidential information from Loongson 
Technology, which is intended only for the person or entity whose address is 
listed above. Any use of the information contained herein in any way 
(including, but not limited to, total or partial disclosure, reproduction or 
dissemination) by persons other than the intended recipient(s) is prohibited. 
If you receive this email in error, please notify the sender by phone or email 
immediately and delete it. 
