> -----Original Message-----
> From: H.J. Lu <hjl.to...@gmail.com>
> Sent: Tuesday, May 6, 2025 2:16 PM
> To: Liu, Hongtao <hongtao....@intel.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Uros Bizjak
> <ubiz...@gmail.com>
> Subject: Re: [PATCH] x86: Skip if the mode size is smaller than its natural
> size
>
> On Tue, May 6, 2025 at 10:54 AM Liu, Hongtao <hongtao....@intel.com>
> wrote:
> >
> > > -----Original Message-----
> > > From: H.J. Lu <hjl.to...@gmail.com>
> > > Sent: Thursday, May 1, 2025 6:39 AM
> > > To: GCC Patches <gcc-patches@gcc.gnu.org>; Uros Bizjak
> > > <ubiz...@gmail.com>; Liu, Hongtao <hongtao....@intel.com>
> > > Subject: [PATCH] x86: Skip if the mode size is smaller than its
> > > natural size
> > >
> > > When generating a SUBREG from V16QI to V2HF, validate_subreg fails
> > > since the V2HF size (4 bytes) is smaller than its natural size (word
> > > size).
> > > Update remove_redundant_vector_load to skip if the mode size is
> > > smaller than its natural size.
> > I think we can also handle it in replace_vector_const by inserting an
> > extra move with (Set (reg:v4qi) (subreg:v4qi (v16qi const0_rtx) 0))
> > And then use subreg with same vector size (v2hf<->v4qi) (set
> > (reg:v2hf) (subreg:v2hf (reg:v4qi) 0))
>
> What is the advantage of this approach?  My patch uses a single instruction to
> write 4 bytes of 0s and 1s.  Your suggestion needs at least one more
> instruction.

I'm not asking to do it for all the cases, just to handle those cases with invalid subreg

@@ -3334,8 +3334,11 @@ replace_vector_const (machine_mode vector_mode, rtx vector_const,
   machine_mode mode = GET_MODE (dest);
   rtx replace;
+  if (!validate_subreg (mode, vector_mode, vector_const, 0))
+    /* Insert an extra move to avoid invalid subreg.  */
+    .........
   /* Replace the source operand with VECTOR_CONST.  */
-  if (SUBREG_P (dest) || mode == vector_mode)
+  else if (SUBREG_P (dest) || mode == vector_mode)
     replace = vector_const;
   else
     replace = gen_rtx_SUBREG (mode, vector_const, 0);

For a valid subreg, no extra instruction is needed.  For an invalid one, I
think RA can eliminate the extra move, and then the optimization is not
limited by "the mode size is smaller than its natural size".

>
> > I think this can also pass validate_subreg.
> > >
> > > gcc/
> > >
> > > PR target/120036
> > > * config/i386/i386-features.cc (remove_redundant_vector_load):
> > > Also skip if the mode size is smaller than its natural size.
> > >
> > > gcc/testsuite/
> > >
> > > PR target/120036
> > > * g++.target/i386/pr120036.C: New test.
> > >
> > > --
> > > H.J.
> >
>
> --
> H.J.
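
To make the three-way choice in the hunk concrete, here is a small standalone C++ model of it. This is only a sketch and does not use GCC internals: `Mode`, `subreg_ok` and `plan_replacement` are hypothetical names, and `subreg_ok` is a simplified stand-in for validate_subreg that assumes a subreg involving a float vector mode must keep the same size. That assumption is just enough to reproduce the V16QI/V4QI/V2HF case from this thread; the real rules are more involved.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: a mode is a name, a size in bytes, and a float flag.
struct Mode {
  std::string name;
  unsigned size;
  bool is_float;
};

// Hypothetical stand-in for validate_subreg (NOT GCC's real rules):
// assume a subreg that involves a float mode must not change size,
// and an integer subreg may only narrow.
bool subreg_ok(const Mode &outer, const Mode &inner) {
  if (outer.is_float || inner.is_float)
    return outer.size == inner.size;
  return outer.size <= inner.size;
}

// Return the RTL-like steps needed to view the constant register of mode
// VEC in mode DEST.  BRIDGE is an integer vector mode of the same size as
// DEST, used only when the direct subreg is rejected.  The SUBREG_P (dest)
// case from the real function is left out of this model.
std::vector<std::string> plan_replacement(const Mode &dest, const Mode &vec,
                                          const Mode &bridge) {
  std::vector<std::string> steps;
  if (dest.name == vec.name) {
    // Same mode: use the constant register directly.
    steps.push_back("(reg:" + vec.name + ")");
  } else if (subreg_ok(dest, vec)) {
    // Valid subreg: no extra instruction.
    steps.push_back("(subreg:" + dest.name + " (reg:" + vec.name + ") 0)");
  } else {
    // Invalid subreg: one extra move through a same-size integer mode.
    assert(bridge.size == dest.size && !bridge.is_float);
    steps.push_back("(set (reg:" + bridge.name + ") (subreg:" + bridge.name +
                    " (reg:" + vec.name + ") 0))");
    steps.push_back("(subreg:" + dest.name + " (reg:" + bridge.name + ") 0)");
  }
  return steps;
}
```

With V16QI as the constant mode and V4QI as the bridge, a V2HF destination takes the two-step path, while a V4QI destination still gets a single direct subreg.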