https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #35 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to rguent...@suse.de from comment #24)
> On September 27, 2020 4:56:43 AM GMT+02:00, crazylht at gmail dot com
> <gcc-bugzi...@gcc.gnu.org> wrote:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
> >
> > --- Comment #22 from Hongtao.liu <crazylht at gmail dot com> ---
> >> One of my workmates found that if we disable vectorization for SPEC2017
> >> 525.x264_r function sub4x4_dct in source file x264_src/common/dct.c with
> >> explicit function attribute
> >> __attribute__((optimize("no-tree-vectorize"))), it can speed up by 4%.
> >
> > For CLX, if we disable slp vectorization in sub4x4_dct by
> > __attribute__((optimize("no-tree-slp-vectorize"))), it can also speed
> > up by 4%.
> >
> >> Thanks Richi! Should we take care of this case? or neglect this kind of
> >> extension as "no instruction"? I was intent to handle it in target
> >> specific code, but it isn't recorded into cost vector while it seems too
> >> heavy to do the bb_info slp_instances revisits in finish_cost.
> >
> > For i386 backend unsigned char --> unsigned short is no "no
> > instruction", but in this case
> > ---
> > 1033 _134 = MEM[(pixel *)pix1_295 + 2B];
> >
> > 1034 _135 = (short unsigned int) _134;
> > ---
> >
> > It could be combined and optimized to
> > ---
> > movzbl 19(%rcx), %r8d
> > ---
> >
> > So, if "unsigned char" variable is loaded from memory, then the
> > convertion would also be "no instruction", i'm not sure if backend cost
> > model could handle such situation.
>
> I think all attempts to address this from the side of the scalar cost is
> going to be difficult and fragile..

Agreed FWIW.
Even in rtl, the kinds of conversion we're talking about could be removed, such as by proving that the upper bits are already correct, by combining the extension with other instructions so that it becomes “free” again, or by ree. Proving that the upper bits are already correct isn't uncommon: gimple has to make a choice between signed and unsigned types even if both choices would be correct, whereas rtl is sign-agnostic for storage. So it's not obvious to me that trying to model things at this level is going to be right more often than it's wrong.
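
For illustration only, here is a minimal C sketch of the two source-level patterns discussed above. It is not taken from x264 or from GCC: the function names `load_widen` and `widen_low_bits` are hypothetical, and whether a given compiler actually drops the extension depends on the target and optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t pixel;  /* assumed 8-bit pixel type, for illustration */

/* Load an 8-bit value and widen it to 16 bits.  On targets with a
   zero-extending load (for example movzbl on x86), the widening can be
   folded into the load itself, so it need not cost a separate
   instruction.  */
static inline uint16_t load_widen(const pixel *pix, size_t offset)
{
    assert(pix != NULL);
    pixel p = pix[offset];
    return (uint16_t) p;
}

/* "Upper bits already correct": the masked value lies in 0..127, so
   sign extension and zero extension of the narrow value produce the
   same wide result.  The source still has to pick a signed or an
   unsigned type for the intermediate, even though either would do.  */
static inline int32_t widen_low_bits(int32_t x)
{
    int16_t narrow = (int16_t) (x & 0x7f);
    return (int32_t) narrow;
}
```

Comparing the output of `gcc -O2 -S` with and without the conversions is one way to see whether a particular target emits an extra instruction for them.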