On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Thu, Aug 10, 2023 at 9:40 AM Richard Biener
> <richard.guent...@gmail.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 3:13 AM liuhongt <hongtao....@intel.com> wrote:
> > >
> > > Currently we have 3 different independent tunes for gather
> > > "use_gather,use_gather_2parts,use_gather_4parts",
> > > similar for scatter, there're
> > > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> > >
> > > The patch support 2 standardizing options to enable/disable
> > > vectorization for all gather/scatter instructions. The options is
> > > interpreted by driver to 3 tunes.
> > >
> > > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > Ok for trunk?
> >
> > I think -mgather/-mscatter are too close to -mfma suggesting they
> > enable part of an ISA but they won't disable the use of intrinsics
> > or enable gather/scatter on CPUs where the ISA doesn't have them.
> >
> > May I suggest to invent a more generic "short-cut" to
> > -mtune-ctrl=^X, maybe -mdisable=X? And for gather/scatter
> > tunables add ^use_gather_any to cover all cases? (or
> > change what use_gather controls - it seems we changed its
> > meaning before, and instead add use_gather_8parts and
> > use_gather_16parts)
> >
> > That is, what's the point of this?
>
> https://www.phoronix.com/review/downfall
>
> that caused:
>
> https://www.phoronix.com/review/intel-downfall-benchmarks

Yes, I know. But there's -mtune-ctrl=<very long line> doing the trick.
GCC 11 had only 'use_gather', covering all lane counts. I suggest
resurrecting that behavior and adding use_gather_8+parts (or two, IIRC
gather works only on SI/SFmode or larger). Then -mtune-ctrl=^use_gather
works, which I think is nice enough?

Richard.

> Uros.
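
For reference, a minimal sketch of the kind of loop these tunes affect, and of what the "very long line" could look like with the three gather tunes named in the patch description. The function name and file name are made up for illustration, the -march value is only an example, and whether the vectorizer actually emits a gather here depends on the GCC version and target, so treat the commands in the comment as an assumption to check against the generated assembly rather than a guaranteed result.

```c
#include <stddef.h>

/*
 * Indexed load: out[i] = table[idx[i]].
 *
 * With AVX2 or AVX-512 enabled, the vectorizer may implement the load
 * with a gather instruction, subject to the use_gather* tunes.
 *
 * Hypothetical commands to compare code generation (gather.c is an
 * assumed file name):
 *
 *   gcc -O3 -march=skylake-avx512 -S gather.c
 *   gcc -O3 -march=skylake-avx512 \
 *       -mtune-ctrl=^use_gather,^use_gather_2parts,^use_gather_4parts \
 *       -S gather.c
 */
void gather_load(float *restrict out, const float *restrict table,
                 const int *restrict idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = table[idx[i]];
}
```

The tunes only change which instructions the compiler picks; the result of the loop is the same either way.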