Re: [PATCH] [RFC] Add extra 64bit SSE vector epilogue in some cases

2024-11-25 Thread Richard Biener
On Mon, 25 Nov 2024, Richard Biener wrote:
> On Mon, 25 Nov 2024, Hongtao Liu wrote:
> > On Sun, Nov 24, 2024 at 8:05 PM Richard Biener wrote:
> > > > On 24.11.2024 at 09:17, Hongtao Liu wrote:
> > > > On Fri, Nov 22, 2024 at 9:33 PM Richard Biener wrote:

Re: [PATCH] [RFC] Add extra 64bit SSE vector epilogue in some cases

2024-11-25 Thread Richard Biener
On Mon, 25 Nov 2024, Hongtao Liu wrote:
> On Sun, Nov 24, 2024 at 8:05 PM Richard Biener wrote:
> > > On 24.11.2024 at 09:17, Hongtao Liu wrote:
> > > On Fri, Nov 22, 2024 at 9:33 PM Richard Biener wrote:
> > >> Similar to the X86_TUNE_AVX512_TWO_EPILOGUES tuning whic

Re: [PATCH] [RFC] Add extra 64bit SSE vector epilogue in some cases

2024-11-24 Thread Hongtao Liu
On Sun, Nov 24, 2024 at 8:05 PM Richard Biener wrote:
> > On 24.11.2024 at 09:17, Hongtao Liu wrote:
> > On Fri, Nov 22, 2024 at 9:33 PM Richard Biener wrote:
> >> Similar to the X86_TUNE_AVX512_TWO_EPILOGUES tuning which enables
> >> an extra 128bit SSE vector epilogue when doi

Re: [PATCH] [RFC] Add extra 64bit SSE vector epilogue in some cases

2024-11-24 Thread Richard Biener
> On 24.11.2024 at 09:17, Hongtao Liu wrote:
> On Fri, Nov 22, 2024 at 9:33 PM Richard Biener wrote:
>> Similar to the X86_TUNE_AVX512_TWO_EPILOGUES tuning which enables
>> an extra 128bit SSE vector epilogue when doing 512bit AVX512
>> vectorization in the main loop the following all

Re: [PATCH] [RFC] Add extra 64bit SSE vector epilogue in some cases

2024-11-24 Thread Hongtao Liu
On Fri, Nov 22, 2024 at 9:33 PM Richard Biener wrote:
> Similar to the X86_TUNE_AVX512_TWO_EPILOGUES tuning which enables
> an extra 128bit SSE vector epilogue when doing 512bit AVX512
> vectorization in the main loop the following allows a 64bit SSE
> vector epilogue to be generated when the pr

[PATCH] [RFC] Add extra 64bit SSE vector epilogue in some cases

2024-11-22 Thread Richard Biener
Similar to the X86_TUNE_AVX512_TWO_EPILOGUES tuning, which enables an extra 128bit SSE vector epilogue when doing 512bit AVX512 vectorization in the main loop, the following allows a 64bit SSE vector epilogue to be generated when the previous vector epilogue still had a vectorization factor of 16 or
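
To make the effect of an additional epilogue concrete, here is a minimal sketch that counts how many scalar iterations each loop stage would consume for a given trip count. It is not GCC code. It assumes byte-sized elements, so a 512bit main loop has a vectorization factor of 64, the 128bit epilogue 16, and the 64bit epilogue 8; the names `stage_split`, `take`, and `split_iterations` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not GCC internals: scalar iterations
   consumed by each stage of a vectorized loop over byte elements. */
struct stage_split {
    size_t main_512; /* 512bit main loop, assumed VF 64 */
    size_t epi_128;  /* 128bit SSE epilogue, assumed VF 16 */
    size_t epi_64;   /* 64bit SSE epilogue, assumed VF 8 */
    size_t scalar;   /* iterations left for the scalar loop */
};

/* Consume as many whole vectors of width vf as fit into *remaining
   and return the number of scalar iterations they cover. */
static size_t take(size_t *remaining, size_t vf) {
    assert(vf > 0);
    size_t done = (*remaining / vf) * vf;
    *remaining -= done;
    return done;
}

/* Split n iterations across the stages; the 64bit epilogue is only
   used when with_64bit_epilogue is nonzero. */
static struct stage_split split_iterations(size_t n, int with_64bit_epilogue) {
    struct stage_split s = {0, 0, 0, 0};
    s.main_512 = take(&n, 64);
    s.epi_128 = take(&n, 16);
    if (with_64bit_epilogue)
        s.epi_64 = take(&n, 8);
    s.scalar = n;
    return s;
}
```

Under these assumptions, a trip count of 127 leaves 15 iterations for the scalar loop after the 128bit epilogue, and 7 once a 64bit epilogue is added. Whether that saving pays off on real hardware is the kind of question the tuning discussion in this thread is about.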