https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114514
--- Comment #3 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> Confirmed.
>
> Note non sign bit can be improved too:
> ```

I assume you're talking about broadcasting from an immediate (imm) versus loading directly from the constant pool. GCC chooses the former; with -Os it can also generate the latter. According to a microbenchmark, the former is better. I also tried disabling broadcast from imm and testing with stress-ng vecmath; the performance is similar.
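As a rough illustration of where the two strategies come up, here is a minimal sketch. It is not the testcase from this bug: the function name `apply_mask` and the constant `0x00ff00ff` are hypothetical, chosen only to give the vectorizer a uniform non-sign-bit mask. When a loop like this is vectorized, the compiler has to materialize the mask in a vector register, either by moving the immediate into a scalar register and broadcasting it, or by loading a prebuilt vector from the constant pool.

```c
#include <stdint.h>

/* Hypothetical kernel, not taken from the bug report: AND eight 32-bit
 * lanes with one uniform constant. If the loop is vectorized, the
 * constant must be materialized as a vector, either via a broadcast
 * from an immediate or via a load from the constant pool. */
void apply_mask(int32_t *restrict dst, const int32_t *restrict src)
{
    for (int i = 0; i < 8; i++)
        dst[i] = src[i] & 0x00ff00ff; /* assumed example mask */
}
```

Compiling with something like `gcc -O2 -mavx2 -S` and again with `gcc -Os -mavx2 -S`, then comparing the assembly, should show which form a given GCC version picks; the exact instructions depend on the version and target options.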