Hi,

On 2023-11-06 20:47:34 -0600, Nathan Bossart wrote:
> Separately, I'm wondering whether we should consider using CFLAGS_VECTORIZE
> on the whole tree. Commit fdea253 seems to be responsible for introducing
> this targeted autovectorization strategy, and AFAICT this was just done to
> minimize the impact elsewhere while optimizing page checksums. Are there
> fundamental problems with adding CFLAGS_VECTORIZE everywhere? Or is it
> just waiting on someone to do the analysis/benchmarking?

Historically, vectorization sometimes ended up hurting in a bunch of places.
But I think that was in the gcc 4 era, which has long passed.

IME, these days using -O3 yields decent improvements over -O2 when used
tree-wide. Even if there are perhaps a few isolated cases where the code is a
bit worse, they're far outweighed by the improved code.

Compile-time-wise it's noticeably slower, but not catastrophically so (roughly
13% more wall-clock time in the run below). On an older but decent laptop,
while on battery:

O2:
800.29user 41.99system 0:59.17elapsed 1423%CPU (0avgtext+0avgdata 282324maxresident)k
152inputs+4408176outputs (95major+13359282minor)pagefaults 0swaps

O3:
911.80user 44.71system 1:06.79elapsed 1431%CPU (0avgtext+0avgdata 278660maxresident)k
82624inputs+4571480outputs (571major+14004898minor)pagefaults 0swaps

Greetings,

Andres Freund
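
For readers who want to see what kind of code the autovectorization
discussion is about, here is a minimal sketch of a loop shape that compilers
can often vectorize: several independent accumulators updated with identical
operations. It is a toy, not PostgreSQL's page checksum; the function name
lane_checksum, the lane count N_LANES, and the mixing constant are assumptions
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_LANES 8            /* hypothetical lane count, for illustration only */
#define MIX_PRIME 16777619u  /* FNV prime, used here merely as a mixing constant */

/*
 * Toy checksum with N_LANES independent accumulators. Every lane performs
 * the same xor-and-multiply step on its own input word, so there is no
 * dependency between lanes within one iteration of the outer loop. That
 * independence is what gives an autovectorizer room to work.
 */
uint32_t
lane_checksum(const uint32_t *words, size_t nwords)
{
    uint32_t lanes[N_LANES] = {0};
    uint32_t result = 0;

    /* The sketch only handles inputs that are a whole number of lane groups. */
    assert(nwords % N_LANES == 0);

    for (size_t i = 0; i < nwords; i += N_LANES)
        for (size_t j = 0; j < N_LANES; j++)
            lanes[j] = (lanes[j] ^ words[i + j]) * MIX_PRIME;

    /* Fold the lanes into a single value. */
    for (size_t j = 0; j < N_LANES; j++)
        result ^= lanes[j];

    return result;
}
```

Assuming GCC, compiling this file once with -O2 and once with -O3 and adding
-fopt-info-vec should report which loops, if any, were vectorized. Whether a
given loop is vectorized depends on the compiler version and target, so treat
the output as something to check rather than a guaranteed result.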