> > So would need much more benchmarking on macro workloads first at
> > least.
>
> Like what, for example? I believe in this case everything also
> strongly depends on the test usage model (e.g. it is usually compiled
> with -Os, not -O2) and, let's say, on the internal test structure -
> whether there are hot loops that are suitable for unrolling.
Normally the compiler doesn't know whether a loop is hot unless you use
profile feedback. So in the worst case, on a big code base, you may end
up with a lot of unnecessary unrolling. On cold code it's just wasted
bytes, but there could already be icache-limited code where it would be
worse.

How about just a compiler bootstrap on Atom as a "worst case"?

For the benchmark, can you use profile feedback? (A rough sketch of
what I mean is below.)

BTW, I know some loops are unrolled at -O3 by default at the tree level
because the vectorizer likes it. I actually have an older patch to dial
this down for some common cases. (A small example of that effect is
sketched below, too.)

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.
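To make the profile-feedback point concrete, here is a minimal sketch
(the file name, function name, training input and exact flags are only
illustrative): statically the loop below looks like a fine unrolling
candidate, and only a profile can tell the compiler it is cold.

  /* pgo-sketch.c - illustrative only */
  void cold_path(int *a, int n)
  {
      /* Without profile data the unroller has to assume this might
         be hot; if it is in fact cold, unrolling it is pure code-size
         (and potentially icache) cost. */
      for (int i = 0; i < n; i++)
          a[i] = a[i] * 3 + 1;
  }

  /* Hypothetical build sequence:
   *
   *   gcc -O2 -funroll-loops -fprofile-generate pgo-sketch.c ...
   *   ./a.out < training-input      # exercise only the hot paths
   *   gcc -O2 -funroll-loops -fprofile-use pgo-sketch.c ...
   *
   * In the second compile the heuristics see that cold_path() was
   * (almost) never executed and can leave its loop alone, while
   * still unrolling the loops the training run showed to be hot.
   */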
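And for the -O3 point, a trivial way to see the vectorizer-driven
expansion (again, the file name, function and flags are just an
example, and the details vary by GCC version): compare the code
generated with and without tree vectorization.

  /* vect-sketch.c - illustrative only */
  void vadd(int *d, const int *a, const int *b, int n)
  {
      /* Typically vectorized at -O3 (possibly behind a runtime alias
         check): the wide operations plus scalar prologue/epilogue are
         effectively an unrolled loop, even though -funroll-loops was
         never asked for. */
      for (int i = 0; i < n; i++)
          d[i] = a[i] + b[i];
  }

  /* Hypothetical comparison:
   *
   *   gcc -O3 -S vect-sketch.c -o vect.s
   *   gcc -O3 -fno-tree-vectorize -S vect-sketch.c -o novect.s
   *
   * Diffing the two .s files shows how much of the code-size growth
   * comes purely from the vectorizer.
   */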