Ondřej Bílka <nel...@seznam.cz> writes:
>
> On ivy bridge I got that Using rep stosq for memset(x,0,4096) is 20%
> slower than libcall for L1 cache resident data while 50% faster for data
> outside cache. How do you teach compiler that?

In theory it would be possible with AutoFDO: profile with a cache miss
event, correlate the samples back to the code, and maintain that
information in addition to the basic block frequencies.

Probably not simple, but definitely possible.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only
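
To make the idea above concrete, here is a minimal sketch in C of the choice a compiler could make per call site if a profile carried cache miss counts next to the execution counts. Everything in it is an assumption for illustration: `struct site_profile`, `site_is_cold`, `zero_block` and the 50% miss threshold are hypothetical and do not correspond to any existing GCC or AutoFDO data structure or heuristic. The inline asm path is only used on x86-64 with a GCC-compatible compiler; elsewhere it falls back to `memset`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-call-site profile record: how often the site ran and
   how many of those runs were attributed a cache miss on the destination. */
struct site_profile {
    uint64_t executions;
    uint64_t cache_misses;
};

/* Zero n bytes using rep stosq; n must be a multiple of 8. */
static void zero_rep_stosq(void *dst, size_t n)
{
    assert(n % 8 == 0);
#if defined(__x86_64__) && defined(__GNUC__)
    size_t qwords = n / 8;
    /* rdi = destination, rcx = count of 8-byte stores, rax = value (0). */
    __asm__ volatile("rep stosq"
                     : "+D"(dst), "+c"(qwords)
                     : "a"((uint64_t)0)
                     : "memory");
#else
    memset(dst, 0, n); /* portable fallback on other targets */
#endif
}

/* Assumed heuristic: treat the site as cache-cold when more than half of
   its profiled executions missed. The threshold is arbitrary. */
static int site_is_cold(const struct site_profile *p)
{
    return p->executions != 0 && 2 * p->cache_misses > p->executions;
}

/* Pick the expansion per site: rep stosq for cold data, libcall otherwise. */
static void zero_block(void *dst, size_t n, const struct site_profile *p)
{
    if (n % 8 == 0 && site_is_cold(p))
        zero_rep_stosq(dst, n);
    else
        memset(dst, 0, n);
}
```

In a real compiler the decision would be made once at compile time from the profile, not with a runtime branch as in `zero_block`; the branch is only there to keep the sketch self-contained.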