Ondřej Bílka <nel...@seznam.cz> writes:
>
> On Ivy Bridge I found that using rep stosq for memset(x,0,4096) is 20%
> slower than a libcall for L1-cache-resident data, but 50% faster for data
> outside the cache. How do you teach the compiler that?

In theory it would be possible with AutoFDO. Profile with a cache miss
event, correlate the samples with the code, and maintain that information
in addition to the basic block frequencies.

Probably not simple, but definitely possible.
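
As a rough sketch of how the compiler might consume such data, assuming
each memset call site carried a sampled execution count and a sampled
cache miss count: the struct, the function name, and the 50% threshold
below are all made up for illustration and are not part of GCC or
AutoFDO. The two counts would come from different sampling events, so
their ratio is only a crude indicator.

```c
#include <stdint.h>

/* Hypothetical expansion strategies for a fixed-size memset. */
enum memset_strategy {
    MEMSET_LIBCALL,   /* call the library memset */
    MEMSET_REP_STOS   /* expand inline as rep stosq */
};

/* Hypothetical per-call-site profile record; not an actual GCC or
   AutoFDO data structure. */
struct site_profile {
    uint64_t exec_samples;  /* samples attributed to the call site */
    uint64_t miss_samples;  /* cache miss samples attributed to it */
};

/* Assumed cutoff: treat the site as cache-cold when at least this
   percentage of its samples are misses. The value is arbitrary. */
#define MISS_THRESHOLD_PCT 50

/* Pick rep stosq for call sites that mostly miss the cache and the
   libcall otherwise, including when there is no profile data.
   Integer arithmetic only; assumes the counts are small enough that
   multiplying by 100 does not overflow. */
static enum memset_strategy
choose_memset_strategy(const struct site_profile *p)
{
    if (p->exec_samples == 0)
        return MEMSET_LIBCALL;
    if (p->miss_samples * 100 >= p->exec_samples * MISS_THRESHOLD_PCT)
        return MEMSET_REP_STOS;
    return MEMSET_LIBCALL;
}
```

A real implementation would also have to decide how the miss samples
survive inlining and code motion, which is where the "not simple" part
comes in.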

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only
