On Fri, Oct 12, 2018 at 2:26 PM Jann Horn <ja...@google.com> wrote:
>
> On Fri, Oct 12, 2018 at 11:41 AM Samuel Neves <sne...@dei.uc.pt> wrote:
> >
> > On Thu, Oct 11, 2018 at 8:25 PM Andy Lutomirski <l...@kernel.org> wrote:
> > > What exactly is this trying to protect against? And how many cycles
> > > should we expect L1D_FLUSH to take?
> >
> > As far as I could measure, I got 1660 cycles per wrmsr 0x10b, 0x1 on a
> > Skylake chip, and 1220 cycles on a Skylake-SP.
>
> Is that with L1D mostly empty, with L1D mostly full with clean lines,
> or with L1D full of dirty lines that need to be written back?
Mostly empty, as this is flushing repeatedly without bothering to
refill L1d with anything. On Skylake the (averaged) uops breakdown per
flush is something like

    port 0: 255
    port 1: 143
    port 2: 176
    port 3: 177
    port 4: 524
    port 5: 273
    port 6: 616
    port 7: 182

The number of port 4 dispatches is very close to the number of cache
lines (512 in a 32 KB, 8-way L1d), suggesting one write per line, with
the corresponding address generations spread over ports {2, 3, 7}
(176+177+182).

Furthermore, I suspect it also clears the L1i cache. For 2^20 wrmsr
executions, there are around 2^20 frontend_retired.l1i_miss events,
but a negligible number of frontend_retired.l2_miss ones.
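
[Editor's note: for readers who want to reproduce the cycle numbers
above, one way is a small kernel module along these lines. This is a
minimal sketch, not necessarily the harness used in the thread; it
assumes the MSR_IA32_FLUSH_CMD (0x10b) and L1D_FLUSH (bit 0)
definitions from asm/msr-index.h, which were added with the L1TF
mitigations.]

#include <linux/module.h>
#include <asm/msr.h>

static int __init l1d_flush_bench_init(void)
{
	/* Keep the preempt-off window short. */
	const int iters = 1 << 16;
	u64 start, end;
	int i;

	preempt_disable();
	start = rdtsc_ordered();
	for (i = 0; i < iters; i++)
		/* IA32_FLUSH_CMD (0x10b), bit 0 = L1D_FLUSH */
		wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
	end = rdtsc_ordered();
	preempt_enable();

	pr_info("l1d_flush: ~%llu cycles per wrmsr\n",
		(end - start) / iters);
	return 0;
}

static void __exit l1d_flush_bench_exit(void) { }

module_init(l1d_flush_bench_init);
module_exit(l1d_flush_bench_exit);
MODULE_LICENSE("GPL");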
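
[Editor's note: the per-port uop counts can be approximated from
userspace without a dedicated PMU harness, by triggering the flush
through the msr driver and counting with perf_event_open. A rough
sketch follows; it assumes a Skylake raw encoding of 0x10a1 for
UOPS_DISPATCHED_PORT.PORT_4, the msr module loaded, and root. Syscall
overhead adds some noise on top of the flush itself, so expect numbers
slightly above those quoted. The frontend_retired.l1i_miss vs.
frontend_retired.l2_miss comparison can be done the same way with the
corresponding encodings, or simply with perf stat.]

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/perf_event.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	/* Pin to CPU 0 so the msr driver's wrmsr runs locally. */
	cpu_set_t set;
	CPU_ZERO(&set);
	CPU_SET(0, &set);
	sched_setaffinity(0, sizeof(set), &set);

	struct perf_event_attr attr;
	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_RAW;
	attr.size = sizeof(attr);
	attr.config = 0x10a1;	/* SKL UOPS_DISPATCHED_PORT.PORT_4 */
	attr.disabled = 1;

	int pfd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	int msr = open("/dev/cpu/0/msr", O_WRONLY);
	if (pfd < 0 || msr < 0) {
		perror("setup");
		return 1;
	}

	uint64_t cmd = 1;	/* IA32_FLUSH_CMD.L1D_FLUSH */
	long iters = 1 << 20;

	ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0);
	for (long i = 0; i < iters; i++)
		pwrite(msr, &cmd, sizeof(cmd), 0x10b);
	ioctl(pfd, PERF_EVENT_IOC_DISABLE, 0);

	uint64_t count;
	read(pfd, &count, sizeof(count));
	printf("port 4 uops per flush: ~%llu\n",
	       (unsigned long long)(count / iters));
	return 0;
}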