On Sat, Jun 11, 2016 at 5:49 PM, Huang, Ying <ying.hu...@intel.com> wrote:
>
> From perf profile, the time spent in page_fault and its children
> functions are almost same (7.85% vs 7.81%).  So the time spent in page
> fault and page table operation itself doesn't changed much.  So, you
> mean CPU may be slower to load the page table entry to TLB if accessed
> bit is not set?

So the CPU does take a microfault internally when it needs to set the
accessed/dirty bit. It's not architecturally visible, but you can see
it when you do timing loops.

I've timed it at over a thousand cycles on at least some CPUs, but
that's still peanuts compared to a real page fault. It shouldn't be
*that* noticeable, i.e. no way it's a 6% regression on its own.

            Linus
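
For illustration only, here is a rough userspace sketch of the kind of timing loop mentioned above. It is not the measurement from this thread. It assumes Linux with /proc/self/clear_refs available (CONFIG_PROC_PAGE_MONITOR), where writing "1" clears the accessed/referenced bits of the process's pages. The names ad_timing, read_pages, clear_accessed_bits and measure_ad_cost are made up for this example.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result record for this sketch. */
struct ad_timing {
	uint64_t fault_ns;	/* first write to each page: real page faults */
	uint64_t clear_ns;	/* reads after the accessed bits were cleared */
	uint64_t set_ns;	/* same reads again, accessed bits now set */
	int cleared;		/* 1 if the clear_refs write succeeded */
};

/* Monotonic clock in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Read one byte per page and return the elapsed time for the walk. */
static uint64_t read_pages(volatile unsigned char *buf, size_t npages,
			   size_t psz)
{
	unsigned int sink = 0;
	uint64_t start = now_ns();

	for (size_t i = 0; i < npages; i++)
		sink += buf[i * psz];
	(void)sink;
	return now_ns() - start;
}

/* Ask the kernel to clear the accessed bits; -1 if not supported. */
static int clear_accessed_bits(void)
{
	int fd = open("/proc/self/clear_refs", O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, "1", 1);
	close(fd);
	return n == 1 ? 0 : -1;
}

/* Time page faults, then reads with accessed bits clear, then set. */
int measure_ad_cost(size_t npages, struct ad_timing *out)
{
	long psz_l = sysconf(_SC_PAGESIZE);
	size_t psz, len;
	unsigned char *buf;
	uint64_t start;

	if (psz_l <= 0 || npages == 0)
		return -1;
	psz = (size_t)psz_l;
	len = npages * psz;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;
	/* Best effort: keep 4k pages so each touch hits its own PTE. */
	(void)madvise(buf, len, MADV_NOHUGEPAGE);

	/* The first write to each page takes a real page fault. */
	start = now_ns();
	for (size_t i = 0; i < npages; i++)
		((volatile unsigned char *)buf)[i * psz] = 1;
	out->fault_ns = now_ns() - start;

	out->cleared = (clear_accessed_bits() == 0);
	out->clear_ns = read_pages(buf, npages, psz);
	out->set_ns = read_pages(buf, npages, psz);

	munmap(buf, len);
	return 0;
}
```

A caller would run something like measure_ad_cost(4096, &t) and compare (t.clear_ns - t.set_ns) / npages against t.fault_ns / npages. Treat the numbers as rough: the first read pass probably also pays TLB misses that the second pass does not, so the difference overstates the cost of setting the bit alone, and turning nanoseconds into cycles needs the actual core clock. If the clear_refs write fails (cleared == 0), the two read passes are not a meaningful comparison.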