On Mon, 5 Oct 2015 21:26:39 +0200 Jesper Dangaard Brouer <bro...@redhat.com> wrote:

> My only problem left, is I want a perf measurement that pinpoint these
> kind of spots.  The difference in L1-icache-load-misses were significant
> (1,278,276 vs 2,719,158).  I tried to somehow perf record this with
> different perf events without being able to pinpoint the location (even
> though I know the spot now).  Even tried Andi's ocperf.py... maybe he
> will know what event I should try?

Using 'ocperf.py -e icache_misses', looking closer at the perf annotate
output, and considering "skid", I think I can see the icache misses
happening at the end of the function, due to the UD2 instruction.

Annotation of kmem_cache_free_bulk (end of function):

       │17b:   test   %r12,%r12
       │     ↑ jne    2e
       │184:   pop    %rbx
       │       pop    %r12
       │       pop    %r13
       │       pop    %r14
       │       pop    %r15
       │       pop    %rbp
       │     ← retq
  8.57 │18f:   mov    0x30(%rdx),%rdx
  5.71 │     ↑ jmp    116
       │195:   ud2
  2.86 │197:   mov    %rdi,%rsi
       │       mov    %r11d,%r8d
       │       mov    %r10,%rcx
       │       mov    %rbx,%rdx
       │       mov    %r15,%rdi
       │     → callq  __slab_free
       │     ↑ jmp    17b
  2.86 │1ad:   mov    0x30(%rdi),%rdi
       │     ↑ jmpq   99

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer