I assume that compiler-based instrumentation should be more efficient than binary instrumentation. But I was just interested in the process of implementing that tool. Sorry for the noise.
On 10/07/2018 11:03 AM, Richard Biener wrote:
> On October 6, 2018 10:17:48 PM GMT+02:00, Denis Khalikov
> <d.khali...@partner.samsung.com> wrote:
>> Hello everyone,
>> this is a patch which implements EfficiencySanitizer aka ESan
>> in GCC. The EfficiencySanitizer tool is available in LLVM,
>> so the main idea was to port the runtime library to GCC and
>> implement a GCC compiler pass on GIMPLE IR with the same semantics
>> as the LLVM pass has on LLVM IR.
>> The main difference is that this patch also enables ESan on 32-bit
>> ARM CPUs, with some changes to the runtime library.
>> Link to the RFC on llvm-dev:
>> https://lists.llvm.org/pipermail/llvm-dev/2016-April/098355.html
>>
>> I know this patch is not acceptable for GCC trunk, so I am sending
>> it over the weekend so as not to bother anyone, but maybe someone
>> will be interested. I would also appreciate any feedback.
>> GCC should be built with --enable-checking=release.
>>
>> This patch includes:
>>
>> 1. A GCC pass for the CacheFragmentation tool on the GIMPLE IR.
>> A dedicated compiler pass instruments every memory access to a
>> struct field with the GIMPLE internal call ESAN_RECORD_ACCESS and
>> expands it in the sanopt pass.
>> It creates a field counter array, where each cell counts the memory
>> accesses to one particular field, and an array of structs, where
>> every instance holds the meta information of a real struct.
>>
>> a. Source example:
>>
>> struct node {
>>   int a;
>>   int b;
>>   int c;
>> };
>>
>> int main () {
>>   struct node c;
>>   for (int i = 0; i < 100; ++i) {
>>     c.a = i + 1;
>>     c.b = i + 1;
>>     c.c = i + 1;
>>   }
>>   return 0;
>> }
>>
>> b. Instrumented GIMPLE:
>>
>> <bb 4> :
>> _1 = i_4 + 1;
>> .ESAN_RECORD_ACCESS (0B, 0);
>> c.a = _1;
>> _2 = i_4 + 1;
>> .ESAN_RECORD_ACCESS (0B, 1);
>> c.b = _2;
>> _3 = i_4 + 1;
>> .ESAN_RECORD_ACCESS (0B, 2);
>> c.c = _3;
>> i_11 = i_4 + 1;
>>
>> c. Assembler:
>>
>> # The field counter array.
>> # Every cell is 8 bytes long and holds the number of
>> # accesses to the corresponding field.
>>
>> .weak struct.node$1$1$1
>> .bss
>> .align 8
>> .type struct.node$1$1$1, @object
>> .size struct.node$1$1$1, 24
>> struct.node$1$1$1:
>> .zero 24
>>
>> # Increment the cell selected by the field index.
>> # __esan_increment could actually be inlined.
>> movl $struct.node$1$1$1, %eax
>> movq %rax, %rdi
>> call __esan_increment
>> movl %ebx, -32(%rbp)
>> movl -20(%rbp), %eax
>> leal 1(%rax), %ebx
>> movl $struct.node$1$1$1+8, %eax
>> movq %rax, %rdi
>> call __esan_increment
>> movl %ebx, -28(%rbp)
>> movl -20(%rbp), %eax
>> leal 1(%rax), %ebx
>> movl $struct.node$1$1$1+16, %eax
>> movq %rax, %rdi
>> call __esan_increment
>>
>> # The array of structs with meta information such as the size of
>> # the struct, the number of fields and a pointer to the field
>> # counter array.
>>
>> .Lesan_info0:
>> .quad .LC0
>> .long 12
>> .long 3
>> .quad 0
>> .quad 0
>> .quad 0
>> .quad struct.node$1$1$1
>> .quad 0
>>
>> __esan_init is inserted into the static constructor.
>> __esan_exit is inserted into the static destructor.
>>
>> d. Output:
>>
>> ==28719== struct node
>> ==28719==  size = 12, count = 300, ratio = 2
>> ==28719==   # 0: count = 100
>> ==28719==   # 1: count = 100
>> ==28719==   # 2: count = 100
>> ==28719==EfficiencySanitizer: total struct field access count = 300
>>
>> 2. A GCC pass for the WorkingSet tool.
>> A dedicated compiler pass instruments every memory access in the
>> program.
>> Memory accesses are simply prepended with a function call like
>> __esan_aligned_load(addr) or __esan_aligned_store(addr).
>> Also, __esan_init is inserted into the static constructor and
>> __esan_exit is inserted into the static destructor.
>>
>> a. Assembler:
>>
>> movq -32(%rbp), %rax
>> movq %rax, %rdi
>> call __esan_aligned_store1
>>
>> The runtime library simply manages the shadow memory and computes
>> statistics about the efficiency of the program. The tool maps one
>> cache line (64 bytes) of the program to one byte of shadow memory.
>> The runtime library measures the data working set size of an
>> application at each snapshot during execution.
>>
>> b. Output:
>>
>> ==28742== EfficiencySanitizer: the total working set size: 32 MB
>> (524291 cache lines)
>>
>> HOW TO USE:
>>
>> WorkingSet tool.
>> To measure the working set size, build your binary or shared
>> library with the compile-time flag -fsanitize=efficiency-working-set
>> and set the runtime options
>> ESAN_OPTIONS=process_range_access=1:record_snapshots=1
>>
>> CacheFragmentation tool.
>> To enable the CacheFragmentation tool, compile your binary or
>> shared library with the compile-time flag
>> -fsanitize=efficiency-cache-frag and set the runtime options
>> ESAN_OPTIONS=build_mode=0:verbosity=1
>
> I wonder how this is more efficient or precise than tools like
> valgrind with a suitable CPU model? (with valgrind possibly using
> a JIT)
>
> Richard.
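
For readers curious about the cache-fragmentation runtime hook named in the assembly above, here is a minimal sketch of what __esan_increment presumably does: the instrumented code passes the address of the 8-byte counter cell belonging to the accessed field, and the hook bumps it. This is my reconstruction from the generated code, not the code from the patch; the real runtime may use an atomic or saturating increment.

/* Sketch only: each struct field owns one 8-byte cell in the
   struct.node$1$1$1 counter array shown above.  */
typedef unsigned long long u64;

void
__esan_increment (void *counter)
{
  ++*(u64 *) counter;   /* plain, non-atomic increment */
}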
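
The .Lesan_info0 record quoted above appears to describe one entry of the per-struct meta-info array. Read as C, it would look roughly like the sketch below; the field names are my guesses (loosely modeled on the llvm runtime), so treat the exact layout as an assumption rather than the patch's definition.

/* Hypothetical C view of one .Lesan_info0 entry, reconstructed from
   the assembly above.  */
struct esan_struct_info
{
  const char *struct_name;            /* .quad .LC0: "struct node"   */
  unsigned int size;                  /* .long 12                    */
  unsigned int num_fields;            /* .long 3                     */
  unsigned int *field_offsets;        /* .quad 0 (unused here)       */
  unsigned int *field_sizes;          /* .quad 0 (unused here)       */
  const char **field_type_names;      /* .quad 0 (unused here)       */
  unsigned long long *field_counters; /* .quad struct.node$1$1$1     */
  unsigned long long *array_counter;  /* .quad 0 (unused here)       */
};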
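
Similarly, a rough sketch of the WorkingSet hook __esan_aligned_store1, assuming a direct-mapped shadow in which one shadow byte covers one 64-byte cache line. SHADOW_OFFSET and the flag bit are placeholders of mine; the real runtime uses a platform-specific shadow layout and additional per-snapshot state.

#include <stdint.h>

#define CACHE_LINE_BITS 6                 /* 64-byte cache lines       */
#define SHADOW_OFFSET 0x000010000000ull   /* placeholder shadow base   */

static inline uint8_t *
mem_to_shadow (const void *addr)
{
  /* One shadow byte per 64-byte application cache line.  */
  return (uint8_t *) (((uintptr_t) addr >> CACHE_LINE_BITS) + SHADOW_OFFSET);
}

void
__esan_aligned_store1 (void *addr)
{
  /* Mark the cache line containing ADDR as touched; the tool later
     counts marked shadow bytes to report the working set size.  */
  *mem_to_shadow (addr) |= 0x01;
}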