I assume that compiler-based instrumentation
should be more efficient than binary instrumentation;
I was simply interested in how such a tool
is implemented.
Sorry for the noise.

On 10/07/2018 11:03 AM, Richard Biener wrote:
> On October 6, 2018 10:17:48 PM GMT+02:00, Denis Khalikov 
> <d.khali...@partner.samsung.com> wrote:
>> Hello everyone,
>> this is a patch which implements EfficiencySanitizer (aka ESan)
>> in GCC. The EfficiencySanitizer tool is available in llvm,
>> so the main idea was to port the runtime library to GCC and
>> implement a GCC compiler pass on GIMPLE IR with the same semantics
>> as llvm has on llvm IR.
>> The main difference is that this patch also enables ESan on 32-bit
>> ARM CPUs, with some changes to the runtime library.
>> Link to the RFC on the llvm-dev:
>> https://lists.llvm.org/pipermail/llvm-dev/2016-April/098355.html
>>
>> I know this patch is not acceptable for GCC trunk, so I am sending
>> it over the weekend so as not to bother anyone, but maybe
>> someone will be interested. I would also appreciate
>> any feedback.
>> GCC should be built with --enable-checking=release.
>>
>> This patch includes:
>>
>> 1. GCC pass for the CacheFragmentation tool on the GIMPLE IR.
>> A special compiler pass instruments every memory access to a struct
>> field with the GIMPLE internal call ESAN_RECORD_ACCESS, which is
>> expanded in the sanopt pass.
>> It creates a field counter array, where each cell
>> counts the memory accesses to a particular field, and an array of
>> structs, where every instance of the struct holds the meta
>> information of a real struct.
>>
>> a. Source example:
>>
>> struct node {
>>   int a;
>>   int b;
>>   int c;
>> };
>>
>> int main () {
>>   struct node c;
>>   for (int i = 0; i < 100; ++i) {
>>     c.a = i + 1;
>>     c.b = i + 1;
>>     c.c = i + 1;
>>   }
>>   return 0;
>> }
>>
>> b. Instrumented GIMPLE:
>>   <bb 4> :
>>   _1 = i_4 + 1;
>>   .ESAN_RECORD_ACCESS (0B, 0);
>>   c.a = _1;
>>   _2 = i_4 + 1;
>>   .ESAN_RECORD_ACCESS (0B, 1);
>>   c.b = _2;
>>   _3 = i_4 + 1;
>>   .ESAN_RECORD_ACCESS (0B, 2);
>>   c.c = _3;
>>   i_11 = i_4 + 1;
>>
>> c. Assembler:
>>
>> # The field counter array.
>> # Every cell is 8 bytes long and holds the number
>> # of accesses to one field.
>>
>>     .weak    struct.node$1$1$1
>>     .bss
>>     .align 8
>>     .type    struct.node$1$1$1, @object
>>     .size    struct.node$1$1$1, 24
>> struct.node$1$1$1:
>>     .zero    24
>>
>> # Increment the cell for the accessed field.
>> # The __esan_increment call could actually be inlined.
>>    movl    $struct.node$1$1$1, %eax
>>    movq    %rax, %rdi
>>    call    __esan_increment
>>    movl    %ebx, -32(%rbp)
>>    movl    -20(%rbp), %eax
>>    leal    1(%rax), %ebx
>>    movl    $struct.node$1$1$1+8, %eax
>>    movq    %rax, %rdi
>>    call    __esan_increment
>>    movl    %ebx, -28(%rbp)
>>    movl    -20(%rbp), %eax
>>    leal    1(%rax), %ebx
>>    movl    $struct.node$1$1$1+16, %eax
>>    movq    %rax, %rdi
>>    call    __esan_increment
>>
>> # The array of structs with
>> # meta info: the size of the instrumented struct,
>> # the number of fields and a pointer to the
>> # field counter array.
>>
>> .Lesan_info0:
>>     .quad    .LC0
>>     .long    12
>>     .long    3
>>     .quad    0
>>     .quad    0
>>     .quad    0
>>     .quad    struct.node$1$1$1
>>     .quad    0
>>
>> __esan_init is inserted into a static constructor.
>> __esan_exit is inserted into a static destructor.
>>
>> d. Output:
>>
>> ==28719==  struct node
>> ==28719==   size = 12, count = 300, ratio = 2
>> ==28719==   # 0: count = 100
>> ==28719==   # 1: count = 100
>> ==28719==   # 2: count = 100
>> ==28719==EfficiencySanitizer: total struct field access count = 300
>>
>> 2. GCC pass for the WorkingSet tool.
>> A special compiler pass instruments every memory access in the program.
>> Memory accesses are simply prepended with a function call such as
>> __esan_aligned_load(addr) or __esan_aligned_store(addr).
>> Also, __esan_init is inserted into a static constructor and
>> __esan_exit into a static destructor.
>>
>> a. Assembler:
>>
>>   movq    -32(%rbp), %rax
>>   movq    %rax, %rdi
>>   call    __esan_aligned_store1
>>
>> The runtime library simply manages the shadow memory and computes
>> statistics about the program's efficiency. The tool maps one cache
>> line (64 bytes) of the program to one byte of shadow memory.
>> The runtime library measures the data working set size of an
>> application at each snapshot during execution.
>>
>> b. Output:
>>
>> ==28742== EfficiencySanitizer: the total working set size: 32 MB
>> (524291 cache lines)
>>
>> HOW TO USE:
>>
>> WorkingSet tool.
>> To measure the working set size, build your binary or
>> shared library with the compile-time flag
>> -fsanitize=efficiency-working-set and set the runtime options
>> ESAN_OPTIONS=process_range_access=1:record_snapshots=1
>>
>> CacheFragmentation tool.
>> To enable the CacheFragmentation tool, compile your binary or
>> shared library with the compile-time flag -fsanitize=efficiency-cache-frag
>> and set the runtime options ESAN_OPTIONS=build_mode=0:verbosity=1
>
> I wonder how this is more efficient or precise than tools like valgrind with 
> a suitable CPU model? (with valgrind possibly using a JIT)
>
> Richard.
>
>
>
