On Tue, Nov 20, 2018 at 10:16 AM Stephane Eranian <eran...@google.com> wrote:
> I would like to understand better the PMU behavior you are relying upon and
> why the V4 freeze approach is breaking it. Could you elaborate?

I investigated a bit more to write this response and discovered that my initial characterization of the problem as an overcount during replay is incorrect; what we are actually seeing is an undercount during recording.

rr relies on the userspace retired-conditional-branches (rcb) counter being exactly the same between recording and replay. The primary reason we do this is to establish a program "timeline", allowing us to find the correct place to inject asynchronous signals during replay (the program counter plus the rcb counter value uniquely identifies a point in most programs). Because we run the rcb counter during recording anyway, we piggyback on it by programming it to interrupt the program every few hundred thousand branches, which gives us a chance to context switch to a different program thread.

We've found that with counter freezing enabled, when the PMI fires, the reported value of the rcb counter is low by something on the order of 10 branches. In a single-threaded program, although the PMI fires, we don't actually record a context switch or the counter value at this point. We continue on to the next tracee event (e.g. a syscall) and record the counter value at that point. Then, during replay, we replay to the syscall, check that the replay counter value matches the recorded value, and find that it is too high. (NB: during a single-threaded replay the PMI is not used here because there is no asynchronous event.)

Repeatedly recording the same program produces traces that have different recorded rcb counter values after the first PMI fired during recording, but during replay we always count off the same number of branches, further suggesting that the replay value is correct. And finally, recordings made on a kernel with counter freezing active still fail to replay on a kernel without counter freezing active.

I don't know what the underlying mechanism for the loss of counter events is (e.g. whether it's incorrect code in the interrupt handler, a silicon bug, or what), but it's clear that the counter freezing implementation is causing events to be lost.

- Kyle
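
P.S. To make the failure mode concrete, here is a toy model of the mismatch described above. It is only a sketch and not rr's code: the names (TimelinePoint, count_branches), the 500,000-branch interrupt period, the loss of exactly 10 events per PMI, and the syscall address are all made-up assumptions chosen to mirror "every few hundred thousand branches" and "on the order of 10 branches".

```python
# Toy model only; not rr's implementation. All names and numbers are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class TimelinePoint:
    pc: int   # program counter at the tracee event
    rcb: int  # retired conditional branches reported by the counter


def count_branches(true_branches, pmi_every=None, lost_per_pmi=0):
    """Return the counter value a tracer would read at an event.

    true_branches: branches the program really retired up to the event.
    pmi_every: interrupt period in branches, or None if no PMI is programmed.
    lost_per_pmi: events assumed to be dropped each time the PMI fires.
    """
    if pmi_every is None:
        return true_branches
    pmis_fired = true_branches // pmi_every
    return true_branches - pmis_fired * lost_per_pmi


SYSCALL_PC = 0x401000     # hypothetical address of the next syscall
TRUE_BRANCHES = 750_000   # branches actually retired before that syscall

# Recording: the PMI is armed, and each PMI is assumed to lose 10 events.
recorded = TimelinePoint(
    SYSCALL_PC, count_branches(TRUE_BRANCHES, pmi_every=500_000, lost_per_pmi=10)
)

# Single-threaded replay: no PMI is used, so nothing is lost.
replayed = TimelinePoint(SYSCALL_PC, count_branches(TRUE_BRANCHES))

# Same pc, different rcb: the replay value looks "too high" by the lost events.
print(recorded.rcb, replayed.rcb, replayed.rcb - recorded.rcb)
```

Under these assumptions the replayed point has the same pc but a larger rcb value than the recorded one, which is the symptom we see when the replay check at the syscall fails.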