Issue 123855
Summary [libc++] Question about __libcpp_timed_backoff_policy implementation
Labels libc++
Assignees
Reporter bobsayshilol
I'm not sure if this is the right place to ask, but I tried using `std::barrier` and observed unexpectedly large sleep calls when calling `wait()`. The issue seems to come from the interaction of the two largest thresholds checked in `__libcpp_timed_backoff_policy`:

https://github.com/llvm/llvm-project/blob/635e154bbc94342080ccba583ff6fb16ea364f4b/libcxx/include/__thread/timed_backoff_policy.h#L28-L31

Once `__elapsed` exceeds 64 *micro*seconds, the policy backs off at increasing intervals (each sleep is `__elapsed / 2`) until `__elapsed` reaches 128 *milli*seconds, at which point the sleep is capped at only 8ms. This means it can sleep for up to 64ms before dropping down to 8ms.

As an example repro case:

```c++
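// Assumes libc++: <barrier> is expected to pull in the internal __libcpp_*
// helpers used below, which are not part of the public API.
#include <algorithm>
#include <barrier>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>

using namespace std;

int main() {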
  // In practice this doesn't get close to 1M entries, but play it safe.
  std::vector<chrono::nanoseconds> timings(1'000'000);
  std::size_t counter = 0;
  chrono::nanoseconds last_elapsed = chrono::nanoseconds::zero();

  auto capture_backoff_timings = [&](chrono::nanoseconds elapsed) {
    // Save elapsed time to avoid an additional now() call in __poll().
    last_elapsed = elapsed;
    // Add the elapsed time of this call to the timings.
    timings[counter] = elapsed;
    ++counter;
    return __libcpp_timed_backoff_policy{}(elapsed);
  };

  auto poll_for_250ms = [&] {
    // As cheap as we can make it - this is just an atomic load in std::barrier::wait().
    return last_elapsed > chrono::milliseconds(250);
  };

  __libcpp_thread_poll_with_backoff(poll_for_250ms, capture_backoff_timings);

  // Sanity check.
  assert(counter < timings.size());
  timings.resize(counter);

  // Convert timings to how long the sleeps are.
  std::vector<chrono::nanoseconds> sleeps(counter);
  std::adjacent_difference(timings.begin(), timings.end(), sleeps.begin());

  // Print largest sleep call.
  auto max_sleep = std::max_element(sleeps.begin(), sleeps.end());
  auto &os = std::cout;
  os << "max sleep: "
     << chrono::duration_cast<chrono::milliseconds>(*max_sleep).count()
     << "ms\n";

#if 0 // Print timings too for graphing.
  os << "timings = [";
  for (auto &t : timings) {
    os << t.count() << ',';
  }
  os << "]\nsleeps = [";
  for (auto &s : sleeps) {
    os << s.count() << ',';
  }
  os << "]\n";
#endif
}
```

Running this locally prints `max sleep: 59ms` (it's consistently in the 40-60ms range) and plotting out the timings (with 0 markers) gives this:

![Image](https://github.com/user-attachments/assets/b140a05b-c36c-49c2-af2b-600fb012abe7)

I don't know whether `chrono::milliseconds(128)` is meant to be in `microseconds`, whether it should be `chrono::milliseconds(8)` to match the final backoff value, or whether this large wait is intentional. I couldn't see anything about it on the review that introduced this code (https://reviews.llvm.org/D68480), though the "Show Older Changes" button doesn't seem to work, so it might be hidden in there.