-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/75181/#review226862
-----------------------------------------------------------
Ship it!
simplifications:
```
diff --git a/src/linux/cgroups2.cpp b/src/linux/cgroups2.cpp
index 7ff7f7358..d1c415db8 100644
--- a/src/linux/cgroups2.cpp
+++ b/src/linux/cgroups2.cpp
@@ -399,70 +399,42 @@ Future<Nothing> destroy(const string& cgroup)
return Failure("Failed to kill processes in cgroup: " + kill.error());
}
- // Wait until all of the processes have been killed.
- // Try removing the cgroup directory until time runs out
+ // In order to reliably destroy a cgroup, one has to retry on EBUSY
+ // *even if* all the processes are no longer found in cgroup.procs.
+ // We retry for up to ~5 seconds, based on how crun destroys its
+ // cgroups:
+ //
+ // https://github.com/containers/crun/blob/10b3038c1398b7db20b1826f
+ // 94e9d4cb444e9568/src/libcrun/cgroup-utils.c#L471
int retries = 5000;
- Try<set<string>> cgroups = cgroups2::get(cgroup);
- if (cgroups.isError()) {
- return Failure("Failed to get nested cgroups: " + cgroups.error());
- }
-
- cgroups->insert(cgroup);
Future<Nothing> removal = loop(
[]() { return process::after(Milliseconds(1)); },
[=](const Nothing&) mutable -> Future<ControlFlow<Nothing>> {
+ Try<set<string>> cgroups = cgroups2::get(cgroup);
+ if (cgroups.isError()) {
+ return Failure("Failed to get nested cgroups: " + cgroups.error());
+ }
+ cgroups->insert(cgroup);
+
// Remove the cgroups in bottom-up order.
- bool could_not_remove = false;
- set<string> removed_cgroups;
foreach (const string& cgroup, adaptor::reverse(*cgroups)) {
const string path = cgroups2::path(cgroup);
// Remove the cgroup's directory. If the directory does not exist,
// ignore the error to protect against races.
- // In addition even if a cgroup's pids are empty the rmdir call on it
- // may still return EBUSY, causing us to fail.
- // We want to retry the rmdir operation even on EBUSY for up to
- // 5 seconds to ensure that we are able to delete the cgroup.
- //
- // This approach is similar to how crun is destroying its cgroups.
- // see:
https://github.com/containers/crun/blob/10b3038c1398b7db20b1826f94e9d4cb444e9568/src/libcrun/cgroup-utils.c#L471
if (::rmdir(path.c_str()) < 0) {
ErrnoError error = ErrnoError();
-
if (error.code == EBUSY) {
- LOG(WARNING) << "Failed to remove cgroup '" << cgroup << "' "
- << "due to EBUSY, retrying if possible";
-
- Try<set<pid_t>> pids = cgroups2::processes(cgroup, true);
- if (pids.isError()) {
- return Failure("Failed to fetch pids in cgroup: " +
pids.error());
- }
-
- if (pids->empty()) {
- LOG(WARNING) << "Failed to remove cgroup '" << cgroup << "' "
- << "despite containing no pids";
+ --retries;
+ if (retries == 0) {
+ return Failure("Failed to remove cgroup after 5000 attempts");
}
-
- could_not_remove = true;
- break;
+ return Continue();
} else if (error.code != ENOENT) {
return Failure(
"Failed to remove directory '" + path + "': " + error.message);
}
}
- removed_cgroups.insert(cgroup);
- }
-
- foreach(const string& cgroup, removed_cgroups) {
- cgroups->erase(cgroup);
- }
-
- if (could_not_remove) {
- --retries;
- if (retries == 0) {
- return Failure("Failed to remove cgroup");
- }
- return Continue();
}
return Break();
```
- Benjamin Mahler
On Aug. 16, 2024, 10:06 p.m., Jason Zhou wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/75181/
> -----------------------------------------------------------
>
> (Updated Aug. 16, 2024, 10:06 p.m.)
>
>
> Review request for mesos and Benjamin Mahler.
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Currently we only wait until a cgroup's pids (retrieved from
> cgroup.procs) is empty. However even if a cgroup's pids are empty the
> rmdir call on it may still return EBUSY, causing us to fail the destroy
> operation. We want to retry the rmdir operation even on EBUSY for up to
> 5 seconds to ensure that we are able to delete the cgroup.
>
> This approach is similar to how crun is destroying its cgroups.
> see:
> https://github.com/containers/crun/blob/10b3038c1398b7db20b1826f94e9d4cb444e9568/src/libcrun/cgroup-utils.c#L471
>
>
> Diffs
> -----
>
> src/linux/cgroups2.cpp 9dd100aa6d224d67166cc80473423ea5fa815c14
>
>
> Diff: https://reviews.apache.org/r/75181/diff/1/
>
>
> Testing
> -------
>
> The test RemoveNonCheckpointingFramework was previously failing due to not
> being able to remove its cgroups. This test now passes
>
>
> Thanks,
>
> Jason Zhou
>
>