-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/75181/#review226862
-----------------------------------------------------------


Ship it!




simplifications:

```
diff --git a/src/linux/cgroups2.cpp b/src/linux/cgroups2.cpp
index 7ff7f7358..d1c415db8 100644
--- a/src/linux/cgroups2.cpp
+++ b/src/linux/cgroups2.cpp
@@ -399,70 +399,42 @@ Future<Nothing> destroy(const string& cgroup)
     return Failure("Failed to kill processes in cgroup: " + kill.error());
   }
 
-  // Wait until all of the processes have been killed.
-  // Try removing the cgroup directory until time runs out
+  // In order to reliably destroy a cgroup, one has to retry on EBUSY
+  // *even if* all the processes are no longer found in cgroup.procs.
+  // We retry for up to ~5 seconds, based on how crun destroys its
+  // cgroups:
+  //
+  // https://github.com/containers/crun/blob/10b3038c1398b7db20b1826f
+  // 94e9d4cb444e9568/src/libcrun/cgroup-utils.c#L471
   int retries = 5000;
-  Try<set<string>> cgroups = cgroups2::get(cgroup);
-  if (cgroups.isError()) {
-    return Failure("Failed to get nested cgroups: " + cgroups.error());
-  }
-
-  cgroups->insert(cgroup);
   Future<Nothing> removal = loop(
     []() { return process::after(Milliseconds(1)); },
     [=](const Nothing&) mutable -> Future<ControlFlow<Nothing>> {
+      Try<set<string>> cgroups = cgroups2::get(cgroup);
+      if (cgroups.isError()) {
+        return Failure("Failed to get nested cgroups: " + cgroups.error());
+      }
+      cgroups->insert(cgroup);
+
       // Remove the cgroups in bottom-up order.
-      bool could_not_remove = false;
-      set<string> removed_cgroups;
       foreach (const string& cgroup, adaptor::reverse(*cgroups)) {
         const string path = cgroups2::path(cgroup);
 
         // Remove the cgroup's directory. If the directory does not exist,
         // ignore the error to protect against races.
-        // In addition even if a cgroup's pids are empty the rmdir call on it
-        // may still return EBUSY, causing us to fail.
-        //  We want to retry the rmdir operation even on EBUSY for up to
-        // 5 seconds to ensure that we are able to delete the cgroup.
-        //
-        // This approach is similar to how crun is destroying its cgroups.
-        // see: 
https://github.com/containers/crun/blob/10b3038c1398b7db20b1826f94e9d4cb444e9568/src/libcrun/cgroup-utils.c#L471
         if (::rmdir(path.c_str()) < 0) {
           ErrnoError error = ErrnoError();
-
           if (error.code == EBUSY) {
-            LOG(WARNING) << "Failed to remove cgroup '" << cgroup << "' "
-                         << "due to EBUSY, retrying if possible";
-
-            Try<set<pid_t>> pids = cgroups2::processes(cgroup, true);
-            if (pids.isError()) {
-              return Failure("Failed to fetch pids in cgroup: " + 
pids.error());
-            }
-
-            if (pids->empty()) {
-              LOG(WARNING) << "Failed to remove cgroup '" << cgroup << "' "
-                           << "despite containing no pids";
+            --retries;
+            if (retries == 0) {
+              return Failure("Failed to remove cgroup after 5000 attempts");
             }
-
-            could_not_remove = true;
-            break;
+            return Continue();
           } else if (error.code != ENOENT) {
             return Failure(
                 "Failed to remove directory '" + path + "': " + error.message);
           }
         }
-        removed_cgroups.insert(cgroup);
-      }
-
-      foreach(const string& cgroup, removed_cgroups) {
-        cgroups->erase(cgroup);
-      }
-
-      if (could_not_remove) {
-        --retries;
-        if (retries == 0) {
-          return Failure("Failed to remove cgroup");
-        }
-        return Continue();
       }
 
       return Break();

```

- Benjamin Mahler


On Aug. 16, 2024, 10:06 p.m., Jason Zhou wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/75181/
> -----------------------------------------------------------
> 
> (Updated Aug. 16, 2024, 10:06 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently we only wait until a cgroup's pids (retrieved from
> cgroup.procs) is empty. However even if a cgroup's pids are empty the
> rmdir call on it may still return EBUSY, causing us to fail the destroy
> operation. We want to retry the rmdir operation even on EBUSY for up to
> 5 seconds to ensure that we are able to delete the cgroup.
> 
> This approach is similar to how crun is destroying its cgroups.
> see: 
> https://github.com/containers/crun/blob/10b3038c1398b7db20b1826f94e9d4cb444e9568/src/libcrun/cgroup-utils.c#L471
> 
> 
> Diffs
> -----
> 
>   src/linux/cgroups2.cpp 9dd100aa6d224d67166cc80473423ea5fa815c14 
> 
> 
> Diff: https://reviews.apache.org/r/75181/diff/1/
> 
> 
> Testing
> -------
> 
> The test RemoveNonCheckpointingFramework was previously failing due to not 
> being able to remove its cgroups. This test now passes
> 
> 
> Thanks,
> 
> Jason Zhou
> 
>

Reply via email to