> On Aug. 7, 2024, 12:44 a.m., Benjamin Mahler wrote: > > src/slave/containerizer/device_manager/device_manager.cpp > > Lines 241-244 (patched) > > <https://reviews.apache.org/r/75145/diff/2/?file=2292495#file2292495line241> > > > > isn't this already covered by the Result being none?
No. if the file does not exist, protobuf::read will return an error when it tried to get a file descriptor. We don't really want to return Error if the file does not exist, so we handle this case separately. > On Aug. 7, 2024, 12:44 a.m., Benjamin Mahler wrote: > > src/slave/containerizer/device_manager/device_manager.cpp > > Lines 321-324 (patched) > > <https://reviews.apache.org/r/75145/diff/2/?file=2292495#file2292495line321> > > > > can you not leverage the commit function that was already added? > > > > if you don't want to re-checkpoint, then maybe we should have the > > caller decide when to: > > > > checkpoint() > > reconfigureEbfPrograms() > > > > and here we only do the latter? Sure, we can add arguments on the commit_device_access_changes function for the caller to determine if they want to checkpoint, reconfigure ebpf programs, or both? > On Aug. 7, 2024, 12:44 a.m., Benjamin Mahler wrote: > > src/slave/containerizer/device_manager/device_manager.cpp > > Lines 344-349 (patched) > > <https://reviews.apache.org/r/75145/diff/2/?file=2292495#file2292495line344> > > > > hm.. this case is really a complete failure? I suppose we can just log warnings here and return Nothing() afterwards... similar to how we return Nothing if the state file was not found? > On Aug. 7, 2024, 12:44 a.m., Benjamin Mahler wrote: > > src/slave/containerizer/device_manager/device_manager.cpp > > Line 232 (original), 360 (patched) > > <https://reviews.apache.org/r/75145/diff/2/?file=2292495#file2292495line360> > > > > put this in the earlier patch? good callout, somehow this slipped through. moved to prior patch - Jason ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/75145/#review226779 ----------------------------------------------------------- On Aug. 6, 2024, 8:11 p.m., Jason Zhou wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/75145/ > ----------------------------------------------------------- > > (Updated Aug. 6, 2024, 8:11 p.m.) > > > Review request for mesos and Benjamin Mahler. > > > Repository: mesos > > > Description > ------- > > We currently do not have any method of recovering the device access > states when the cgroups 2 isolator is atempting to recover containers. > > We add a recovery state here that makes use of the protobuf checkpoint > files to ensure that the previous device accesses of cgroups can be > restored. It will be used by the cgroups 2 isolator. > > > Diffs > ----- > > src/slave/containerizer/device_manager/device_manager.hpp > 853350f70e12b992ef311a35c509a5dce8f2301a > src/slave/containerizer/device_manager/device_manager.cpp > e613323dc47a7980984426d37b6fc5cfc52dffe0 > src/tests/device_manager_tests.cpp c4e9b8c58282b8d57b5ce88fefddd34c8ea30c77 > > > Diff: https://reviews.apache.org/r/75145/diff/2/ > > > Testing > ------- > > Test added to check the functionality of recover() with container state. > Tests pass, and the state was recovered. > > > Thanks, > > Jason Zhou > >
