-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/75074/#review226759
-----------------------------------------------------------




src/slave/containerizer/mesos/isolators/gpu/isolator.hpp
Lines 87 (patched)
<https://reviews.apache.org/r/75074/#comment315015>

    use a ref here
    
    btw this is no longer an Owned since we're definitely holding it in 
multiple places, but it's a shared pointer so this works



src/slave/containerizer/mesos/isolators/gpu/isolator.hpp
Lines 119 (patched)
<https://reviews.apache.org/r/75074/#comment315016>

    ditto here, use a ref



src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Lines 95 (patched)
<https://reviews.apache.org/r/75074/#comment315017>

    use a ref here rather than taking a copy
    
    s/gpuToNonWildcard/asDeviceEntry/



src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Lines 107 (patched)
<https://reviews.apache.org/r/75074/#comment315018>

    nit: two newlines between top level definitions



src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Lines 177 (patched)
<https://reviews.apache.org/r/75074/#comment315019>

    maybe return Error on error here same as if cgroups::hierarchy fails?



src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Line 159 (original), 179 (patched)
<https://reviews.apache.org/r/75074/#comment315020>

    what does hierarchy == none here mean?



src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Line 244 (original), 266 (patched)
<https://reviews.apache.org/r/75074/#comment315021>

    do you need a TODO here?



src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Line 252 (original), 274 (patched)
<https://reviews.apache.org/r/75074/#comment315022>

    nit: two newlines between top level definitions



src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Lines 276 (patched)
<https://reviews.apache.org/r/75074/#comment315023>

    nit: open parens wrap at 4 spaces



src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Lines 284-285 (patched)
<https://reviews.apache.org/r/75074/#comment315024>

    assigning to itself..?
    
    why not use your entry creation helper here? should we make an overload of 
is_access_granted for non-whitelist entry?



src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Lines 294-298 (patched)
<https://reviews.apache.org/r/75074/#comment315025>

    nit: use the same indentation as the cgroups v1 logic below here



src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Lines 285-311 (original), 345-370 (patched)
<https://reviews.apache.org/r/75074/#comment315026>

    this logic looks the same, can you not re-use it?
    
    the difference is how to check if access is granted?
    
    v1: if gpu in devices.list
    v2: if gpu in DeviceManager::CgroupDeviceAccess
    
    so couldn't you have something like:
    
    ```
    // if v2:
    return deviceManager->state(cgroup)
      .then(defer(self(), [=](const DeviceManager::CgroupDeviceAccess&) {
        // compute set<Gpu> containerGpus
        return containerGpus;
      })
      .then(deviceManager->state(cgroup).then(defer(
            PID<NvidiaGpuIsolatorProcess>(this),
            &NvidiaGpuIsolatorProcess::_recover,
            containerId,
            available,
            lambda::_1));
      
    // if v1:
    // compute set<Gpu> containerGpus
    return _recover(containerId, available, containerGpus));
    ```



src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Lines 411 (patched)
<https://reviews.apache.org/r/75074/#comment315027>

    nit: indentation



src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Lines 412-440 (patched)
<https://reviews.apache.org/r/75074/#comment315028>

    can't all this code just be a simple loop to convert to the needed format 
via a helper?
    
    ```
    vector<DeviceManager::NonWildcardEntry> entries;
    foreachvalue (const Entry& entry, controlDeviceEntries) {
      entries.push_back(toNonWildcardEntry(entry));
    }
    ```



src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Line 351 (original), 454 (patched)
<https://reviews.apache.org/r/75074/#comment315029>

    // Cgroups v1:



src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Lines 629-635 (patched)
<https://reviews.apache.org/r/75074/#comment315031>

    let's keep the same naming to avoid confusion, so:
    
    deallocated
    deallocated_entries (rather than gpus_for_deallocation)



src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Line 525 (original), 644 (patched)
<https://reviews.apache.org/r/75074/#comment315030>

    // Cgroups v1:



src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Lines 683 (patched)
<https://reviews.apache.org/r/75074/#comment315035>

    nit: add newline



src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Lines 684 (patched)
<https://reviews.apache.org/r/75074/#comment315034>

    ditto naming, maybe just allocation_entries



src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Lines 685 (patched)
<https://reviews.apache.org/r/75074/#comment315033>

    nit: remove extra newline



src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Line 565 (original), 698 (patched)
<https://reviews.apache.org/r/75074/#comment315032>

    // Cgroups v1:


- Benjamin Mahler


On Aug. 1, 2024, 7:42 p.m., Jason Zhou wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/75074/
> -----------------------------------------------------------
> 
> (Updated Aug. 1, 2024, 7:42 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently, the GPU isolator assumes we are only using cgroups v1, and
> makes use of the cgroups::devices::allow and deny functions to control
> GPU access.
> 
> In Cgroups2, we need to attach ebpf programs for the specific cgroups,
> which is done for us in the DeviceManager. Hence, we need to use the
> DeviceManager in the GPU isolator depending on whether cgroups v1 or v2
> is currently mounted
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/containerizer.cpp 
> 9bafb137ba67bfed5cfb24c1a3af9203fde9e821 
>   src/slave/containerizer/mesos/isolators/gpu/isolator.hpp 
> e4f221d5f63ed1db044acabbbd427a30a0f69ced 
>   src/slave/containerizer/mesos/isolators/gpu/isolator.cpp 
> 99119f938e2eb5ca6a8b64d073c87ca5032a00b8 
> 
> 
> Diff: https://reviews.apache.org/r/75074/diff/3/
> 
> 
> Testing
> -------
> 
> Existing GPU isolator tests pass, with the exception of the 
> DefaultExecutorVerifyDeviceAccess test which is because we currently don't 
> support nested containers.
> 
> 
> Thanks,
> 
> Jason Zhou
> 
>

Reply via email to