> On Aug. 6, 2020, 9:27 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.hpp
> > Lines 85 (patched)
> > <https://reviews.apache.org/r/72733/diff/1/?file=2236984#file2236984line85>
> >
> >     Does the value of this map really need to be an `Owned`, rather than 
> > just a raw `Info`?
> 
> Qian Zhang wrote:
>     I think that's a convention in our isolator implementation, most of the 
> isolators use `Owned` rather than a raw `Info`, like:
>     
> https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/cgroups/cgroups.hpp#L169
>     
> https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/docker/volume/isolator.hpp#L117
>     
> https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/filesystem/linux.hpp#L99
>     
> https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/network/cni/cni.hpp#L223
> 
> Greg Mann wrote:
>     Any idea why? It seems unnecessary to me?

I am not quite sure, maybe it is a best practice to avoid using raw pointer in 
our code? I see we have some patches which replaced raw pointers with `Owned` 
in containerizer:
https://reviews.apache.org/r/51388/ 
https://reviews.apache.org/r/32233


> On Aug. 6, 2020, 9:27 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp
> > Lines 146-151 (patched)
> > <https://reviews.apache.org/r/72733/diff/1/?file=2236985#file2236985line146>
> >
> >     Should we check `_volume.mode()` against the access mode contained 
> > within the `VolumeCapability`? Maybe we should have some separate 
> > validation code for that?
> 
> Qian Zhang wrote:
>     Yeah, I thought this before, but I am not quite sure if we should do 
> that. Actually in CSI spec, there is no any explanation about the interaction 
> between the `NodePublishVolumeRequest.readonly` and the access mode in the 
> `VolumeCapability`, like what if `NodePublishVolumeRequest.readonly == false` 
> but the access mode in the `VolumeCapability` is `SINGLE_NODE_READER_ONLY`? 
> Should we return a failure in this case?
> 
> Greg Mann wrote:
>     I'm not so worried about the `readonly` field, I don't think that we need 
> to validate it against the volume capabilities because mounting a volume as 
> read-only isn't "incompatible" with any of the capabilities (for example, I 
> don't think there's anything wrong with mounting a volume readonly if the 
> volume has only the SINGLE_NODE_WRITER capability, since mounting as readonly 
> does not violate RW permissions).
>     
>     However, I do think we should perform validation for our internal 
> `Volume::Mode` field. If a task attempts to include a CSI volume which has 
> only the SINGLE_NODE_READER_ONLY capability and the volume mode is set as RW, 
> that does not seem valid to me. WDYT?

> I don't think there's anything wrong with mounting a volume readonly if the 
> volume has only the SINGLE_NODE_WRITER capability, since mounting as readonly 
> does not violate RW permissions

But what about the reverse? I mean `NodePublishVolumeRequest.readonly == false` 
and the volume has only the `SINGLE_NODE_READER_ONLY` capability. Do you think 
it is wrong?

> However, I do think we should perform validation for our internal 
> Volume::Mode field. If a task attempts to include a CSI volume which has only 
> the SINGLE_NODE_READER_ONLY capability and the volume mode is set as RW, that 
> does not seem valid to me. WDYT?

Yes, I agree. I think I should add more validation here like: if the CSI volume 
has only the `SINGLE_NODE_READER_ONLY` or `MULTI_NODE_READER_ONLY` capability 
and our internal `Volume::Mode` field it set as `RW`, then we should return a 
failure. WDYT?


> On Aug. 6, 2020, 9:27 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp
> > Lines 179-184 (patched)
> > <https://reviews.apache.org/r/72733/diff/1/?file=2236985#file2236985line179>
> >
> >     Can you help me understand why we do not require an absolute container 
> > path to already exist in this case, while we require it in other cases 
> > (both here and in the Linux filesystem isolator)?
> 
> Qian Zhang wrote:
>     The reason that we do not require an absolute container path to already 
> exist is that the container can be launched with any images, we cannot 
> require all images have the specified absolute container path to already 
> exist, and since the container is launched with an image (i.e. it has its own 
> rootfs), we can just create the absolute container path in its own rootfs 
> which will not affect the agent host rootfs. However if the container is 
> launched without an image (i.e. it has no its own rootfs and just uses the 
> agent host rootfs), we require the absolute container path to already exist 
> in the agent host rootfs and do not want to create the path via `os::mkdir()` 
> because we do not want to change the agent host rootfs.
>     
>     Could you please point me the code block of the Linux filesystem that you 
> mentioned above?
> 
> Greg Mann wrote:
>     > The reason that we do not require an absolute container path to already 
> exist is that the container can be launched with any images, we cannot 
> require all images have the specified absolute container path to already 
> exist, and since the container is launched with an image (i.e. it has its own 
> rootfs)
>     
>     Given the way our comments are written, I think it would be reasonable if 
> a task is launched with an image which specifies a volume with an absolute 
> container path, then they must specify an image in which that absolute path 
> already exists. I think it probably makes sense to have the same policy for 
> CSI volumes that we have for other types of volumes?
>     
>     In the Linux filesystem isolator, we skip volumes with absolute container 
> paths here: 
> https://github.com/apache/mesos/blob/c78dc333fc893a43d40dc33299a61987198a6ea9/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp#L931-L939
>     
>     And I was unable to find another place in that isolator where we create 
> directories in rootfs's for volumes with absolute container paths.
>     
>     However, I see that in other volume isolators we do create the path 
> within the rootfs. Since that is the case, I might suggest that we leave your 
> code as-is and update our comments for the `container_path` field so that 
> they are correct?

Yeah, I think our comments for the `container_path` field is outdated which I 
guess was writted before we support launching container with an image.
```
message Volume {
  ..

  // Path pointing to a directory or file in the container. If the
  // path is a relative path, it is relative to the container work
  // directory. If the path is an absolute path, that path must
  // already exist.
  required string container_path = 1;
```

If the container is launched with an image, then we should not require an 
absolute container path to already exist. That's also what Docker does, e.g. 
when you run `docker run --rm -it -v /opt:/qzhang/data alpine sh`, Docker will 
automaically create the `/qzhang/data` inside the container which I think is 
the correct behavior.

I will update that comments for the `container_path` field, thanks for catching 
this!


> On Aug. 6, 2020, 9:27 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp
> > Lines 287-300 (patched)
> > <https://reviews.apache.org/r/72733/diff/1/?file=2236985#file2236985line287>
> >
> >     Can we eliminate this if you use `collect()` above instead of `await()`?
> 
> Qian Zhang wrote:
>     I think we still need to use `await()` so if any CSI publish volume call 
> fails we can get the actual error message.
> 
> Greg Mann wrote:
>     `collect()` will surface the message of the first failure to occur: 
> https://github.com/apache/mesos/blob/c78dc333fc893a43d40dc33299a61987198a6ea9/3rdparty/libprocess/include/process/collect.hpp#L172-L174
>     
>     If that's sufficient we can use `collect()`, but if you want to aggregate 
> the messages from multiple failures then you're right, `await()` is required.

Yeah, I think we need to aggregate the messages from multiple failures, that is 
also how we work with CNI plugins.


> On Aug. 6, 2020, 9:27 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp
> > Lines 316 (patched)
> > <https://reviews.apache.org/r/72733/diff/1/?file=2236985#file2236985line316>
> >
> >     Why do we do a recursive bind mount here, but not for volumes in the 
> > Linux filesystem isolator? 
> > https://github.com/apache/mesos/blob/c78dc333fc893a43d40dc33299a61987198a6ea9/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp#L1067-L1068
> 
> Qian Zhang wrote:
>     The reason that we do a recursive bind mount here is, if there is any 
> submounts under the `source`, we want all those submounts also bind mounted 
> at the `target` so that the container can access them. We have this logic in 
> all the volume isolators, like:
>     
> https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/volume/host_path.cpp#L351
>     
> https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/volume/image.cpp#L255
>     
>     The code that you pointed out in the Linux filesystem isolator is to 
> mount the PV, since PV is originally created by Mesos and we are sure there 
> is no submounts under it, so we do not need `MS_REC` in this case.
> 
> Greg Mann wrote:
>     Why do we know that a persistent volume will not have any mounts under 
> it, but we don't know this for CSI volumes? The PV is originally created by 
> Mesos, but applications can do whatever they want in the volume after it's 
> created.

A CSI volume can be a pre-provisioned volume so it can already have some 
submounts under it before users try to use it via Mesos. But you are right, 
applicaiton can also create submounts under a PV. Maybe we should fix it for PV 
in a separate ticket? WDYT?


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72733/#review221469
-----------------------------------------------------------


On Aug. 6, 2020, 5:23 p.m., Qian Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72733/
> -----------------------------------------------------------
> 
> (Updated Aug. 6, 2020, 5:23 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10153
>     https://issues.apache.org/jira/browse/MESOS-10153
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Implemented the `prepare` method of `volume/csi` isolator.
> 
> 
> Diffs
> -----
> 
>   src/CMakeLists.txt 4e15e3d99aa2cce2403fe07e762fef2fb4a27dea 
>   src/Makefile.am 447db323875e4cad46000977f4a61600baff8f89 
>   src/common/validation.cpp 14a8c7b9d70c303b527e5e43012999471ced2d78 
>   src/slave/containerizer/mesos/isolators/volume/csi/isolator.hpp 
> PRE-CREATION 
>   src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp 
> PRE-CREATION 
>   src/slave/containerizer/mesos/isolators/volume/csi/state.hpp PRE-CREATION 
>   src/slave/containerizer/mesos/isolators/volume/csi/state.proto PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/72733/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>

Reply via email to