> On Aug. 6, 2020, 9:27 a.m., Greg Mann wrote: > > src/slave/containerizer/mesos/isolators/volume/csi/isolator.hpp > > Lines 85 (patched) > > <https://reviews.apache.org/r/72733/diff/1/?file=2236984#file2236984line85> > > > > Does the value of this map really need to be an `Owned`, rather than > > just a raw `Info`? > > Qian Zhang wrote: > I think that's a convention in our isolator implementation, most of the > isolators use `Owned` rather than a raw `Info`, like: > > https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/cgroups/cgroups.hpp#L169 > > https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/docker/volume/isolator.hpp#L117 > > https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/filesystem/linux.hpp#L99 > > https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/network/cni/cni.hpp#L223 > > Greg Mann wrote: > Any idea why? It seems unnecessary to me?
I am not quite sure, maybe it is a best practice to avoid using raw pointer in our code? I see we have some patches which replaced raw pointers with `Owned` in containerizer: https://reviews.apache.org/r/51388/ https://reviews.apache.org/r/32233 > On Aug. 6, 2020, 9:27 a.m., Greg Mann wrote: > > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp > > Lines 146-151 (patched) > > <https://reviews.apache.org/r/72733/diff/1/?file=2236985#file2236985line146> > > > > Should we check `_volume.mode()` against the access mode contained > > within the `VolumeCapability`? Maybe we should have some separate > > validation code for that? > > Qian Zhang wrote: > Yeah, I thought this before, but I am not quite sure if we should do > that. Actually in CSI spec, there is no any explanation about the interaction > between the `NodePublishVolumeRequest.readonly` and the access mode in the > `VolumeCapability`, like what if `NodePublishVolumeRequest.readonly == false` > but the access mode in the `VolumeCapability` is `SINGLE_NODE_READER_ONLY`? > Should we return a failure in this case? > > Greg Mann wrote: > I'm not so worried about the `readonly` field, I don't think that we need > to validate it against the volume capabilities because mounting a volume as > read-only isn't "incompatible" with any of the capabilities (for example, I > don't think there's anything wrong with mounting a volume readonly if the > volume has only the SINGLE_NODE_WRITER capability, since mounting as readonly > does not violate RW permissions). > > However, I do think we should perform validation for our internal > `Volume::Mode` field. If a task attempts to include a CSI volume which has > only the SINGLE_NODE_READER_ONLY capability and the volume mode is set as RW, > that does not seem valid to me. WDYT? > I don't think there's anything wrong with mounting a volume readonly if the > volume has only the SINGLE_NODE_WRITER capability, since mounting as readonly > does not violate RW permissions But what about the reverse? I mean `NodePublishVolumeRequest.readonly == false` and the volume has only the `SINGLE_NODE_READER_ONLY` capability. Do you think it is wrong? > However, I do think we should perform validation for our internal > Volume::Mode field. If a task attempts to include a CSI volume which has only > the SINGLE_NODE_READER_ONLY capability and the volume mode is set as RW, that > does not seem valid to me. WDYT? Yes, I agree. I think I should add more validation here like: if the CSI volume has only the `SINGLE_NODE_READER_ONLY` or `MULTI_NODE_READER_ONLY` capability and our internal `Volume::Mode` field it set as `RW`, then we should return a failure. WDYT? > On Aug. 6, 2020, 9:27 a.m., Greg Mann wrote: > > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp > > Lines 179-184 (patched) > > <https://reviews.apache.org/r/72733/diff/1/?file=2236985#file2236985line179> > > > > Can you help me understand why we do not require an absolute container > > path to already exist in this case, while we require it in other cases > > (both here and in the Linux filesystem isolator)? > > Qian Zhang wrote: > The reason that we do not require an absolute container path to already > exist is that the container can be launched with any images, we cannot > require all images have the specified absolute container path to already > exist, and since the container is launched with an image (i.e. it has its own > rootfs), we can just create the absolute container path in its own rootfs > which will not affect the agent host rootfs. However if the container is > launched without an image (i.e. it has no its own rootfs and just uses the > agent host rootfs), we require the absolute container path to already exist > in the agent host rootfs and do not want to create the path via `os::mkdir()` > because we do not want to change the agent host rootfs. > > Could you please point me the code block of the Linux filesystem that you > mentioned above? > > Greg Mann wrote: > > The reason that we do not require an absolute container path to already > exist is that the container can be launched with any images, we cannot > require all images have the specified absolute container path to already > exist, and since the container is launched with an image (i.e. it has its own > rootfs) > > Given the way our comments are written, I think it would be reasonable if > a task is launched with an image which specifies a volume with an absolute > container path, then they must specify an image in which that absolute path > already exists. I think it probably makes sense to have the same policy for > CSI volumes that we have for other types of volumes? > > In the Linux filesystem isolator, we skip volumes with absolute container > paths here: > https://github.com/apache/mesos/blob/c78dc333fc893a43d40dc33299a61987198a6ea9/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp#L931-L939 > > And I was unable to find another place in that isolator where we create > directories in rootfs's for volumes with absolute container paths. > > However, I see that in other volume isolators we do create the path > within the rootfs. Since that is the case, I might suggest that we leave your > code as-is and update our comments for the `container_path` field so that > they are correct? Yeah, I think our comments for the `container_path` field is outdated which I guess was writted before we support launching container with an image. ``` message Volume { .. // Path pointing to a directory or file in the container. If the // path is a relative path, it is relative to the container work // directory. If the path is an absolute path, that path must // already exist. required string container_path = 1; ``` If the container is launched with an image, then we should not require an absolute container path to already exist. That's also what Docker does, e.g. when you run `docker run --rm -it -v /opt:/qzhang/data alpine sh`, Docker will automaically create the `/qzhang/data` inside the container which I think is the correct behavior. I will update that comments for the `container_path` field, thanks for catching this! > On Aug. 6, 2020, 9:27 a.m., Greg Mann wrote: > > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp > > Lines 287-300 (patched) > > <https://reviews.apache.org/r/72733/diff/1/?file=2236985#file2236985line287> > > > > Can we eliminate this if you use `collect()` above instead of `await()`? > > Qian Zhang wrote: > I think we still need to use `await()` so if any CSI publish volume call > fails we can get the actual error message. > > Greg Mann wrote: > `collect()` will surface the message of the first failure to occur: > https://github.com/apache/mesos/blob/c78dc333fc893a43d40dc33299a61987198a6ea9/3rdparty/libprocess/include/process/collect.hpp#L172-L174 > > If that's sufficient we can use `collect()`, but if you want to aggregate > the messages from multiple failures then you're right, `await()` is required. Yeah, I think we need to aggregate the messages from multiple failures, that is also how we work with CNI plugins. > On Aug. 6, 2020, 9:27 a.m., Greg Mann wrote: > > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp > > Lines 316 (patched) > > <https://reviews.apache.org/r/72733/diff/1/?file=2236985#file2236985line316> > > > > Why do we do a recursive bind mount here, but not for volumes in the > > Linux filesystem isolator? > > https://github.com/apache/mesos/blob/c78dc333fc893a43d40dc33299a61987198a6ea9/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp#L1067-L1068 > > Qian Zhang wrote: > The reason that we do a recursive bind mount here is, if there is any > submounts under the `source`, we want all those submounts also bind mounted > at the `target` so that the container can access them. We have this logic in > all the volume isolators, like: > > https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/volume/host_path.cpp#L351 > > https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/volume/image.cpp#L255 > > The code that you pointed out in the Linux filesystem isolator is to > mount the PV, since PV is originally created by Mesos and we are sure there > is no submounts under it, so we do not need `MS_REC` in this case. > > Greg Mann wrote: > Why do we know that a persistent volume will not have any mounts under > it, but we don't know this for CSI volumes? The PV is originally created by > Mesos, but applications can do whatever they want in the volume after it's > created. A CSI volume can be a pre-provisioned volume so it can already have some submounts under it before users try to use it via Mesos. But you are right, applicaiton can also create submounts under a PV. Maybe we should fix it for PV in a separate ticket? WDYT? - Qian ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72733/#review221469 ----------------------------------------------------------- On Aug. 6, 2020, 5:23 p.m., Qian Zhang wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72733/ > ----------------------------------------------------------- > > (Updated Aug. 6, 2020, 5:23 p.m.) > > > Review request for mesos, Andrei Budnik and Greg Mann. > > > Bugs: MESOS-10153 > https://issues.apache.org/jira/browse/MESOS-10153 > > > Repository: mesos > > > Description > ------- > > Implemented the `prepare` method of `volume/csi` isolator. > > > Diffs > ----- > > src/CMakeLists.txt 4e15e3d99aa2cce2403fe07e762fef2fb4a27dea > src/Makefile.am 447db323875e4cad46000977f4a61600baff8f89 > src/common/validation.cpp 14a8c7b9d70c303b527e5e43012999471ced2d78 > src/slave/containerizer/mesos/isolators/volume/csi/isolator.hpp > PRE-CREATION > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp > PRE-CREATION > src/slave/containerizer/mesos/isolators/volume/csi/state.hpp PRE-CREATION > src/slave/containerizer/mesos/isolators/volume/csi/state.proto PRE-CREATION > > > Diff: https://reviews.apache.org/r/72733/diff/2/ > > > Testing > ------- > > > Thanks, > > Qian Zhang > >
