> On 八月 28, 2016, 12:57 a.m., Kevin Klues wrote: > > src/slave/containerizer/docker.hpp, line 506 > > <https://reviews.apache.org/r/50841/diff/7/?file=1486671#file1486671line506> > > > > Shoud we use `set` here like we do in the mesos containerizer?
Yes, changed it to `set`. > On 八月 28, 2016, 12:57 a.m., Kevin Klues wrote: > > src/slave/containerizer/docker.cpp, lines 1341-1346 > > <https://reviews.apache.org/r/50841/diff/7/?file=1486672#file1486672line1341> > > > > You can just use `if (gpus.get() > 0)` here. the temp variable `requested` is removed. > On 八月 28, 2016, 12:57 a.m., Kevin Klues wrote: > > src/slave/containerizer/docker.cpp, lines 1541-1552 > > <https://reviews.apache.org/r/50841/diff/7/?file=1486672#file1486672line1541> > > > > You can't rely on `containers_[containerId]` to be valid at this point > > (hence the check for it below). > > > > Regardless, I just at the head of master to see where the most > > appropriate place to do this deallocation would be, and this function looks > > *completely* different from what I see here. > > > > When was the last time you rebased? The code is rebased at Aug. 15, let's keep this issue. I'll come back to this after I fix all other bugs and rebase by code. Code rebased, where is the most appropriate place to do so? BTW, `deallocateNvidiaGpus()` will check if `containers_[containerId]` is valid now, so we don't need to worry that `containers_[containerId]` is invalid. > On 八月 28, 2016, 12:57 a.m., Kevin Klues wrote: > > src/slave/containerizer/docker.cpp, lines 2037-2041 > > <https://reviews.apache.org/r/50841/diff/7/?file=1486672#file1486672line2037> > > > > It feels too early to be doing the `deallocate()` here. I think we > > should only do it once the container is actually killed (i.e. in > > `___destroy()`). > > > > If we do it too early, the GPUs could still be being used by the > > container when they get allocated out to someone else. This would not go > > very well. > > > > There is the case, however, in `__destroy()` where we aren't able to > > kill the docker container at all! It's better to leak the GPUs in this case > > than to deallocate them and run the risk of giving them to a new container. > > We should mention this in the error string below if any GPUs are actually > > leaked: > > > > ``` > > container->termination.fail( > > "Failed to kill the Docker container: " + > > (kill.isFailed() ? kill.failure() : "discarded future")); > > ``` I see. Also will come back to deal with deallocate issues after code rebased. I moved this to `___destroy()`. Also, added the error string for leaked GPUs: ``` container->termination.fail( "Failed to kill the Docker container: " + (kill.isFailed() ? kill.failure() : "discarded future") + (container->gpus.empty() ? "" : ", " + stringify(container->gpus.size()) + "GPUs leaked")); ``` - Yubo ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50841/#review147089 ----------------------------------------------------------- On 九月 20, 2016, 9:22 a.m., Yubo Li wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/50841/ > ----------------------------------------------------------- > > (Updated 九月 20, 2016, 9:22 a.m.) > > > Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and > Rajat Phull. > > > Bugs: MESOS-5795 > https://issues.apache.org/jira/browse/MESOS-5795 > > > Repository: mesos > > > Description > ------- > > Added control logic to allocate/deallocate GPUs to GPU-related task > when the task is started/terminated. > > > Diffs > ----- > > src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d > src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 > > Diff: https://reviews.apache.org/r/50841/diff/ > > > Testing > ------- > > make check > > > Thanks, > > Yubo Li > >
