----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50841/#review145507 -----------------------------------------------------------
src/slave/containerizer/docker.hpp (lines 215 - 218) <https://reviews.apache.org/r/50841/#comment211751> How about kill this and add two new functions as following: ``` process::Future<Nothing> deallocateNvidiaGpu( const ContainerID& containerId); process::Future<Nothing> allocateNvidiaGpu( size_t requestedNvidiaGpu, const ContainerID& containerId); ``` src/slave/containerizer/docker.cpp (lines 1325 - 1355) <https://reviews.apache.org/r/50841/#comment211747> What about adding a new function named as `DockerContainerizerProcess::allocateNvidiaGpu`. ``` Future<Nothing> DockerContainerizerProcess::allocateNvidiaGpu( size_t requestedNvidiaGpu, const ContainerID& containerId) { if (!containers_.contains(containerId)) { return Failure("Container is already destroyed"); } Container* container = containers_[containerId]; if (requestedNvidiaGpu <= 0) { return Nothing(); } return nvidiaGpuAllocator->allocate(requestedNvidiaGpu) .then(defer(self(), [=](set<Gpu> allocated) -> Future<Nothing> { foreach (const Gpu& gpu, allocated) { container->gpuAllocated.push_back(gpu); } return Nothing(); })); } ``` Then return followingn in the end of `DockerContainerizerProcess::launchExecutorProcess`: ``` const Resources& resources = taskInfo->resources(); Option<double> gpus = resources.gpus(); // Make sure that the `gpus` resource is not fractional. // We rely on scala resources only have 3 digits of precision. if (static_cast<long long>(gpus.getOrElse(0.0) * 1000.0) % 1000 != 0) { return Failure("The 'gpus' resource must be an unsigned integer"); } size_t requested = static_cast<size_t>(gpus.getOrElse(0.0)); return allocateNvidiaGpu(requested, containerId) .then(defer(self(), [=]() { return logger->prepare(container->executor, container->directory); })) .then(defer( self(), [=](const ContainerLogger::SubprocessInfo& subprocessInfo) -> Future<pid_t> { // NOTE: The child process will be blocked until all hooks have been // executed. vector<Subprocess::Hook> parentHooks; // NOTE: Currently we don't care about the order of the hooks, as // both hooks are independent. // A hook that is executed in the parent process. It attempts to checkpoint // the process pid. // // NOTE: // - The child process is blocked by the hook infrastructure while // these hooks are executed. // - It is safe to bind `this`, as hooks are executed immediately // in a `subprocess` call. // - If `checkpoiont` returns an Error, the child process will be killed. parentHooks.emplace_back(Subprocess::Hook(lambda::bind( &DockerContainerizerProcess::checkpoint, this, containerId, lambda::_1))); #ifdef __linux__ // If we are on systemd, then extend the life of the executor. Any // grandchildren's lives will also be extended. if (systemd::enabled()) { parentHooks.emplace_back(Subprocess::Hook( &systemd::mesos::extendLifetime)); } #endif // __linux__ // Prepare the flags to pass to the mesos docker executor process. docker::Flags launchFlags = dockerFlags( flags, container->name(), container->directory, container->taskEnvironment); VLOG(1) << "Launching 'mesos-docker-executor' with flags '" << launchFlags << "'"; // Construct the mesos-docker-executor using the "name" we gave the // container (to distinguish it from Docker containers not created // by Mesos). Try<Subprocess> s = subprocess( path::join(flags.launcher_dir, "mesos-docker-executor"), argv, Subprocess::PIPE(), subprocessInfo.out, subprocessInfo.err, SETSID, launchFlags, environment, None(), parentHooks, container->directory); if (s.isError()) { return Failure("Failed to fork executor: " + s.error()); } return s.get().pid(); })); } ``` src/slave/containerizer/docker.cpp (lines 1544 - 1559) <https://reviews.apache.org/r/50841/#comment211748> What about abstract this to a new function so that this can also be used by `DockerContainerizerProcess::destroy`. ``` Future<Nothing> DockerContainerizerProcess::deallocateNvidiaGpu( const ContainerID& containerId) { set<Gpu> deallocated; if (!containers_[containerId]->gpuAllocated.empty()) { foreach (const Gpu& gpu, containers_[containerId]->gpuAllocated) { deallocated.insert(gpu); } containers_[containerId]->gpuAllocated.clear(); if (nvidiaGpuAllocator.isSome()) { nvidiaGpuAllocator->deallocate(deallocated) .onFailed(defer(self(), [=](const string& message) -> Future<Nothing> { LOG(WARNING) << "Deallocate GPUs error for container '" << containerId << "': " << message; return Nothing(); })); } } return Nothing(); } ``` Then here can be updated to ``` if (container.pid.isNone()) { return deallocateNvidiaGpu(containerId); } ``` src/slave/containerizer/docker.cpp (lines 1909 - 1928) <https://reviews.apache.org/r/50841/#comment211749> Can we do the gpu clean up at the end of this function? src/slave/containerizer/docker.cpp (lines 2060 - 2061) <https://reviews.apache.org/r/50841/#comment211750> If clean up gpu here, then the logic could be: ``` // Otherwise, wait for Docker::run to succeed, in which case we'll // continue in _destroy (calling Docker::kill) or for Docker::run to // fail, in which case we'll re-execute this function and cleanup // above. deallocateNvidiaGpu(containerId) .then(defer(self(), [=]() { return container->status.future(); })) .onAny(defer(self(), &Self::_destroy, containerId, killed)); ``` - Guangya Liu On 八月 10, 2016, 10:34 a.m., Yubo Li wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/50841/ > ----------------------------------------------------------- > > (Updated 八月 10, 2016, 10:34 a.m.) > > > Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and > Rajat Phull. > > > Bugs: MESOS-5795 > https://issues.apache.org/jira/browse/MESOS-5795 > > > Repository: mesos > > > Description > ------- > > Added control logic to allocate/deallocate GPUs to GPU-related task > when the task is started/terminated. > > > Diffs > ----- > > src/slave/containerizer/docker.hpp 43ca4317d608b3b43dd7bd0d1b55c721e7364885 > src/slave/containerizer/docker.cpp 12bad2db03bcf755317c654f028b628c5c407a62 > > Diff: https://reviews.apache.org/r/50841/diff/ > > > Testing > ------- > > make check > > > Thanks, > > Yubo Li > >
