> On Aug. 11, 2016, 3:07 p.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, lines 1325-1355
> > <https://reviews.apache.org/r/50841/diff/3/?file=1469743#file1469743line1325>
> >
> > What about adding a new function named
> > `DockerContainerizerProcess::allocateNvidiaGpu`?
> >
> > ```
> > Future<Nothing> DockerContainerizerProcess::allocateNvidiaGpu(
> >     size_t requestedNvidiaGpu,
> >     const ContainerID& containerId)
> > {
> >   if (!containers_.contains(containerId)) {
> >     return Failure("Container is already destroyed");
> >   }
> >
> >   Container* container = containers_[containerId];
> >
> >   if (requestedNvidiaGpu <= 0) {
> >     return Nothing();
> >   }
> >
> >   return nvidiaGpuAllocator->allocate(requestedNvidiaGpu)
> >     .then(defer(self(), [=](set<Gpu> allocated) -> Future<Nothing> {
> >       foreach (const Gpu& gpu, allocated) {
> >         container->gpuAllocated.push_back(gpu);
> >       }
> >
> >       return Nothing();
> >     }));
> > }
> > ```
> >
> > Then return the following at the end of
> > `DockerContainerizerProcess::launchExecutorProcess`:
> >
> > ```
> >   const Resources& resources = taskInfo->resources();
> >
> >   Option<double> gpus = resources.gpus();
> >
> >   // Make sure that the `gpus` resource is not fractional.
> >   // We rely on scalar resources having only 3 digits of precision.
> >   if (static_cast<long long>(gpus.getOrElse(0.0) * 1000.0) % 1000 != 0) {
> >     return Failure("The 'gpus' resource must be an unsigned integer");
> >   }
> >
> >   size_t requested = static_cast<size_t>(gpus.getOrElse(0.0));
> >
> >   return allocateNvidiaGpu(requested, containerId)
> >     .then(defer(self(), [=]() {
> >       return logger->prepare(container->executor, container->directory);
> >     }))
> >     .then(defer(
> >         self(),
> >         [=](const ContainerLogger::SubprocessInfo& subprocessInfo)
> >           -> Future<pid_t> {
> >       // NOTE: The child process will be blocked until all hooks have been
> >       // executed.
> >       vector<Subprocess::Hook> parentHooks;
> >
> >       // NOTE: Currently we don't care about the order of the hooks, as
> >       // both hooks are independent.
> >
> >       // A hook that is executed in the parent process. It attempts to
> >       // checkpoint the process pid.
> >       //
> >       // NOTE:
> >       // - The child process is blocked by the hook infrastructure while
> >       //   these hooks are executed.
> >       // - It is safe to bind `this`, as hooks are executed immediately
> >       //   in a `subprocess` call.
> >       // - If `checkpoint` returns an Error, the child process will be
> >       //   killed.
> >       parentHooks.emplace_back(Subprocess::Hook(lambda::bind(
> >           &DockerContainerizerProcess::checkpoint,
> >           this,
> >           containerId,
> >           lambda::_1)));
> >
> > #ifdef __linux__
> >       // If we are on systemd, then extend the life of the executor. Any
> >       // grandchildren's lives will also be extended.
> >       if (systemd::enabled()) {
> >         parentHooks.emplace_back(Subprocess::Hook(
> >             &systemd::mesos::extendLifetime));
> >       }
> > #endif // __linux__
> >
> >       // Prepare the flags to pass to the mesos docker executor process.
> >       docker::Flags launchFlags = dockerFlags(
> >           flags,
> >           container->name(),
> >           container->directory,
> >           container->taskEnvironment);
> >
> >       VLOG(1) << "Launching 'mesos-docker-executor' with flags '"
> >               << launchFlags << "'";
> >
> >       // Construct the mesos-docker-executor using the "name" we gave the
> >       // container (to distinguish it from Docker containers not created
> >       // by Mesos).
> >       Try<Subprocess> s = subprocess(
> >           path::join(flags.launcher_dir, "mesos-docker-executor"),
> >           argv,
> >           Subprocess::PIPE(),
> >           subprocessInfo.out,
> >           subprocessInfo.err,
> >           SETSID,
> >           launchFlags,
> >           environment,
> >           None(),
> >           parentHooks,
> >           container->directory);
> >
> >       if (s.isError()) {
> >         return Failure("Failed to fork executor: " + s.error());
> >       }
> >
> >       return s.get().pid();
> >     }));
> > }
> >
> > ```
>
> Yubo Li wrote:
>     I saw you removed the check:
>
>         if (!nvidiaGpuAllocator.isSome()) {
>           return Failure("Can not deallocate GPU without enabling GPU support.");
>         }
>
>     If someone compiled Mesos without GPU support, `nvidiaGpuAllocator`
>     should be `None()`. In `allocateNvidiaGpu()`,
>     `nvidiaGpuAllocator->allocate(requestedNvidiaGpu)` will then crash.
>     What is your opinion?
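
To make the concern above concrete, here is a minimal, self-contained sketch of the guard being discussed. It is not the actual patch: `std::optional` stands in for Mesos's `Option`, and `FakeGpuAllocator`, `Result`, and the free-function form of `allocateNvidiaGpu` are hypothetical names used only for illustration. Letting a zero-GPU request succeed before the allocator check is also an assumption, not something settled in this thread.

```cpp
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU handle; not the Mesos `Gpu` type.
using Gpu = int;

// Hypothetical stand-in for the allocator: hands out GPU ids from a pool.
class FakeGpuAllocator
{
public:
  explicit FakeGpuAllocator(std::set<Gpu> available)
    : available_(std::move(available)) {}

  // Returns `count` GPUs, or nothing if fewer than `count` are free.
  std::optional<std::set<Gpu>> allocate(std::size_t count)
  {
    if (count > available_.size()) {
      return std::nullopt;
    }

    std::set<Gpu> result;
    auto it = available_.begin();
    for (std::size_t i = 0; i < count; ++i) {
      result.insert(*it);
      it = available_.erase(it);
    }
    return result;
  }

private:
  std::set<Gpu> available_;
};

// Simplified result type for the sketch: `ok` plus an error message.
struct Result
{
  bool ok;
  std::string error;
};

// Guards against a missing allocator before dereferencing it.
Result allocateNvidiaGpu(
    std::optional<FakeGpuAllocator>& allocator,
    std::size_t requested,
    std::vector<Gpu>& gpuAllocated)
{
  // Assumption: a task that requests no GPUs needs no allocator.
  if (requested == 0) {
    return {true, ""};
  }

  // The check under discussion: fail cleanly instead of crashing.
  if (!allocator.has_value()) {
    return {false, "Cannot allocate GPU without enabling GPU support"};
  }

  std::optional<std::set<Gpu>> allocated = allocator->allocate(requested);
  if (!allocated.has_value()) {
    return {false, "Not enough GPUs available"};
  }

  for (const Gpu& gpu : *allocated) {
    gpuAllocated.push_back(gpu);
  }
  return {true, ""};
}
```

With an empty `std::optional<FakeGpuAllocator>`, a request for one GPU returns a failed `Result` rather than dereferencing a missing allocator; the same check would apply symmetrically on the deallocation path.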

Yes, we should have such a check in both `allocateNvidiaGpu` and
`deallocateNvidiaGpu`. Thanks.


- Guangya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review145507
-----------------------------------------------------------


On Aug. 10, 2016, 10:34 a.m., Yubo Li wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> -----------------------------------------------------------
>
> (Updated Aug. 10, 2016, 10:34 a.m.)
>
>
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and
> Rajat Phull.
>
>
> Bugs: MESOS-5795
>     https://issues.apache.org/jira/browse/MESOS-5795
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
>
>
> Diffs
> -----
>
>   src/slave/containerizer/docker.hpp 43ca4317d608b3b43dd7bd0d1b55c721e7364885
>   src/slave/containerizer/docker.cpp 12bad2db03bcf755317c654f028b628c5c407a62
>
> Diff: https://reviews.apache.org/r/50841/diff/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Yubo Li
>
>
