> On Sept. 24, 2016, 1:33 a.m., Jie Yu wrote: > > src/slave/containerizer/mesos/launch.cpp, line 140 > > <https://reviews.apache.org/r/51406/diff/7/?file=1509426#file1509426line140> > > > > Looks like sprintf is not async safe in glibc: > > > > http://stackoverflow.com/questions/14573000/print-int-from-signal-handler-using-write-or-async-safe-functions > > > > cerr << below is not async safe either. > > > > I think it's highly unlikely that we'll hit the async safe problem here > > because most of the time, our code will be in syscall. So I suggest we > > simply this function and add a TODO about async safety (we try to be async > > safe, but if it's too hard, use a simple way): > > > > ``` > > Try<Nothing> write = os::write( > > containerStatusFd.get(), > > std::to_string(status)); > > > > if (write.isError()) { > > os::write( > > STDERR_FILENO, > > "Failed to write container status '" + statusString + > > "': " + ::strerror(errno)); > > } > > ``` > > > > I'd not do the `_exit` here because this helper is meant to be writting > > the status. Leave the `_exit` to the caller. > > Kevin Klues wrote: > Yeah, that's my bad. I didn't look closely enough at `sprintf()` and > assumed it was async signal safe since it takes a pointer to the destination > where to write the result. I forgot that it does buffered I/O under the hood > though. Doh! > > The `cerr <<` was just an oversight. Obviously this is not async signal > safe! > > Regarding the exit... If we don't want to do an exit here, then the > function should return a `bool` so the caller can decide what to do.
Looks like all callers do not care. I'll keep it as void for now. - Jie ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51406/#review150266 ----------------------------------------------------------- On Sept. 23, 2016, 8:59 p.m., Kevin Klues wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51406/ > ----------------------------------------------------------- > > (Updated Sept. 23, 2016, 8:59 p.m.) > > > Review request for mesos and Jie Yu. > > > Bugs: MESOS-6088 > https://issues.apache.org/jira/browse/MESOS-6088 > > > Repository: mesos > > > Description > ------- > > Previously the 'mesos-containerizer launch' binary would simply exec > into the actual command we wanted to launch after doing some set of > preperatory work. The problem with this approach, however, is that > this gives us no opportunity to checkpoint the wait status of the > command so the agent can recover it in cases where it is offline at > the time the command completes. > > To compensate for this, we now allow 'mesos-containerizer launch' to > take an optional '--runtime_directory' argument, which indicates where > to checkpint the status of the launched command (as returned by > 'waitpid()'). In order to do this checkpointing, however, we cannot > simply exec into the command anymore. Instead we now fork-exec the > command and reap it once it completes. We then checkpoint its status > and return it as our own. We call the original process the 'init' > process of the container and the fork-exec'd process its child. As a > side effect of doing things this way, we also have to be careful to > forward all signals sent to the init process down to the child. > > In a subsequent commit we will update the mesos containerizer to use > this new functionality. > > > Diffs > ----- > > src/slave/containerizer/mesos/launch.hpp > 859990cb85e9e8c06400397256cfc512f0811800 > src/slave/containerizer/mesos/launch.cpp > 777121cffaa5fc76adfefc1e617409d997c7aac8 > > Diff: https://reviews.apache.org/r/51406/diff/ > > > Testing > ------- > > $ GTEST_FILTER="" make -j check > $ src/mesos-tests > $ sudo src/mesos-tests > > > Thanks, > > Kevin Klues > >
