-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51406/#review146791
-----------------------------------------------------------



Can you split the Launcher::wait support to a separate patch?


src/slave/containerizer/mesos/launch.cpp (line 124)
<https://reviews.apache.org/r/51406/#comment213501>

    We use the same binary for nested container as well. So it might not be an 
executor. I'd rename it to `containerPid`



src/slave/containerizer/mesos/launch.cpp (lines 130 - 142)
<https://reviews.apache.org/r/51406/#comment213502>

    indentation here should be 2 :)



src/slave/containerizer/mesos/launch.cpp (lines 130 - 131)
<https://reviews.apache.org/r/51406/#comment213516>

    You can do the following:
    ```
    bufferedSignals[sig].fetch_add(1)
    ```



src/slave/containerizer/mesos/launch.cpp (line 132)
<https://reviews.apache.org/r/51406/#comment213513>

    defer is not async singal safe because it'll try to acquire a lock.
    
    os::kill is async signal safe
    http://man7.org/linux/man-pages/man7/signal.7.html
    
    Looking at possible failure scenarios of os::kill, none of them looks 
possible except ESRCH. But in that case, the container pid is gone, and the 
process will exit soon. So no need to print the error message.
    
    Therefore, i'd suggest we simply call os::kill in the signal handler and 
ignore the error. No need to use defer.



src/slave/containerizer/mesos/launch.cpp (line 133)
<https://reviews.apache.org/r/51406/#comment213514>

    What if bufferedSignals[i] > 1?



src/slave/containerizer/mesos/launch.cpp (lines 161 - 172)
<https://reviews.apache.org/r/51406/#comment213503>

    We have os::signal::install in stout. Please use that



src/slave/containerizer/mesos/launch.cpp (lines 220 - 225)
<https://reviews.apache.org/r/51406/#comment213472>

    I'd suggest move this whole block of code after we receive the control pipe 
signal from the parent to continue.
    
    If parent dies, there no point to open and create the exit_status file.



src/slave/containerizer/mesos/launch.cpp (line 224)
<https://reviews.apache.org/r/51406/#comment213475>

    even if we pivot_root below.



src/slave/containerizer/mesos/launch.cpp (lines 229 - 239)
<https://reviews.apache.org/r/51406/#comment213467>

    I suggest that we create the directory in `Launcher::fork` because it also 
needs to checkpoint the pid there.
    
    This helper binary should just open the file at `exit_status_path`.



src/slave/containerizer/mesos/launch.cpp (line 244)
<https://reviews.apache.org/r/51406/#comment213473>

    can you make sure fd has cloexec flag, we don't want to leak this fd to the 
subprocess.



src/slave/containerizer/mesos/launch.cpp (line 245)
<https://reviews.apache.org/r/51406/#comment213468>

    I'd insert a new line above.



src/slave/containerizer/mesos/launch.cpp (lines 493 - 497)
<https://reviews.apache.org/r/51406/#comment213490>

    I'd suggest we restructure the code like the following. Since we cannot do 
signal forwarding on windows, i'd suggest we simply does not support this 
feature on Windows. This flag will only be used in LinuxLauncher.
    
    ```
    #ifndef __WINDOWS__
    if (exitStatusFd.isSome()) {
      pid_t pid = fork();
      if (pid < 0) {
        cerr << ... << endl;
        return EXIT_FAILURE;
      } else if (pid > 0) {
        // Parent.
        forwardSingals(pid);
        
        ...
      }
    }
    #endif // __WINDOWS__
    
    // Child.
    if (command->shell()) {
      os::execlp(...);
    } else {
      execvp(...);
    }
    ```



src/slave/containerizer/mesos/launch.cpp (line 502)
<https://reviews.apache.org/r/51406/#comment213477>

    I'd avoid using subprocess here. Using subprocess here means that we'll 
initialize libprocess, creating many unnecessary threads. I would simply 
fork-exec here. We don't have two worry about async signal safe problem here 
because it's single threaded process.
    
    We need to fix the `pre_exec_comands` above for the same problem.



src/slave/containerizer/mesos/launch.cpp (lines 542 - 544)
<https://reviews.apache.org/r/51406/#comment213493>

    else if



src/slave/containerizer/mesos/launch.cpp (line 550)
<https://reviews.apache.org/r/51406/#comment213495>

    kill one extra line



src/slave/containerizer/mesos/launch.cpp (line 562)
<https://reviews.apache.org/r/51406/#comment213497>

    better to print the exit status path as well here


- Jie Yu


On Aug. 25, 2016, 1:21 a.m., Kevin Klues wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51406/
> -----------------------------------------------------------
> 
> (Updated Aug. 25, 2016, 1:21 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6088
>     https://issues.apache.org/jira/browse/MESOS-6088
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously the 'mesos-containerizer launch' binary would simply exec
> into the actual command we wanted to launch after doing some set of
> preperatory work. The problem with this approach, however, is that
> this gives us no opportunity to checkpoint the exit status of the
> command so the agent can recover it in cases where it is offline at
> the time the command completes.
> 
> To compensate for this, we now allow 'mesos-containerizer launch' to
> take an optional '--exit_status_path' parameter, which indicates where
> the exit status of our command should be checkpointed. In order to
> do this checkpointing, however, we cannot simply exec into the command
> anymore. Instead we now fork-exec the command and reap it once it
> completes. We then checkpoint its exit status and return it as our
> own. We call the original process the 'init' process of the container
> and the fork-exec'd process its child.  As a side effect of doing
> things this way, we also have to be careful to forward all signals
> sent to the init process down to the child.
> 
> In order to properly reap the 'init' process, we also introduce a new
> 'Launcher::wait' command that knows how to reap the 'init' process
> directly and then retreive its exit status from the checkpoint (if
> necessary).
> 
> In a subsequent commit we will update the mesos containerizer to use
> this new functionality.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/launch.hpp 
> 0e86da9c7bd9c7fbedd7102d66b902d1c10e5e0b 
>   src/slave/containerizer/mesos/launch.cpp 
> 2db8db5ac2f4614d406b950a1b4e8098a0d90e7b 
>   src/slave/containerizer/mesos/launcher.hpp 
> 0eae09515d550a2c71ae1414d4a22943f1d09db9 
>   src/slave/containerizer/mesos/launcher.cpp 
> 413a8afdc56127a58c9599c436d17d6c98e62434 
>   src/slave/containerizer/mesos/linux_launcher.hpp 
> 8fbe1e9742df85b0fc6de46ac81a0c064c845a63 
>   src/slave/containerizer/mesos/linux_launcher.cpp 
> 7377316776646e3d660086da3c0d5b494ce8ace4 
>   src/tests/containerizer/launcher.hpp 
> 94c62b761a17436841bd19f3bf622cc8f1047876 
>   src/tests/containerizer/launcher.cpp 
> a92d9890f0931425d69ef8ce0896d081b8722079 
> 
> Diff: https://reviews.apache.org/r/51406/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>

Reply via email to