Re: [Rd] Is it possible to gracefully interrupt a child R process on MS Windows?

Henrik Bengtsson Mon, 12 May 2025 12:52:56 -0700

Thanks all.

I can confirm that 'GenerateConsoleCtrlEvent', which Kevin and Ivan
referred to, works. I have verified that ps::ps_interrupt() can
trigger an interrupt of an Rscript process running in the background,
which R then detects as a user interrupt (think Ctrl-C) and signals as
an 'interrupt' condition that can be caught using calling handlers. If
`tools::pskill(pid, signal = SIGINT)` could implement this approach, I
think we would end up with consistent behavior of SIGINT across
operating systems in base R.
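
To make the worker side of this concrete, here is a minimal sketch of
catching an 'interrupt' condition with a calling handler and then
unwinding. The names run_with_cleanup() and simulate_interrupt() are
hypothetical, and the interrupt is simulated by signalling a condition
of class "interrupt" rather than by a real Ctrl-C or ps::ps_interrupt(),
so treat it as an illustration of the handler mechanics only.

```r
## Hypothetical helper: run task(), calling cleanup() if an 'interrupt'
## condition is signalled while it runs.
run_with_cleanup <- function(task, cleanup) {
  tryCatch(
    withCallingHandlers(
      {
        task()
        "completed"
      },
      ## Calling handler: runs at the point of interruption
      interrupt = function(int) cleanup()
    ),
    ## Exiting handler: unwinds the stack and returns a status
    interrupt = function(int) "interrupted"
  )
}

## Assumption for demonstration only: mimic a user interrupt by signalling
## a bare condition object of class "interrupt".
simulate_interrupt <- function() {
  signalCondition(structure(class = c("interrupt", "condition"), list()))
}

status <- run_with_cleanup(
  task    = function() simulate_interrupt(),
  cleanup = function() message("cleaning up")
)

## Parent side (not run here; assumes the 'ps' package is installed):
##   ps::ps_interrupt(ps::ps_handle(pid))
```

In a real worker, task() would be the long-running computation and
cleanup() would remove temporary files, close connections, and so on.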

Regarding SIGINT for gracefully shutting down R processes:

I think SIGINT is an obvious candidate for providing running processes
a "preemption warning", before killing the process. That way the
affected process gets a chance to exit gracefully, e.g. cleaning up
files, closing connections, etc. In R, we already have most of this
infrastructure in place, e.g. R_CheckUserInterrupt() and signalling of
the 'interrupt' condition.

Although one of my immediate use cases is parallel R clusters, I'm
after a general solution here. There are plenty of use cases where one
would like to give an R process a chance to exit gracefully on
request. This has to work also when the R process is busy in the middle
of some computation, which is why I think SIGINT is the best candidate.

> Using signals to terminate a process even on Unix may not be seen as
> graceful enough, either. It is not just a Windows problem.

I agree with this, but it takes us a long way. It is already a
de-facto standard to use POSIX signals this way in high-performance
compute (HPC) environments. HPC schedulers give running jobs
"preemption warnings" via POSIX signals. For instance, when a job is
approaching its maximum run-time, the scheduler will let the job know
in advance that it is about to be terminated. For example, Grid Engine
signals SIGUSR2 twice (immediately after each other), and 60 seconds
later SIGKILL is signaled. Similarly, Slurm sends a SIGTERM and then a
SIGKILL after 30 seconds. The exact signals and grace periods can be
configured globally, but I think also by the user at time of
submission for some schedulers. Users can of course also customize it
by having tools and shell traps that translate one signal to another,
e.g. main job script receives a SIGUSR2 and resignals it as a SIGINT.
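
The "warn first, kill later" pattern the schedulers use can be sketched
in R along these lines. This is only an illustration under stated
assumptions: stop_gracefully() is a hypothetical name, is_alive must be
supplied by the caller (base R does not offer an obvious portable check,
so something like the 'ps' package would be needed), and the default
grace period of 30 seconds merely echoes the Slurm example above. It is
only meaningful on Unix-alikes, since on Windows tools::pskill() always
uses TerminateProcess.

```r
## Hypothetical helper: send SIGINT as a "preemption warning", wait up to
## `grace` seconds for the process to exit, then escalate to SIGKILL.
## `is_alive` is an assumption: a function(pid) returning TRUE while the
## process still exists. `send` defaults to tools::pskill().
stop_gracefully <- function(pid, is_alive, grace = 30, poll = 0.5,
                            send = tools::pskill) {
  send(pid, tools::SIGINT)                 # polite request
  deadline <- Sys.time() + grace
  while (is_alive(pid) && Sys.time() < deadline) {
    Sys.sleep(poll)                        # give the process time to clean up
  }
  if (!is_alive(pid)) return("exited")
  send(pid, tools::SIGKILL)                # grace period is over
  "killed"
}
```

A caller would use it as stop_gracefully(pid, is_alive = my_check),
where my_check is whatever liveness test is available on the system.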

> I think for reliability and portability of termination, one needs to 
> implement an application-specific termination protocol on both ends.

Maybe I misunderstand your proposal, but I don't think a custom
termination protocol can meet the same needs as handling interrupts.
If it were an API, then all parts of the R software stack would need
to be aware of that API, e.g. polling it frequently to check if there
is a request to shut down. This is where I think R_CheckUserInterrupt()
excels.
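
To illustrate the polling burden, here is a rough sketch of what an
application-level protocol could look like on the worker side. The
function name process_chunks() and the stop-file convention are
assumptions made up for this example, not anything proposed in the
thread. The point is that the shutdown request is only noticed at the
places where the code explicitly checks for it.

```r
## Hypothetical protocol: the parent creates `stop_file` to request a
## shutdown; the worker must poll for it between units of work.
process_chunks <- function(chunks, stop_file, work = identity) {
  done <- list()
  for (chunk in chunks) {
    ## The request is only seen here; a long-running work() call that
    ## never returns to this loop will never notice it.
    if (file.exists(stop_file)) break
    done[[length(done) + 1L]] <- work(chunk)
  }
  done
}
```

Any package code called from work() that does not know about stop_file
cannot react to it, whereas an interrupt reaches code wherever R checks
for user interrupts.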

> Yes, TerminateProcess() on Windows will not allow the target process to run 
> any cleanup.

Thanks for confirming - I wasn't 100% sure.

> But I think one should avoid using pskill()/signals for termination and 
> instead use an application-level termination protocol. The parallel package, 
> PSOCK, has one, based on socket connections, so perhaps one can take some 
> inspiration from there.

We might be talking about different use cases. If we take a parallel
cluster, the preferred way to shut it down is parallel::stopCluster().
However, that will only take place when all workers are done with
their current tasks and are ready to receive new commands from the
parent process, including the "please-shut-yourself-down" command.
That can take minutes, hours, days, or even never happen if the worker
is stuck in an infinite loop. A harsher approach would be to just
close the socket connection on the parent's end, but even that won't
shut down the worker until it attempts to use that connection. So, we
need a way to interrupt the worker also when the worker is busy. This
will free up compute (CPU and memory) resources sooner. This might be
based on an active decision of the user (Ctrl-C in the parent, or
explicit function call), but also automatically if we know the results
from the workers are no longer of value. The system might also signal
such a shutdown, e.g. HPC scheduler or host rebooting. We can handle
this already now using base R, but on Windows we cannot terminate a
worker gracefully from R, because we cannot send a signal that
resembles SIGINT.

Thanks,

Henrik

On Mon, May 12, 2025 at 12:57 AM Tomas Kalibera
<tomas.kalib...@gmail.com> wrote:
>
> I think for reliability and portability of termination, one needs to
> implement an application-specific termination protocol on both ends.
> Only within specific application constraints, one can also define what
> graceful termination means. Typically, one also has other expectations
> from the termination process - such that the processes will terminate in
> some finite time/soon. In some cases one also may require certain
> behavior of the cleanup code (such as that wouldn't take long, wouldn't
> do some things, etc), to meet the specific termination requirements. And
> it may require some behavior of the non-cleanup code as well (such as
> polling in some intervals).
>
> Using signals to terminate a process even on Unix may not be seen as
> graceful enough, either. It is not just a Windows problem.
>
> Yes, TerminateProcess() on Windows will not allow the target process to
> run any cleanup. The documentation of "pskill" names
> "TerminateProcess()" explicitly so that the readers interested in the
> details can follow Microsoft documentation. But I think one should avoid
> using pskill()/signals for termination and instead use an
> application-level termination protocol. The parallel package, PSOCK, has
> one, based on socket connections, so perhaps one can take some
> inspiration from there.
>
> Best
> Tomas
>
> On 5/11/25 19:58, Henrik Bengtsson wrote:
> > In help("pskill", package = "tools") it says:
> >
> >    Only SIGINT and SIGTERM will be defined on Windows, and pskill will
> > always use the Windows system call TerminateProcess.
> >
> > As far as I understand it, TerminateProcess [1] terminates the process
> > "quite abruptly". Specifically, it is not possible for the process to
> > intercept the termination and gracefully shutdown. In R terms, we
> > cannot rely on:
> >
> > tryCatch({
> >    ...
> > }, interrupt = function(int) {
> >    ## cleanup
> > })
> >
> > Similarly, it does not look like R itself can exit gracefully. For
> > example, when signalling pskill(pid, signal = SIGINT) to another R
> > process, that R process leaves behind its tempdir(). In contrast, if
> > the user interrupts the process interactively (Ctrl-C), there is an
> > 'interrupt' condition that can be caught, and R cleans up after itself
> > before exiting.
> >
> > QUESTION:
> >
> > Is it possible to gracefully interrupt a child R process on MS
> > Windows, e.g. a PSOCK cluster node? (I don't think so, but I figure
> > it's worth asking)
> >
> >
> > SUGGESTIONS:
> >
> > Also, if my understanding that TerminateProcess is abrupt is correct,
> > and there is no way to exit gracefully, would it make sense to clarify
> > this fact in help("pskill", package = "tools")? Right now you either
> > have to know how 'TerminateProcess' works, or run various tests on MS
> > Windows to figure out the current behavior.
> >
> > Also, would a better signal mapping be:
> >
> >    Only SIGKILL will be defined on Windows, and pskill will always use
> > the Windows system call TerminateProcess. Signals SIGINT and SIGTERM
> > are supported for backward compatible reasons, but are effectively
> > identical to SIGKILL.
> >
> > ? That would change the expectations on what will happen for people
> > coming from the POSIX world.
> >
> > [1] 
> > https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-terminateprocess
> >
> > /Henrik
> >
> > ______________________________________________
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
