What about setsid and pushing it in a new
session instead of using & in the script?

:-) That works. Thanks!

NB, the working script looks like:

setsid bash -c "mpirun command >& out" &
tail -f out

Thanks,
Pablo
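
For anyone who wants to reuse the setsid approach, here is a minimal sketch of the working script above wrapped in a function. The name run_detached is hypothetical, and the example command is only a stand-in for "mpirun command". It assumes a Linux system where setsid(1) from util-linux is available.

```shell
#!/usr/bin/env bash
# Sketch: launch a command in its own session so that Ctrl+C typed at the
# terminal (aimed at "tail -f") is not delivered to it.
# run_detached is a hypothetical helper name, not part of Open MPI.

# Usage: run_detached OUTFILE COMMAND [ARGS...]
run_detached() {
    local out=$1
    shift
    # setsid(1) starts COMMAND in a new session, which has no controlling
    # terminal; stdin is redirected so the job never tries to read the tty.
    setsid "$@" > "$out" 2>&1 < /dev/null &
}

# Example with a stand-in for "mpirun command":
#   run_detached out bash -c 'for i in 1 2 3; do echo "$i"; sleep 1; done'
#   tail -f out        # Ctrl+C stops tail only
```

Because the job no longer shares the terminal's foreground process group, the SIGINT generated by Ctrl+C should reach tail but not the detached command. The job then has to be stopped explicitly, for example with kill.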


On 23/04/11 18:39, Reuti wrote:
Am 23.04.2011 um 19:33 schrieb Ralph Castain:

On Apr 23, 2011, at 10:40 AM, Pablo Lopez Rios wrote:

I'm not sure what you are actually trying to accomplish
I simply want a script that runs the equivalent of:

mpirun command >& out &
tail -f out

such that hitting Ctrl+C stops tail but leaves mpirun running. I can certainly 
do this without mpirun,
I don't think that's true. If both commands are in a script, then at least for 
me, a ctrl-c of the -script- will cause ctrl-c to be sent to -both- processes.
What about setsid and pushing it in a new session instead of using & in the 
script?

-- Reuti


At least when I test it, even non-mpirun processes will abort.

it's not unreasonable to expect to be able to do the same with mpirun.
I'm afraid it won't work, per my earlier comments.

I need mpirun to either ignore the SIGINT or not receive it at all -- and as 
per your comments, ignoring it is not an option.

Let me rephrase my question then. With the following script:

mpirun command >& out &
tail -f out

SIGINT stops tail AND mpirun. That's OK. The following:

(
trap : SIGINT
mpirun command >& out &
)
tail -f out

has the same effect, indicating that mpirun overrides previous traps in the same 
subshell. That's OK too. However the following:

(
trap : SIGINT
(
mpirun command >& out &
)
)
tail -f out

also has the same effect. How is mpirun overriding the trap in the *parent* 
subshell so that it ends up getting the SIGINT that was supposedly blocked at 
that level? Am I missing something trivial? How can I avoid this?
I keep telling you - you can't. The better way to do this is to execute mpirun, 
and then run tail in a -separate- command. Now you can ctrl-c tail without 
mpirun seeing it.

But you are welcome to not believe me and continue thrashing... :-/

Thanks,
Pablo


On 23/04/11 16:27, Ralph Castain wrote:
On Apr 23, 2011, at 9:11 AM, Pablo Lopez Rios wrote:

Pressing Ctrl+C should stop tail -f, and the MPI job
should continue.
I don't think that is true at all. When you hit ctrl-C,
every process executing in the script receives it. Mpirun
traps the ctrl-c and immediately terminates all running
MPI procs.
By "Ctrl+C should stop tail -f" I mean that this is the
desired behaviour of the script, not that this is what ought
to happen in general. My question is how to achieve this
behaviour, since I'm having trouble working around mpirun
catching sigint.
Like I said in my other response, you can't - mpirun automatically traps sigint 
and terminates the job in order to ensure proper cleanup during abnormal 
terminations.

I'm not sure what you are actually trying to accomplish, but there are other 
signals that don't cause termination. For example, we trap and forward SIGUSR1 
and SIGUSR2 to your application procs, if that is of use.

But ctrl-c has a special meaning ("die"), and you can't tell mpirun to ignore 
it.


Thanks,
Pablo



On 23/04/11 15:12, Ralph Castain wrote:
On Apr 23, 2011, at 6:20 AM, Reuti wrote:

Hi,

Am 23.04.2011 um 04:31 schrieb Pablo Lopez Rios:

I'm having a bit of a problem with wrapping mpirun in a script. The script 
needs to run an MPI job in the background and tail -f the output. Pressing 
Ctrl+C should stop tail -f, and the MPI job should continue.
I don't think that is true at all. When you hit ctrl-C, every process executing 
in the script receives it. Mpirun traps the ctrl-c and immediately terminates 
all running MPI procs.


However mpirun seems to detect the SIGINT that was meant for tail, and kills 
the job immediately. I've tried workarounds involving nohup, disown, trap, 
subshells (including calling the script from within itself), etc, to no avail.

The problem is that this doesn't happen if I run the command directly instead, 
without mpirun. Attached is a script that reproduces the problem. It runs a 
simple counting script in the background which takes 10 seconds to run, and 
tails the output. If called with "nompi" as first argument, it will simply run 
bash -c "$SCRIPT" >& "$out" &, and with "mpi" it will do the same with 
'mpirun -np 1' prepended. The output I get is:
what about:

( trap "" sigint; exec mpiexec ...)&

i.e. replace the subshell with changed interrupt handling with the mpiexec. Well, 
maybe mpiexec is adjusting it on its own again. This can be checked in 
/proc/<pid>/status

-- Reuti
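
The /proc check suggested above can be sketched as follows. This is Linux-specific and assumes the usual layout of /proc/<pid>/status, where SigIgn and SigCgt are hexadecimal bitmasks in which bit (n-1) stands for signal n. The helper names sig_in_mask and sigint_disposition are made up for this illustration.

```shell
#!/usr/bin/env bash
# Sketch (Linux only): report how a process treats SIGINT by decoding the
# SigIgn and SigCgt hex masks in /proc/<pid>/status.
# The helper names below are illustrative, not part of any tool.

# Succeed if signal number $2 is set in hex mask $1.
# Bit (n-1) of the mask corresponds to signal n.
sig_in_mask() {
    local mask=$1 signum=$2
    (( (16#$mask >> (signum - 1)) & 1 ))
}

# Print "ignored", "caught", or "default" for SIGINT (signal 2) of PID $1.
sigint_disposition() {
    local pid=$1 ign cgt
    ign=$(awk '/^SigIgn:/ {print $2}' "/proc/$pid/status")
    cgt=$(awk '/^SigCgt:/ {print $2}' "/proc/$pid/status")
    if sig_in_mask "$ign" 2; then
        echo ignored
    elif sig_in_mask "$cgt" 2; then
        echo caught
    else
        echo default
    fi
}
```

Calling sigint_disposition with the PID of the running mpiexec would show whether the ignore set by trap "" survived the exec or whether the program installed its own handler afterwards.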

$ ./ompi_bug.sh mpi
mpi:
1
2
3
4
^C
$ ./ompi_bug.sh nompi
nompi:
1
2
3
4
^C
$ cat output.*
mpi:
1
2
3
4
mpirun: killing job...

--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 1222 on node pablomme exited on 
signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
mpirun: clean termination accomplished

nompi:
1
2
3
4
5
6
7
8
9
10
Done


This convinces me that there is something strange with OpenMPI, since I expect 
no difference in signal handling when running a simple command with or without 
mpirun in the middle.

I've tried looking for options to change this behaviour, but I don't seem to 
find any. Is there one, preferably in the form of an environment variable? Or 
is this a bug?

I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also v1.2.8 as 
distributed with OpenSUSE 11.3.

Thanks,
Pablo
<ompi_bug.sh.gz>
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users