Trolling through some really old messages that never got replies... :-(

The behavior that you are seeing is the result of a really long discussion
among the OMPI developers back when we were writing the TCP device.  The
problem is that there is ambiguity when connecting peers across TCP in
Open MPI.  Specifically, since OMPI can span multiple TCP networks, each MPI
process may be able to use multiple IP addresses to reach each other MPI
process (and vice versa).  So we have to try to figure out which IP
addresses can speak to which others.

For example, say that you have a cluster with 16 nodes on a private ethernet
network.  One of these nodes doubles as the head node for the cluster and
therefore has two ethernet NICs -- one on the external network and one on
the internal cluster network.  But since 16 is a nice number, you want to
use the head node for computation as well.  So when you mpirun across all 16
nodes, OMPI has to figure out to *not* use the external NIC on the head node
and only use the internal NIC.

TCP connections are only made on demand, which is why you only see this
behavior when two processes actually attempt to communicate via MPI (i.e.,
"hello world" with no sending/receiving works fine, but adding the
MPI_SEND/MPI_RECV makes it fail).

We make connections by having all MPI processes exchange their IP
address(es) and port number(s) during MPI_INIT (via a common rendezvous
point, typically mpirun).  Then, whenever a connection is requested between
two processes, we apply a small set of rules to all pair combinations of IP
addresses of those processes (there's a rough sketch of the logic after this
list):

1. If the two IP addresses match after the subnet mask is applied, assume
that they are mutually routable and allow the connection.
2. If the two IP addresses are public, assume that they are mutually
routable and allow the connection.
3. Otherwise, disallow the connection (this is not an error -- we just
disallow this connection in the hope that some other device can be used to
make the connection).
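
In rough C (illustrative only -- this is not the actual BTL source, and the
names are made up), the per-pair test amounts to something like this:

    #include <stdbool.h>
    #include <stdint.h>

    /* True if an IPv4 address (host byte order) is in an RFC 1918 private
       range (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). */
    static bool is_private(uint32_t addr)
    {
        return ((addr & 0xff000000u) == 0x0a000000u) ||
               ((addr & 0xfff00000u) == 0xac100000u) ||
               ((addr & 0xffff0000u) == 0xc0a80000u);
    }

    /* Rules 1-3 above, applied to one pair of peer addresses. */
    static bool allow_connection(uint32_t local, uint32_t local_mask,
                                 uint32_t peer,  uint32_t peer_mask)
    {
        /* Rule 1: same subnet once the netmasks are applied. */
        if ((local & local_mask) == (peer & peer_mask)) {
            return true;
        }
        /* Rule 2: both addresses are public. */
        if (!is_private(local) && !is_private(peer)) {
            return true;
        }
        /* Rule 3: disallow -- maybe some other device can make the
           connection instead. */
        return false;
    }

The real logic in the TCP component is more involved, of course, but that is
the gist of the three rules.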

What is happening in your case is that you're falling through to #3 for all
IP address pair combinations: with your 255.255.255.0 netmask, the
192.168.2.x and 192.168.3.x addresses don't match after masking (rule 1
fails), and both are private (rule 2 fails).  Since there is no other device
that can reach these processes, OMPI thinks that it has no channel to the
remote process.  So it bails (in a horribly non-descriptive way :-( ).

We actually have a very long comment about this in the TCP code and mention
that your scenario (lots of hosts in a single cluster with private addresses
and relatively narrow subnet masks, even though all addresses are, in fact,
routable to each other) is not currently supported -- and that we need to do
something "better".  "Better" in this case probably means having a
configuration file that specifies which hosts are mutually routable when the
above rules don't work (a rough sketch of what I mean is below).  Do you
have any suggestions on this front?
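
Purely as an illustration (this is invented syntax -- nothing that OMPI
parses today), such a file might just list groups of networks whose hosts
should be treated as mutually routable:

    # hypothetical mutual-routability file; syntax invented for illustration
    # hosts on any of the listed networks are assumed to reach each other
    routable: 192.168.2.0/24 192.168.3.0/24

In your setup, a single line like that would let the .2 and .3 networks talk
to each other without changing the default behavior for anyone else.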



On 7/5/06 1:15 PM, "Frank Kahle" <openmpi-u...@fraka-mp.de> wrote:

> users-requ...@open-mpi.org wrote:
>> A few clarifying questions:
>> 
>> What is your netmask on these hosts?
>> 
>> Where is the MPI_ALLREDUCE in your app -- right away, or somewhere deep
>> within the application?  Can you replicate this with a simple MPI
>> application that essentially calls MPI_INIT, MPI_ALLREDUCE, and
>> MPI_FINALIZE?
>> 
>> Can you replicate this with a simple MPI app that does an MPI_SEND /
>> MPI_RECV between two processes on the different subnets?
>> 
>> Thanks.
>> 
>>   
> 
> @ Jeff,
> 
> netmask 255.255.255.0
> 
> Running a simple "hello world" yields no error on each subnet, but
> running "hello world" on both subnets yields the error
> 
> [g5dual.3-net:00436] *** An error occurred in MPI_Send
> [g5dual.3-net:00436] *** on communicator MPI_COMM_WORLD
> [g5dual.3-net:00436] *** MPI_ERR_INTERN: internal error
> [g5dual.3-net:00436] *** MPI_ERRORS_ARE_FATAL (goodbye)
> 
> Hope this helps!
> 
> Frank
> 
> 
> Just in case you wanna check the source:
> c    Fortran example hello_world
>       program hello
>       include 'mpif.h'
>       integer rank, size, ierror, tag, status(MPI_STATUS_SIZE)
>       character*12 message
> 
>       call MPI_INIT(ierror)
>       call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
>       call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
>       tag = 100
> 
>       if (rank .eq. 0) then
>         message = 'Hello, world'
>         do i=1, size-1
>           call MPI_SEND(message, 12, MPI_CHARACTER, i, tag,
>      &                  MPI_COMM_WORLD, ierror)
>         enddo
> 
>       else
>         call MPI_RECV(message, 12, MPI_CHARACTER, 0, tag,
>      &                MPI_COMM_WORLD, status, ierror)
>       endif
> 
>       print*, 'node', rank, ':', message
>       call MPI_FINALIZE(ierror)
>       end
> 
> 
> or the full output:
> 
> [powerbook:/Network/CFD/hello] motte% mpirun -d -np 5 --hostfile
> ./hostfile /Network/CFD/hello/hello_world
> [powerbook.2-net:00606] [0,0,0] setting up session dir with
> [powerbook.2-net:00606]         universe default-universe
> [powerbook.2-net:00606]         user motte
> [powerbook.2-net:00606]         host powerbook.2-net
> [powerbook.2-net:00606]         jobid 0
> [powerbook.2-net:00606]         procid 0
> [powerbook.2-net:00606] procdir:
> /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe/0/0
> [powerbook.2-net:00606] jobdir:
> /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe/0
> [powerbook.2-net:00606] unidir:
> /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe
> [powerbook.2-net:00606] top: openmpi-sessions-motte@powerbook.2-net_0
> [powerbook.2-net:00606] tmp: /tmp
> [powerbook.2-net:00606] [0,0,0] contact_file
> /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe/universe-setup.
> txt
> [powerbook.2-net:00606] [0,0,0] wrote setup file
> [powerbook.2-net:00606] pls:rsh: local csh: 1, local bash: 0
> [powerbook.2-net:00606] pls:rsh: assuming same remote shell as local shell
> [powerbook.2-net:00606] pls:rsh: remote csh: 1, remote bash: 0
> [powerbook.2-net:00606] pls:rsh: final template argv:
> [powerbook.2-net:00606] pls:rsh:     /usr/bin/ssh <template> orted
> --debug --bootproxy 1 --name <template> --num_procs 6 --vpid_start 0
> --nodename <template> --universe motte@powerbook.2-net:default-universe
> --nsreplica "0.0.0;tcp://192.168.2.3:49443" --gprreplica
> "0.0.0;tcp://192.168.2.3:49443" --mpi-call-yield 0
> [powerbook.2-net:00606] pls:rsh: launching on node Powerbook.2-net
> [powerbook.2-net:00606] pls:rsh: not oversubscribed -- setting
> mpi_yield_when_idle to 0
> [powerbook.2-net:00606] pls:rsh: Powerbook.2-net is a LOCAL node
> [powerbook.2-net:00606] pls:rsh: changing to directory /Users/motte
> [powerbook.2-net:00606] pls:rsh: executing: orted --debug --bootproxy 1
> --name 0.0.1 --num_procs 6 --vpid_start 0 --nodename Powerbook.2-net
> --universe motte@powerbook.2-net:default-universe --nsreplica
> "0.0.0;tcp://192.168.2.3:49443" --gprreplica
> "0.0.0;tcp://192.168.2.3:49443" --mpi-call-yield 0
> [powerbook.2-net:00607] [0,0,1] setting up session dir with
> [powerbook.2-net:00607]         universe default-universe
> [powerbook.2-net:00607]         user motte
> [powerbook.2-net:00607]         host Powerbook.2-net
> [powerbook.2-net:00607]         jobid 0
> [powerbook.2-net:00607]         procid 1
> [powerbook.2-net:00607] procdir:
> /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe/0/1
> [powerbook.2-net:00607] jobdir:
> /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe/0
> [powerbook.2-net:00607] unidir:
> /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe
> [powerbook.2-net:00607] top: openmpi-sessions-motte@Powerbook.2-net_0
> [powerbook.2-net:00607] tmp: /tmp
> [powerbook.2-net:00606] pls:rsh: launching on node g4d003.3-net
> [powerbook.2-net:00606] pls:rsh: not oversubscribed -- setting
> mpi_yield_when_idle to 0
> [powerbook.2-net:00606] pls:rsh: g4d003.3-net is a REMOTE node
> [powerbook.2-net:00606] pls:rsh: executing: /usr/bin/ssh g4d003.3-net
> orted --debug --bootproxy 1 --name 0.0.2 --num_procs 6 --vpid_start 0
> --nodename g4d003.3-net --universe
> motte@powerbook.2-net:default-universe --nsreplica
> "0.0.0;tcp://192.168.2.3:49443" --gprreplica
> "0.0.0;tcp://192.168.2.3:49443" --mpi-call-yield 0
> [g4d003.3-net:00411] [0,0,2] setting up session dir with
> [g4d003.3-net:00411]    universe default-universe
> [g4d003.3-net:00411]    user motte
> [g4d003.3-net:00411]    host g4d003.3-net
> [g4d003.3-net:00411]    jobid 0
> [g4d003.3-net:00411]    procid 2
> [g4d003.3-net:00411] procdir:
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe/0/2
> [g4d003.3-net:00411] jobdir:
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe/0
> [g4d003.3-net:00411] unidir:
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe
> [g4d003.3-net:00411] top: openmpi-sessions-motte@g4d003.3-net_0
> [g4d003.3-net:00411] tmp: /tmp
> [powerbook.2-net:00606] pls:rsh: launching on node g4d002.3-net
> [powerbook.2-net:00606] pls:rsh: not oversubscribed -- setting
> mpi_yield_when_idle to 0
> [powerbook.2-net:00606] pls:rsh: g4d002.3-net is a REMOTE node
> [powerbook.2-net:00606] pls:rsh: executing: /usr/bin/ssh g4d002.3-net
> orted --debug --bootproxy 1 --name 0.0.3 --num_procs 6 --vpid_start 0
> --nodename g4d002.3-net --universe
> motte@powerbook.2-net:default-universe --nsreplica
> "0.0.0;tcp://192.168.2.3:49443" --gprreplica
> "0.0.0;tcp://192.168.2.3:49443" --mpi-call-yield 0
> [powerbook.2-net:00606] pls:rsh: launching on node g4d001.3-net
> [powerbook.2-net:00606] pls:rsh: not oversubscribed -- setting
> mpi_yield_when_idle to 0
> [powerbook.2-net:00606] pls:rsh: g4d001.3-net is a REMOTE node
> [powerbook.2-net:00606] pls:rsh: executing: /usr/bin/ssh g4d001.3-net
> orted --debug --bootproxy 1 --name 0.0.4 --num_procs 6 --vpid_start 0
> --nodename g4d001.3-net --universe
> motte@powerbook.2-net:default-universe --nsreplica
> "0.0.0;tcp://192.168.2.3:49443" --gprreplica
> "0.0.0;tcp://192.168.2.3:49443" --mpi-call-yield 0
> [powerbook.2-net:00606] pls:rsh: launching on node G5Dual.3-net
> [powerbook.2-net:00606] pls:rsh: not oversubscribed -- setting
> mpi_yield_when_idle to 0
> [powerbook.2-net:00606] pls:rsh: G5Dual.3-net is a REMOTE node
> [powerbook.2-net:00606] pls:rsh: executing: /usr/bin/ssh G5Dual.3-net
> orted --debug --bootproxy 1 --name 0.0.5 --num_procs 6 --vpid_start 0
> --nodename G5Dual.3-net --universe
> motte@powerbook.2-net:default-universe --nsreplica
> "0.0.0;tcp://192.168.2.3:49443" --gprreplica
> "0.0.0;tcp://192.168.2.3:49443" --mpi-call-yield 0
> [g4d001.3-net:00336] [0,0,4] setting up session dir with
> [g4d001.3-net:00336]    universe default-universe
> [g4d001.3-net:00336]    user motte
> [g4d001.3-net:00336]    host g4d001.3-net
> [g4d001.3-net:00336]    jobid 0
> [g4d001.3-net:00336]    procid 4
> [g4d001.3-net:00336] procdir:
> /tmp/openmpi-sessions-motte@g4d001.3-net_0/default-universe/0/4
> [g4d001.3-net:00336] jobdir:
> /tmp/openmpi-sessions-motte@g4d001.3-net_0/default-universe/0
> [g4d001.3-net:00336] unidir:
> /tmp/openmpi-sessions-motte@g4d001.3-net_0/default-universe
> [g4d001.3-net:00336] top: openmpi-sessions-motte@g4d001.3-net_0
> [g4d001.3-net:00336] tmp: /tmp
> [g4d002.3-net:00279] [0,0,3] setting up session dir with
> [g4d002.3-net:00279]    universe default-universe
> [g4d002.3-net:00279]    user motte
> [g4d002.3-net:00279]    host g4d002.3-net
> [g4d002.3-net:00279]    jobid 0
> [g4d002.3-net:00279]    procid 3
> [g4d002.3-net:00279] procdir:
> /tmp/openmpi-sessions-motte@g4d002.3-net_0/default-universe/0/3
> [g4d002.3-net:00279] jobdir:
> /tmp/openmpi-sessions-motte@g4d002.3-net_0/default-universe/0
> [g4d002.3-net:00279] unidir:
> /tmp/openmpi-sessions-motte@g4d002.3-net_0/default-universe
> [g4d002.3-net:00279] top: openmpi-sessions-motte@g4d002.3-net_0
> [g4d002.3-net:00279] tmp: /tmp
> [g5dual.3-net:00434] [0,0,5] setting up session dir with
> [g5dual.3-net:00434]    universe default-universe
> [g5dual.3-net:00434]    user motte
> [g5dual.3-net:00434]    host G5Dual.3-net
> [g5dual.3-net:00434]    jobid 0
> [g5dual.3-net:00434]    procid 5
> [g5dual.3-net:00434] procdir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe/0/5
> [g5dual.3-net:00434] jobdir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe/0
> [g5dual.3-net:00434] unidir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe
> [g5dual.3-net:00434] top: openmpi-sessions-motte@G5Dual.3-net_0
> [g5dual.3-net:00434] tmp: /tmp
> [powerbook.2-net:00613] [0,1,4] setting up session dir with
> [powerbook.2-net:00613]         universe default-universe
> [powerbook.2-net:00613]         user motte
> [powerbook.2-net:00613]         host Powerbook.2-net
> [powerbook.2-net:00613]         jobid 1
> [powerbook.2-net:00613]         procid 4
> [powerbook.2-net:00613] procdir:
> /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe/1/4
> [powerbook.2-net:00613] jobdir:
> /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe/1
> [powerbook.2-net:00613] unidir:
> /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe
> [powerbook.2-net:00613] top: openmpi-sessions-motte@Powerbook.2-net_0
> [powerbook.2-net:00613] tmp: /tmp
> [g5dual.3-net:00436] [0,1,0] setting up session dir with
> [g5dual.3-net:00436]    universe default-universe
> [g5dual.3-net:00436]    user motte
> [g5dual.3-net:00436]    host G5Dual.3-net
> [g5dual.3-net:00436]    jobid 1
> [g5dual.3-net:00436]    procid 0
> [g5dual.3-net:00436] procdir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe/1/0
> [g5dual.3-net:00436] jobdir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe/1
> [g5dual.3-net:00436] unidir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe
> [g5dual.3-net:00436] top: openmpi-sessions-motte@G5Dual.3-net_0
> [g5dual.3-net:00436] tmp: /tmp
> [g4d001.3-net:00338] [0,1,1] setting up session dir with
> [g4d001.3-net:00338]    universe default-universe
> [g4d001.3-net:00338]    user motte
> [g4d001.3-net:00338]    host g4d001.3-net
> [g4d001.3-net:00338]    jobid 1
> [g4d001.3-net:00338]    procid 1
> [g4d001.3-net:00338] procdir:
> /tmp/openmpi-sessions-motte@g4d001.3-net_0/default-universe/1/1
> [g4d001.3-net:00338] jobdir:
> /tmp/openmpi-sessions-motte@g4d001.3-net_0/default-universe/1
> [g4d001.3-net:00338] unidir:
> /tmp/openmpi-sessions-motte@g4d001.3-net_0/default-universe
> [g4d001.3-net:00338] top: openmpi-sessions-motte@g4d001.3-net_0
> [g4d001.3-net:00338] tmp: /tmp
> [g4d003.3-net:00413] [0,1,3] setting up session dir with
> [g4d003.3-net:00413]    universe default-universe
> [g4d003.3-net:00413]    user motte
> [g4d003.3-net:00413]    host g4d003.3-net
> [g4d003.3-net:00413]    jobid 1
> [g4d003.3-net:00413]    procid 3
> [g4d003.3-net:00413] procdir:
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe/1/3
> [g4d003.3-net:00413] jobdir:
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe/1
> [g4d003.3-net:00413] unidir:
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe
> [g4d003.3-net:00413] top: openmpi-sessions-motte@g4d003.3-net_0
> [g4d003.3-net:00413] tmp: /tmp
> [g4d002.3-net:00281] [0,1,2] setting up session dir with
> [g4d002.3-net:00281]    universe default-universe
> [g4d002.3-net:00281]    user motte
> [g4d002.3-net:00281]    host g4d002.3-net
> [g4d002.3-net:00281]    jobid 1
> [g4d002.3-net:00281]    procid 2
> [g4d002.3-net:00281] procdir:
> /tmp/openmpi-sessions-motte@g4d002.3-net_0/default-universe/1/2
> [g4d002.3-net:00281] jobdir:
> /tmp/openmpi-sessions-motte@g4d002.3-net_0/default-universe/1
> [g4d002.3-net:00281] unidir:
> /tmp/openmpi-sessions-motte@g4d002.3-net_0/default-universe
> [g4d002.3-net:00281] top: openmpi-sessions-motte@g4d002.3-net_0
> [g4d002.3-net:00281] tmp: /tmp
> [powerbook.2-net:00606] spawn: in job_state_callback(jobid = 1, state = 0x4)
> [powerbook.2-net:00606] Info: Setting up debugger process table for
> applications
>   MPIR_being_debugged = 0
>   MPIR_debug_gate = 0
>   MPIR_debug_state = 1
>   MPIR_acquired_pre_main = 0
>   MPIR_i_am_starter = 0
>   MPIR_proctable_size = 5
>   MPIR_proctable:
>     (i, host, exe, pid) = (0, G5Dual.3-net,
> /Network/CFD/hello/hello_world, 436)
>     (i, host, exe, pid) = (1, g4d001.3-net,
> /Network/CFD/hello/hello_world, 338)
>     (i, host, exe, pid) = (2, g4d002.3-net,
> /Network/CFD/hello/hello_world, 281)
>     (i, host, exe, pid) = (3, g4d003.3-net,
> /Network/CFD/hello/hello_world, 413)
>     (i, host, exe, pid) = (4, Powerbook.2-net,
> /Network/CFD/hello/hello_world, 613)
> [powerbook.2-net:00613] [0,1,4] ompi_mpi_init completed
> [g4d001.3-net:00338] [0,1,1] ompi_mpi_init completed
> [g5dual.3-net:00436] [0,1,0] ompi_mpi_init completed
> [g4d003.3-net:00413] [0,1,3] ompi_mpi_init completed
> [g4d002.3-net:00281] [0,1,2] ompi_mpi_init completed
>  node           1 :Hello, world
>  node           2 :Hello, world node           3 :Hello, world
> [g5dual.3-net:00436] *** An error occurred in MPI_Send
> 
> [g5dual.3-net:00436] *** on communicator MPI_COMM_WORLD
> [g5dual.3-net:00436] *** MPI_ERR_INTERN: internal error
> [g5dual.3-net:00436] *** MPI_ERRORS_ARE_FATAL (goodbye)
> --------------------------------------------------------------------------
> WARNING: A process refused to die!
> 
> Host: powerbook.2-net
> PID:  613
> 
> This process may still be running and/or consuming resources.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> WARNING: A process refused to die!
> 
> Host: g4d003.3-net
> PID:  413
> 
> This process may still be running and/or consuming resources.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> WARNING: A process refused to die!
> 
> Host: g5dual.3-net
> PID:  436
> 
> This process may still be running and/or consuming resources.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> WARNING: A process refused to die!
> 
> Host: g4d002.3-net
> PID:  281
> 
> This process may still be running and/or consuming resources.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> WARNING: A process refused to die!
> 
> Host: g4d001.3-net
> PID:  338
> 
> This process may still be running and/or consuming resources.
> --------------------------------------------------------------------------
> [g5dual.3-net:00434] sess_dir_finalize: found proc session dir empty -
> deleting
> [g5dual.3-net:00434] sess_dir_finalize: found job session dir empty -
> deleting
> [g5dual.3-net:00434] sess_dir_finalize: univ session dir not empty - leaving
> [powerbook.2-net:00607] orted: job_state_callback(jobid = 1, state =
> ORTE_PROC_STATE_ABORTED)
> [g5dual.3-net:00434] orted: job_state_callback(jobid = 1, state =
> ORTE_PROC_STATE_ABORTED)
> [g4d003.3-net:00411] orted: job_state_callback(jobid = 1, state =
> ORTE_PROC_STATE_ABORTED)
> [g4d001.3-net:00336] orted: job_state_callback(jobid = 1, state =
> ORTE_PROC_STATE_ABORTED)
> [g5dual.3-net:00434] sess_dir_finalize: job session dir not empty - leaving
> [g5dual.3-net:00434] sess_dir_finalize: found proc session dir empty -
> deleting
> [g5dual.3-net:00434] sess_dir_finalize: found job session dir empty -
> deleting
> [g5dual.3-net:00434] sess_dir_finalize: found univ session dir empty -
> deleting
> [g5dual.3-net:00434] sess_dir_finalize: found top session dir empty -
> deleting
> [g4d002.3-net:00279] orted: job_state_callback(jobid = 1, state =
> ORTE_PROC_STATE_ABORTED)
> [g4d002.3-net:00279] sess_dir_finalize: found job session dir empty -
> deleting
> [g4d002.3-net:00279] sess_dir_finalize: univ session dir not empty - leaving
> [g4d002.3-net:00279] sess_dir_finalize: proc session dir not empty - leaving
> --------------------------------------------------------------------------
> WARNING: A process refused to die!
> 
> Host: g4d002.3-net
> PID:  281
> 
> This process may still be running and/or consuming resources.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> WARNING: A process refused to die!
> 
> Host: g4d002.3-net
> PID:  281
> 
> This process may still be running and/or consuming resources.
> --------------------------------------------------------------------------
> [g4d002.3-net:00279] sess_dir_finalize: found proc session dir empty -
> deleting
> [g4d002.3-net:00279] sess_dir_finalize: found job session dir empty -
> deleting
> [g4d002.3-net:00279] sess_dir_finalize: found univ session dir empty -
> deleting
> [g4d002.3-net:00279] sess_dir_finalize: found top session dir empty -
> deleting
> [powerbook.2-net:00607] sess_dir_finalize: found job session dir empty -
> deleting
> [powerbook.2-net:00607] sess_dir_finalize: univ session dir not empty -
> leaving
> [powerbook.2-net:00607] sess_dir_finalize: proc session dir not empty -
> leaving
> --------------------------------------------------------------------------
> WARNING: A process refused to die!
> 
> Host: powerbook.2-net
> PID:  613
> 
> This process may still be running and/or consuming resources.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> WARNING: A process refused to die!
> 
> Host: powerbook.2-net
> PID:  613
> 
> This process may still be running and/or consuming resources.
> --------------------------------------------------------------------------
> [powerbook.2-net:00607] sess_dir_finalize: found proc session dir empty
> - deleting
> [powerbook.2-net:00607] sess_dir_finalize: job session dir not empty -
> leaving
> [g4d001.3-net:00336] sess_dir_finalize: found job session dir empty -
> deleting
> [g4d001.3-net:00336] sess_dir_finalize: univ session dir not empty - leaving
> [g4d001.3-net:00336] sess_dir_finalize: proc session dir not empty - leaving
> --------------------------------------------------------------------------
> WARNING: A process refused to die!
> 
> Host: g4d001.3-net
> PID:  338
> 
> This process may still be running and/or consuming resources.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> WARNING: A process refused to die!
> 
> Host: g4d001.3-net
> PID:  338
> 
> This process may still be running and/or consuming resources.
> --------------------------------------------------------------------------
> [g4d001.3-net:00336] sess_dir_finalize: found proc session dir empty -
> deleting
> [g4d001.3-net:00336] sess_dir_finalize: found job session dir empty -
> deleting
> [g4d001.3-net:00336] sess_dir_finalize: found univ session dir empty -
> deleting
> [g4d001.3-net:00336] sess_dir_finalize: found top session dir empty -
> deleting
> [g4d003.3-net:00411] sess_dir_finalize: found job session dir empty -
> deleting
> [g4d003.3-net:00411] sess_dir_finalize: univ session dir not empty - leaving
> [g4d003.3-net:00411] sess_dir_finalize: proc session dir not empty - leaving
> --------------------------------------------------------------------------
> WARNING: A process refused to die!
> 
> Host: g4d003.3-net
> PID:  413
> 
> This process may still be running and/or consuming resources.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> WARNING: A process refused to die!
> 
> Host: g4d003.3-net
> PID:  413
> 
> This process may still be running and/or consuming resources.
> --------------------------------------------------------------------------
> 1 process killed (possibly by Open MPI)
> [g4d003.3-net:00411] orted: job_state_callback(jobid = 1, state =
> ORTE_PROC_STATE_TERMINATED)
> [g4d003.3-net:00411] sess_dir_finalize: found proc session dir empty -
> deleting
> [g4d003.3-net:00411] sess_dir_finalize: found job session dir empty -
> deleting
> [g4d003.3-net:00411] sess_dir_finalize: found univ session dir empty -
> deleting
> [g4d003.3-net:00411] sess_dir_finalize: found top session dir empty -
> deleting
> [powerbook:/Network/CFD/hello] motte%
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
