Nope - none of them will work with 1.4.2. Sorry - bug not discovered until after release
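In the meantime, one possible stopgap (just a sketch, not something verified against 1.4.2) is to sidestep the mapper options entirely: build a hostfile from $PBS_NODEFILE that lists each allocated node once with an explicit slots=1, then launch with -np set to the node count. ./hybrid_app below is a placeholder for the real binary:

    # inside the Torque job script; $PBS_NODEFILE is the Torque-provided node list
    sort -u $PBS_NODEFILE | awk '{print $1, "slots=1"}' > hosts.uniq
    NNODES=$(wc -l < hosts.uniq)
    # one process per node, without -npernode / -bynode / --loadbalance
    mpirun -hostfile hosts.uniq -np $NNODES ./hybrid_app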
On Aug 23, 2010, at 7:45 PM, Michael E. Thomadakis wrote:

> Hi Jeff,
>
> thanks for the quick reply.
>
> Would using '--cpus-per-proc N' in place of '-npernode N', or just '-bynode',
> do the trick?
>
> It seems that using '--loadbalance' also crashes mpirun.
>
> best ...
>
> Michael
>
>
> On 08/23/10 19:30, Jeff Squyres wrote:
>>
>> Yes, the -npernode segv is a known issue.
>>
>> We have it fixed in the 1.4.x nightly tarballs; can you give it a whirl and
>> see if that fixes your problem?
>>
>> http://www.open-mpi.org/nightly/v1.4/
>>
>>
>> On Aug 23, 2010, at 8:20 PM, Michael E. Thomadakis wrote:
>>
>>> Hello OMPI:
>>>
>>> We have installed OMPI v1.4.2 on a Nehalem cluster running CentOS 5.4. OMPI
>>> was built using Intel compilers 11.1.072. I am attaching the configuration
>>> log and the output from ompi_info -a.
>>>
>>> The problem we are encountering is that whenever we use the option
>>> '-npernode N' on the mpirun command line, we get a segmentation fault as
>>> below:
>>>
>>> miket@login002[pts/7]PS $ mpirun -npernode 1 --display-devel-map --tag-output -np 6 -cpus-per-proc 2 -H 'login001,login002,login003' hostname
>>>
>>> Map generated by mapping policy: 0402
>>>     Npernode: 1  Oversubscribe allowed: TRUE  CPU Lists: FALSE
>>>     Num new daemons: 2  New daemon starting vpid 1
>>>     Num nodes: 3
>>>
>>> Data for node: Name: login001  Launch id: -1  Arch: 0  State: 2
>>>     Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>>>     Daemon: [[44812,0],1]  Daemon launched: False
>>>     Num slots: 1  Slots in use: 2
>>>     Num slots allocated: 1  Max slots: 0
>>>     Username on node: NULL
>>>     Num procs: 1  Next node_rank: 1
>>>     Data for proc: [[44812,1],0]
>>>         Pid: 0  Local rank: 0  Node rank: 0
>>>         State: 0  App_context: 0  Slot list: NULL
>>>
>>> Data for node: Name: login002  Launch id: -1  Arch: ffc91200  State: 2
>>>     Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>>>     Daemon: [[44812,0],0]  Daemon launched: True
>>>     Num slots: 1  Slots in use: 2
>>>     Num slots allocated: 1  Max slots: 0
>>>     Username on node: NULL
>>>     Num procs: 1  Next node_rank: 1
>>>     Data for proc: [[44812,1],0]
>>>         Pid: 0  Local rank: 0  Node rank: 0
>>>         State: 0  App_context: 0  Slot list: NULL
>>>
>>> Data for node: Name: login003  Launch id: -1  Arch: 0  State: 2
>>>     Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>>>     Daemon: [[44812,0],2]  Daemon launched: False
>>>     Num slots: 1  Slots in use: 2
>>>     Num slots allocated: 1  Max slots: 0
>>>     Username on node: NULL
>>>     Num procs: 1  Next node_rank: 1
>>>     Data for proc: [[44812,1],0]
>>>         Pid: 0  Local rank: 0  Node rank: 0
>>>         State: 0  App_context: 0  Slot list: NULL
>>>
>>> [login002:02079] *** Process received signal ***
>>> [login002:02079] Signal: Segmentation fault (11)
>>> [login002:02079] Signal code: Address not mapped (1)
>>> [login002:02079] Failing at address: 0x50
>>> [login002:02079] [ 0] /lib64/libpthread.so.0 [0x3569a0e7c0]
>>> [login002:02079] [ 1] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_util_encode_pidmap+0xa7) [0x2afa70d25de7]
>>> [login002:02079] [ 2] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x3b8) [0x2afa70d36088]
>>> [login002:02079] [ 3] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0xd7) [0x2afa70d37fc7]
>>> [login002:02079] [ 4] /g/software/openmpi-1.4.2/intel/lib/openmpi/mca_plm_rsh.so [0x2afa721085a1]
>>> [login002:02079] [ 5] mpirun [0x404c27]
>>> [login002:02079] [ 6] mpirun [0x403e38]
>>> [login002:02079] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3568e1d994]
>>> [login002:02079] [ 8] mpirun [0x403d69]
>>> [login002:02079] *** End of error message ***
>>> Segmentation fault
>>>
>>> We tried version 1.4.1 and this problem did not emerge.
>>>
>>> This option is necessary when our users launch hybrid MPI-OpenMP code: they
>>> request M nodes and n ppn in a PBS/Torque setup so that they get exactly the
>>> right number of MPI tasks. Unfortunately, as soon as we use the
>>> '-npernode N' option, mpirun crashes.
>>>
>>> Is this a known issue? I found a related report (from around May 2010) where
>>> people were using the same option, but in a SLURM environment.
>>>
>>> regards
>>>
>>> Michael
>>>
>>> <config.log.gz><ompi_info-a.out.gz>
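For reference, the hybrid MPI-OpenMP pattern the report above describes usually looks something like the fragment below (a sketch with made-up resource numbers; ./hybrid_app is a placeholder): request M nodes with n cores each from Torque, start one MPI task per node with -npernode 1, and let OpenMP threads fill each node's cores. This is exactly the kind of launch that segfaults under 1.4.2.

    #PBS -l nodes=4:ppn=8
    #PBS -l walltime=01:00:00

    cd $PBS_O_WORKDIR
    export OMP_NUM_THREADS=8                # one OpenMP thread per core on each node
    mpirun -npernode 1 -np 4 ./hybrid_app   # one MPI task per node, 4 tasks total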