Ummm... the configure log terminates normally, indicating it configured fine. The make log just ends with no error shown - everything was building fine up to that point.
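A quick way to double-check how far the build actually got (a minimal sketch, assuming a standard tarball build; make_1.4.3.out stands in for wherever your make output was captured):

  tail -30 make_1.4.3.out               # a finished build ends with the final link/install lines, not mid-compile
  grep -i 'error' make_1.4.3.out        # a genuine build failure normally leaves a "make: *** ... Error" line
  quota -s                              # was the disk quota exhausted?
  df -h .                               # is the build filesystem full?
  make all 2>&1 | tee make.rerun.out    # re-run in the same tree and capture everything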
Did you maybe stop it before it was complete? Run out of disk quota? Or...?

On Aug 24, 2010, at 1:06 PM, Michael E. Thomadakis wrote:

> Hi Ralph,
>
> I tried to build 1.4.3.a1r23542 (08/02/2010) with
>
> ./configure --prefix="/g/software/openmpi-1.4.3a1r23542/gcc-4.1.2" --enable-cxx-exceptions CFLAGS="-O2" CXXFLAGS="-O2" FFLAGS="-O2" FCFLAGS="-O2"
>
> with GCC 4.1.2:
>
> miket@login002[pts/26]openmpi-1.4.3a1r23542 $ gcc -v
> Using built-in specs.
> Target: x86_64-redhat-linux
> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux
> Thread model: posix
> gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
>
> but it failed. I am attaching the configure and make logs.
>
> regards
>
> Michael
>
>
> On 08/23/10 20:53, Ralph Castain wrote:
>>
>> Nope - none of them will work with 1.4.2. Sorry - bug not discovered until after release.
>>
>> On Aug 23, 2010, at 7:45 PM, Michael E. Thomadakis wrote:
>>
>>> Hi Jeff,
>>> thanks for the quick reply.
>>>
>>> Would using '--cpus-per-proc N' in place of '-npernode N', or just '-bynode', do the trick?
>>>
>>> It seems that using '--loadbalance' also crashes mpirun.
>>>
>>> best ...
>>>
>>> Michael
>>>
>>>
>>> On 08/23/10 19:30, Jeff Squyres wrote:
>>>>
>>>> Yes, the -npernode segv is a known issue.
>>>>
>>>> We have it fixed in the 1.4.x nightly tarballs; can you give it a whirl and see if that fixes your problem?
>>>>
>>>> http://www.open-mpi.org/nightly/v1.4/
>>>>
>>>> On Aug 23, 2010, at 8:20 PM, Michael E. Thomadakis wrote:
>>>>
>>>>> Hello OMPI:
>>>>>
>>>>> We have installed OMPI v1.4.2 on a Nehalem cluster running CentOS 5.4. OMPI was built using the Intel compilers 11.1.072. I am attaching the configuration log and the output from ompi_info -a.
>>>>>
>>>>> The problem we are encountering is that whenever we use the option '-npernode N' on the mpirun command line we get a segmentation fault, as shown below:
>>>>>
>>>>> miket@login002[pts/7]PS $ mpirun -npernode 1 --display-devel-map --tag-output -np 6 -cpus-per-proc 2 -H 'login001,login002,login003' hostname
>>>>>
>>>>>  Map generated by mapping policy: 0402
>>>>>  Npernode: 1  Oversubscribe allowed: TRUE  CPU Lists: FALSE
>>>>>  Num new daemons: 2  New daemon starting vpid 1
>>>>>  Num nodes: 3
>>>>>
>>>>>  Data for node: Name: login001  Launch id: -1  Arch: 0  State: 2
>>>>>    Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>>>>>    Daemon: [[44812,0],1]  Daemon launched: False
>>>>>    Num slots: 1  Slots in use: 2
>>>>>    Num slots allocated: 1  Max slots: 0
>>>>>    Username on node: NULL
>>>>>    Num procs: 1  Next node_rank: 1
>>>>>    Data for proc: [[44812,1],0]
>>>>>      Pid: 0  Local rank: 0  Node rank: 0
>>>>>      State: 0  App_context: 0  Slot list: NULL
>>>>>
>>>>>  Data for node: Name: login002  Launch id: -1  Arch: ffc91200  State: 2
>>>>>    Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>>>>>    Daemon: [[44812,0],0]  Daemon launched: True
>>>>>    Num slots: 1  Slots in use: 2
>>>>>    Num slots allocated: 1  Max slots: 0
>>>>>    Username on node: NULL
>>>>>    Num procs: 1  Next node_rank: 1
>>>>>    Data for proc: [[44812,1],0]
>>>>>      Pid: 0  Local rank: 0  Node rank: 0
>>>>>      State: 0  App_context: 0  Slot list: NULL
>>>>>
>>>>>  Data for node: Name: login003  Launch id: -1  Arch: 0  State: 2
>>>>>    Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>>>>>    Daemon: [[44812,0],2]  Daemon launched: False
>>>>>    Num slots: 1  Slots in use: 2
>>>>>    Num slots allocated: 1  Max slots: 0
>>>>>    Username on node: NULL
>>>>>    Num procs: 1  Next node_rank: 1
>>>>>    Data for proc: [[44812,1],0]
>>>>>      Pid: 0  Local rank: 0  Node rank: 0
>>>>>      State: 0  App_context: 0  Slot list: NULL
>>>>>
>>>>> [login002:02079] *** Process received signal ***
>>>>> [login002:02079] Signal: Segmentation fault (11)
>>>>> [login002:02079] Signal code: Address not mapped (1)
>>>>> [login002:02079] Failing at address: 0x50
>>>>> [login002:02079] [ 0] /lib64/libpthread.so.0 [0x3569a0e7c0]
>>>>> [login002:02079] [ 1] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_util_encode_pidmap+0xa7) [0x2afa70d25de7]
>>>>> [login002:02079] [ 2] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x3b8) [0x2afa70d36088]
>>>>> [login002:02079] [ 3] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0xd7) [0x2afa70d37fc7]
>>>>> [login002:02079] [ 4] /g/software/openmpi-1.4.2/intel/lib/openmpi/mca_plm_rsh.so [0x2afa721085a1]
>>>>> [login002:02079] [ 5] mpirun [0x404c27]
>>>>> [login002:02079] [ 6] mpirun [0x403e38]
>>>>> [login002:02079] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3568e1d994]
>>>>> [login002:02079] [ 8] mpirun [0x403d69]
>>>>> [login002:02079] *** End of error message ***
>>>>> Segmentation fault
>>>>>
>>>>> We tried version 1.4.1 and this problem did not emerge.
>>>>>
>>>>> This option is necessary when our users launch hybrid MPI-OpenMP code: in our PBS/Torque setup they request M nodes and n ppn, and '-npernode' is what lets them start the right number of MPI tasks per node. Unfortunately, as soon as we use the '-npernode N' option mpirun crashes.
>>>>>
>>>>> Is this a known issue? I found a related problem (from around May 2010) where people were using the same option, but in a SLURM environment.
>>>>>
>>>>> regards
>>>>>
>>>>> Michael
>>>>>
>>>>> <config.log.gz><ompi_info-a.out.gz>
>
> <config_1.4.3.log.gz><make_1.4.3.out.gz>
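For reference, a minimal sketch of the hybrid launch pattern being described, assuming a PBS/Torque job script, an Open MPI build with Torque (tm) support, and a 1.4.x nightly that contains the -npernode fix; the node/ppn counts and the ./hybrid_app binary are placeholders:

  #PBS -l nodes=4:ppn=8                   # placeholder allocation: 4 nodes, 8 cores each
  export OMP_NUM_THREADS=8                # one MPI task per node, 8 OpenMP threads per task
  mpirun -npernode 1 -x OMP_NUM_THREADS ./hybrid_app

With tm support mpirun takes the host list from the Torque allocation, so only the per-node task count has to be stated on the command line.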