Hmm... the configure log terminates normally, which indicates that configure succeeded.
The make log simply ends, with no error shown; everything was building fine up to that point.

Did you perhaps stop it before it completed? Did you run out of disk quota? Or something else?
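
One way to check the possibilities above is to scan the captured make output for the usual failure markers. The following is only a minimal sketch: the helper name `scan_make_log` is hypothetical, and it assumes GNU make's `***` error prefix and the standard "No space left on device" / "Disk quota exceeded" system messages.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): scan a captured make log
# for signs that the build actually failed.
# Assumes GNU make, which prefixes fatal errors with "*** ", and the
# standard ENOSPC / EDQUOT message strings.
scan_make_log() {
    log="$1"
    if grep -Eq '\*\*\* |No space left on device|Disk quota exceeded' "$log"; then
        echo "errors found"
        return 1
    fi
    echo "no errors found"
    return 0
}
```

For example, `scan_make_log make.out` (where `make.out` stands for whatever file the build output was saved to). If it reports no errors and the log still stops short, the build was most likely interrupted; `df -h .` in the build directory shows the remaining disk space.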


On Aug 24, 2010, at 1:06 PM, Michael E. Thomadakis wrote:

> Hi Ralph, 
> 
> I tried to build 1.4.3.a1r23542 (08/02/2010) with
> 
> ./configure --prefix="/g/software/openmpi-1.4.3a1r23542/gcc-4.1.2 2" 
> --enable-cxx-exceptions  CFLAGS="-O2" CXXFLAGS="-O2"  FFLAGS="-O2" 
> FCFLAGS="-O2"
> with the GCC 4.1.2
> 
> miket@login002[pts/26]openmpi-1.4.3a1r23542 $ gcc -v
> Using built-in specs.
> Target: x86_64-redhat-linux
> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
> --infodir=/usr/share/info --enable-shared --enable-threads=posix 
> --enable-checking=release --with-system-zlib --enable-__cxa_atexit 
> --disable-libunwind-exceptions --enable-libgcj-multifile 
> --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk 
> --disable-dssi --enable-plugin 
> --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic 
> --host=x86_64-redhat-linux
> Thread model: posix
> gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
> 
> 
> but it failed. I am attaching the configure and make logs.
> 
> regards
> 
> Michael
> 
> 
> On 08/23/10 20:53, Ralph Castain wrote:
>> 
>> Nope - none of them will work with 1.4.2. Sorry - the bug was not discovered 
>> until after the release.
>> 
>> On Aug 23, 2010, at 7:45 PM, Michael E. Thomadakis wrote:
>> 
>>> Hi Jeff, 
>>> thanks for the quick reply. 
>>> 
>>> Would using '--cpus-per-proc N' in place of '-npernode N' or just '-bynode' 
>>> do the trick?
>>> 
>>> It seems that using '--loadbalance' also crashes mpirun.
>>> 
>>> best ...
>>> 
>>> Michael
>>> 
>>> 
>>> On 08/23/10 19:30, Jeff Squyres wrote:
>>>> 
>>>> Yes, the -npernode segv is a known issue.
>>>> 
>>>> We have it fixed in the 1.4.x nightly tarballs; can you give it a whirl 
>>>> and see if that fixes your problem?
>>>> 
>>>>     http://www.open-mpi.org/nightly/v1.4/
>>>> 
>>>> 
>>>> 
>>>> On Aug 23, 2010, at 8:20 PM, Michael E. Thomadakis wrote:
>>>> 
>>>>> Hello OMPI:
>>>>> 
>>>>> We have installed OMPI V1.4.2 on a Nehalem cluster running CentOS5.4. 
>>>>> OMPI was built using Intel compilers 11.1.072. I am attaching the 
>>>>> configuration log and output from ompi_info -a.
>>>>> 
>>>>> The problem we are encountering is that whenever we use the option 
>>>>> '-npernode N' on the mpirun command line, we get a segmentation fault, 
>>>>> as shown below:
>>>>> 
>>>>> 
>>>>> miket@login002[pts/7]PS $ mpirun -npernode 1  --display-devel-map  
>>>>> --tag-output -np 6 -cpus-per-proc 2 -H 'login001,login002,login003' 
>>>>> hostname
>>>>> 
>>>>>  Map generated by mapping policy: 0402
>>>>>         Npernode: 1     Oversubscribe allowed: TRUE     CPU Lists: FALSE
>>>>>         Num new daemons: 2      New daemon starting vpid 1
>>>>>         Num nodes: 3
>>>>> 
>>>>>  Data for node: Name: login001          Launch id: -1   Arch: 0 State: 2
>>>>>         Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
>>>>>         Daemon: [[44812,0],1]   Daemon launched: False
>>>>>         Num slots: 1    Slots in use: 2
>>>>>         Num slots allocated: 1  Max slots: 0
>>>>>         Username on node: NULL
>>>>>         Num procs: 1    Next node_rank: 1
>>>>>         Data for proc: [[44812,1],0]
>>>>>                 Pid: 0  Local rank: 0   Node rank: 0
>>>>>                 State: 0        App_context: 0  Slot list: NULL
>>>>> 
>>>>>  Data for node: Name: login002          Launch id: -1   Arch: ffc91200 State: 2
>>>>>         Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
>>>>>         Daemon: [[44812,0],0]   Daemon launched: True
>>>>>         Num slots: 1    Slots in use: 2
>>>>>         Num slots allocated: 1  Max slots: 0
>>>>>         Username on node: NULL
>>>>>         Num procs: 1    Next node_rank: 1
>>>>>         Data for proc: [[44812,1],0]
>>>>>                 Pid: 0  Local rank: 0   Node rank: 0
>>>>>                 State: 0        App_context: 0  Slot list: NULL
>>>>> 
>>>>>  Data for node: Name: login003          Launch id: -1   Arch: 0 State: 2
>>>>>         Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
>>>>>         Daemon: [[44812,0],2]   Daemon launched: False
>>>>>         Num slots: 1    Slots in use: 2
>>>>>         Num slots allocated: 1  Max slots: 0
>>>>>         Username on node: NULL
>>>>>         Num procs: 1    Next node_rank: 1
>>>>>         Data for proc: [[44812,1],0]
>>>>>                 Pid: 0  Local rank: 0   Node rank: 0
>>>>>                 State: 0        App_context: 0  Slot list: NULL
>>>>> [login002:02079] *** Process received signal ***
>>>>> [login002:02079] Signal: Segmentation fault (11)
>>>>> [login002:02079] Signal code: Address not mapped (1)
>>>>> [login002:02079] Failing at address: 0x50
>>>>> [login002:02079] [ 0] /lib64/libpthread.so.0 [0x3569a0e7c0]
>>>>> [login002:02079] [ 1] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_util_encode_pidmap+0xa7) [0x2afa70d25de7]
>>>>> [login002:02079] [ 2] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x3b8) [0x2afa70d36088]
>>>>> [login002:02079] [ 3] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0xd7) [0x2afa70d37fc7]
>>>>> [login002:02079] [ 4] /g/software/openmpi-1.4.2/intel/lib/openmpi/mca_plm_rsh.so [0x2afa721085a1]
>>>>> [login002:02079] [ 5] mpirun [0x404c27]
>>>>> [login002:02079] [ 6] mpirun [0x403e38]
>>>>> [login002:02079] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3568e1d994]
>>>>> [login002:02079] [ 8] mpirun [0x403d69]
>>>>> [login002:02079] *** End of error message ***
>>>>> Segmentation fault
>>>>> 
>>>>> We tried version 1.4.1 and this problem did not emerge. 
>>>>> 
>>>>> This option is necessary when our users launch hybrid MPI-OMP code, 
>>>>> where they can request M nodes and n ppn in a PBS/Torque setup, so that 
>>>>> they get only the right number of MPI tasks. Unfortunately, as soon as 
>>>>> we use the '-npernode N' option, mpirun crashes. 
>>>>> 
>>>>> Is this a known issue? I found a related problem (from around May 2010) 
>>>>> where people were using the same option, but in a SLURM environment. 
>>>>> 
>>>>> regards
>>>>> 
>>>>> Michael
>>>>> 
>>>>> <config.log.gz><ompi_info-a.out.gz>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>> 
>> 
> 
> <config_1.4.3.log.gz><make_1.4.3.out.gz>
