On 08/24/10 14:22, Michael E. Thomadakis wrote:
Hi,
I used a 'tee' command to capture the output, but I forgot to also redirect stderr to the file.
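A way to capture both streams next time, assuming a Bourne-style shell (the log file name here is just an example), would be:

   make 2>&1 | tee make.out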
This is what a fresh make gave (gcc 4.1.2 again):
------------------------------------------------------------------
ompi_debuggers.c:81: error: missing terminating " character
ompi_debuggers.c:81: error: expected expression before ‘;’ token
ompi_debuggers.c: In function ‘ompi_wait_for_debugger’:
ompi_debuggers.c:212: error: ‘mpidbg_dll_locations’ undeclared (first use in this function)
ompi_debuggers.c:212: error: (Each undeclared identifier is reported only once
ompi_debuggers.c:212: error: for each function it appears in.)
ompi_debuggers.c:212: warning: passing argument 3 of ‘check’ from incompatible pointer type
make[2]: *** [libdebuggers_la-ompi_debuggers.lo] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1
------------------------------------------------------------------
Is this critical for running OMPI code?
Thanks for the quick reply, Ralph,
Michael
On Tue, 24 Aug 2010, Ralph Castain wrote:
| Date: Tue, 24 Aug 2010 13:16:10 -0600
| From: Ralph Castain <r...@open-mpi.org>
| To: Michael E. Thomadakis <miket7...@gmail.com>
| Cc: Open MPI Users <us...@open-mpi.org>, mi...@sc.tamu.edu
| Subject: Re: [OMPI users] Open-MPI 1.4.2 : mpirun core-dumps when
| "-npernode N" is used at command line
|
| Ummm....the configure log terminates normally, indicating it configured fine. The make log ends, but with no error shown - everything was building just fine.
|
| Did you maybe stop it before it was complete? Run out of disk quota? Or...?
|
|
| On Aug 24, 2010, at 1:06 PM, Michael E. Thomadakis wrote:
|
|> Hi Ralph,
|>
|> I tried to build 1.4.3.a1r23542 (08/02/2010) with
|>
|> ./configure --prefix="/g/software/openmpi-1.4.3a1r23542/gcc-4.1.2 2" --enable-cxx-exceptions CFLAGS="-O2" CXXFLAGS="-O2" FFLAGS="-O2" FCFLAGS="-O2"
|> using GCC 4.1.2:
|>
|> miket@login002[pts/26]openmpi-1.4.3a1r23542 $ gcc -v
|> Using built-in specs.
|> Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux
|> Thread model: posix
|> gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
|>
|>
|> but it failed. I am attaching the configure and make logs.
|>
|> regards
|>
|> Michael
|>
|>
|> On 08/23/10 20:53, Ralph Castain wrote:
|>>
|>> Nope - none of them will work with 1.4.2. Sorry - bug not discovered until after release.
|>>
|>> On Aug 23, 2010, at 7:45 PM, Michael E. Thomadakis wrote:
|>>
|>>> Hi Jeff,
|>>> thanks for the quick reply.
|>>>
|>>> Would using '--cpus-per-proc N' in place of '-npernode N', or just '-bynode', do the trick?
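|>>>
|>>> For illustration only, I mean invocations along the lines of the following (the executable name is a placeholder, and I have not tested these here):
|>>>
|>>>    mpirun -bynode -np 6 ./a.out
|>>>    mpirun --cpus-per-proc 2 -np 6 ./a.out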
|>>>
|>>> It seems that using '--loadbalance' also crashes mpirun.
|>>>
|>>> best ...
|>>>
|>>> Michael
|>>>
|>>>
|>>> On 08/23/10 19:30, Jeff Squyres wrote:
|>>>>
|>>>> Yes, the -npernode segv is a known issue.
|>>>>
|>>>> We have it fixed in the 1.4.x nightly tarballs; can you give it a whirl and see if that fixes your problem?
|>>>>
|>>>> http://www.open-mpi.org/nightly/v1.4/
|>>>>
|>>>>
|>>>>
|>>>> On Aug 23, 2010, at 8:20 PM, Michael E. Thomadakis wrote:
|>>>>
|>>>>> Hello OMPI:
|>>>>>
|>>>>> We have installed OMPI V1.4.2 on a Nehalem cluster running CentOS 5.4. OMPI was built using Intel compilers 11.1.072. I am attaching the configuration log and the output from ompi_info -a.
|>>>>>
|>>>>> The problem we are encountering is that whenever we use the option '-npernode N' on the mpirun command line, we get a segmentation fault, as shown below:
|>>>>>
|>>>>>
|>>>>> miket@login002[pts/7]PS $ mpirun -npernode 1 --display-devel-map --tag-output -np 6 -cpus-per-proc 2 -H 'login001,login002,login003' hostname
|>>>>>
|>>>>> Map generated by mapping policy: 0402
|>>>>> Npernode: 1 Oversubscribe allowed: TRUE CPU Lists: FALSE
|>>>>> Num new daemons: 2 New daemon starting vpid 1
|>>>>> Num nodes: 3
|>>>>>
|>>>>> Data for node: Name: login001 Launch id: -1 Arch: 0 State: 2
|>>>>> Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
|>>>>> Daemon: [[44812,0],1] Daemon launched: False
|>>>>> Num slots: 1 Slots in use: 2
|>>>>> Num slots allocated: 1 Max slots: 0
|>>>>> Username on node: NULL
|>>>>> Num procs: 1 Next node_rank: 1
|>>>>> Data for proc: [[44812,1],0]
|>>>>> Pid: 0 Local rank: 0 Node rank: 0
|>>>>> State: 0 App_context: 0 Slot list: NULL
|>>>>>
|>>>>> Data for node: Name: login002 Launch id: -1 Arch: ffc91200 State: 2
|>>>>> Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
|>>>>> Daemon: [[44812,0],0] Daemon launched: True
|>>>>> Num slots: 1 Slots in use: 2
|>>>>> Num slots allocated: 1 Max slots: 0
|>>>>> Username on node: NULL
|>>>>> Num procs: 1 Next node_rank: 1
|>>>>> Data for proc: [[44812,1],0]
|>>>>> Pid: 0 Local rank: 0 Node rank: 0
|>>>>> State: 0 App_context: 0 Slot list: NULL
|>>>>>
|>>>>> Data for node: Name: login003 Launch id: -1 Arch: 0 State: 2
|>>>>> Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
|>>>>> Daemon: [[44812,0],2] Daemon launched: False
|>>>>> Num slots: 1 Slots in use: 2
|>>>>> Num slots allocated: 1 Max slots: 0
|>>>>> Username on node: NULL
|>>>>> Num procs: 1 Next node_rank: 1
|>>>>> Data for proc: [[44812,1],0]
|>>>>> Pid: 0 Local rank: 0 Node rank: 0
|>>>>> State: 0 App_context: 0 Slot list: NULL
|>>>>> [login002:02079] *** Process received signal ***
|>>>>> [login002:02079] Signal: Segmentation fault (11)
|>>>>> [login002:02079] Signal code: Address not mapped (1)
|>>>>> [login002:02079] Failing at address: 0x50
|>>>>> [login002:02079] [ 0] /lib64/libpthread.so.0 [0x3569a0e7c0]
|>>>>> [login002:02079] [ 1]
/g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_util_encode_pidmap+0xa7)
[0x2afa70d25de7]
|>>>>> [login002:02079] [ 2]
/g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x3b8)
[0x2afa70d36088]
|>>>>> [login002:02079] [ 3]
/g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0xd7)
[0x2afa70d37fc7]
|>>>>> [login002:02079] [ 4]
/g/software/openmpi-1.4.2/intel/lib/openmpi/mca_plm_rsh.so [0x2afa721085a1]
|>>>>> [login002:02079] [ 5] mpirun [0x404c27]
|>>>>> [login002:02079] [ 6] mpirun [0x403e38]
|>>>>> [login002:02079] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x3568e1d994]
|>>>>> [login002:02079] [ 8] mpirun [0x403d69]
|>>>>> [login002:02079] *** End of error message ***
|>>>>> Segmentation fault
|>>>>>
|>>>>> We tried version 1.4.1 and this problem did not emerge.
|>>>>>
|>>>>> This option is necessary when our users launch hybrid MPI-OpenMP code: they request M nodes and n ppn in a PBS/Torque setup, so that they get only the right number of MPI tasks. Unfortunately, as soon as we use the '-npernode N' option, mpirun crashes.
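|>>>>>
|>>>>> As a minimal sketch of the kind of job we have in mind (assuming bash under PBS/Torque and that the application reads OMP_NUM_THREADS; the node/ppn counts and program name are just placeholders):
|>>>>>
|>>>>>    #PBS -l nodes=4:ppn=8
|>>>>>    export OMP_NUM_THREADS=8
|>>>>>    # one MPI task per node, each running 8 OpenMP threads
|>>>>>    mpirun -npernode 1 ./hybrid_app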
|>>>>>
|>>>>> Is this a known issue? I found a related problem (from around May 2010) where people were using the same option, but in a SLURM environment.
|>>>>>
|>>>>> regards
|>>>>>
|>>>>> Michael
|>>>>>
|>>>>>
|>>>>> <config.log.gz><ompi_info-a.out.gz>
|>>>
|>>
|>>
|>
|> <config_1.4.3.log.gz><make_1.4.3.out.gz>
|
|