I'm a little confused by your configure line:

    ./configure --prefix=/g/software/openmpi-1.4.3a1r23542/gcc-4.1.2 2 --enable-cxx-exceptions CFLAGS=-O2 CXXFLAGS=-O2 FFLAGS=-O2 FCFLAGS=-O2
What's the lone "2" in the middle (after the prefix)? With that extra "2", I'm not able to get configure to complete successfully (because it interprets that "2" as a platform name that does not exist). If I remove that "2", configure completes properly and the build completes properly.

I'm afraid I no longer have any RH hosts to test on. Can you do the following:

    cd top_of_build_dir
    cd ompi/debuggers
    rm ompi_debuggers.lo
    make

Then copy-n-paste the gcc command used to compile the ompi_debuggers.o file, remove "-o .libs/libdebuggers_la-ompi_debuggers.o", add "-E", and redirect the output to a file. Then send me that file -- it should give more of a clue as to exactly what the problem is that you're seeing.
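For example, the compile line that make echoes will look something like this (the real one will have many more -I and -D flags than shown here -- these are only placeholders):

    gcc -DHAVE_CONFIG_H -I. -I../.. -O2 -c ompi_debuggers.c -o .libs/libdebuggers_la-ompi_debuggers.o

so the command to re-run by hand would be roughly (the output filename is arbitrary):

    gcc -DHAVE_CONFIG_H -I. -I../.. -O2 -E ompi_debuggers.c > ompi_debuggers.i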

On Aug 24, 2010, at 3:25 PM, Michael E. Thomadakis wrote:

> 
> On 08/24/10 14:22, Michael E. Thomadakis wrote:
>> Hi,
>> 
>> I used a 'tee' command to capture the output but I forgot to also redirect stderr to the file.
>> 
>> This is what a fresh make gave (gcc 4.1.2 again):
>> 
>> ------------------------------------------------------------------
>> ompi_debuggers.c:81: error: missing terminating " character
>> ompi_debuggers.c:81: error: expected expression before ';' token
>> ompi_debuggers.c: In function 'ompi_wait_for_debugger':
>> ompi_debuggers.c:212: error: 'mpidbg_dll_locations' undeclared (first use in this function)
>> ompi_debuggers.c:212: error: (Each undeclared identifier is reported only once
>> ompi_debuggers.c:212: error: for each function it appears in.)
>> ompi_debuggers.c:212: warning: passing argument 3 of 'check' from incompatible pointer type
>> make[2]: *** [libdebuggers_la-ompi_debuggers.lo] Error 1
>> make[1]: *** [all-recursive] Error 1
>> make: *** [all-recursive] Error 1
>> 
>> ------------------------------------------------------------------
>> 
>> Is this critical to run OMPI code?
>> 
>> Thanks for the quick reply, Ralph.
>> 
>> Michael
>> 
>> On Tue, 24 Aug 2010, Ralph Castain wrote:
>> 
>> | Date: Tue, 24 Aug 2010 13:16:10 -0600
>> | From: Ralph Castain <r...@open-mpi.org>
>> | To: Michael E. Thomadakis <miket7...@gmail.com>
>> | Cc: Open MPI Users <us...@open-mpi.org>, mi...@sc.tamu.edu
>> | Subject: Re: [OMPI users] Open-MPI 1.4.2 : mpirun core-dumps when "-npernode N" is used at command line
>> | 
>> | Ummm....the configure log terminates normally, indicating it configured fine. The make log ends, but with no error shown - everything was building just fine.
>> | 
>> | Did you maybe stop it before it was complete? Run out of disk quota? Or...?
>> | 
>> | 
>> | On Aug 24, 2010, at 1:06 PM, Michael E. Thomadakis wrote:
>> | 
>> |> Hi Ralph,
>> |> 
>> |> I tried to build 1.4.3.a1r23542 (08/02/2010) with
>> |> 
>> |>     ./configure --prefix="/g/software/openmpi-1.4.3a1r23542/gcc-4.1.2 2" --enable-cxx-exceptions CFLAGS="-O2" CXXFLAGS="-O2" FFLAGS="-O2" FCFLAGS="-O2"
>> |> 
>> |> with GCC 4.1.2:
>> |> 
>> |> miket@login002[pts/26]openmpi-1.4.3a1r23542 $ gcc -v
>> |> Using built-in specs.
>> |> Target: x86_64-redhat-linux
>> |> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux
>> |> Thread model: posix
>> |> gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
>> |> 
>> |> 
>> |> but it failed. I am attaching the configure and make logs.
>> |> 
>> |> regards
>> |> 
>> |> Michael
>> |> 
>> |> 
>> |> On 08/23/10 20:53, Ralph Castain wrote:
>> |>> 
>> |>> Nope - none of them will work with 1.4.2. Sorry - bug not discovered until after release
>> |>> 
>> |>> On Aug 23, 2010, at 7:45 PM, Michael E. Thomadakis wrote:
>> |>> 
>> |>>> Hi Jeff,
>> |>>> thanks for the quick reply.
>> |>>> 
>> |>>> Would using '--cpus-per-proc N' in place of '-npernode N', or just '-bynode', do the trick?
>> |>>> 
>> |>>> It seems that using '--loadbalance' also crashes mpirun.
>> |>>> 
>> |>>> best ...
>> |>>> 
>> |>>> Michael
>> |>>> 
>> |>>> 
>> |>>> On 08/23/10 19:30, Jeff Squyres wrote:
>> |>>>> 
>> |>>>> Yes, the -npernode segv is a known issue.
>> |>>>> 
>> |>>>> We have it fixed in the 1.4.x nightly tarballs; can you give it a whirl and see if that fixes your problem?
>> |>>>> 
>> |>>>>     http://www.open-mpi.org/nightly/v1.4/
>> |>>>> 
>> |>>>> 
>> |>>>> 
>> |>>>> On Aug 23, 2010, at 8:20 PM, Michael E. Thomadakis wrote:
>> |>>>> 
>> |>>>>> Hello OMPI:
>> |>>>>> 
>> |>>>>> We have installed OMPI V1.4.2 on a Nehalem cluster running CentOS5.4. OMPI was built using Intel compilers 11.1.072. I am attaching the configuration log and output from ompi_info -a.
>> |>>>>> 
>> |>>>>> The problem we are encountering is that whenever we use the option '-npernode N' on the mpirun command line, we get a segmentation fault as shown below:
>> |>>>>> 
>> |>>>>> 
>> |>>>>> miket@login002[pts/7]PS $ mpirun -npernode 1 --display-devel-map --tag-output -np 6 -cpus-per-proc 2 -H 'login001,login002,login003' hostname
>> |>>>>> 
>> |>>>>> Map generated by mapping policy: 0402
>> |>>>>>     Npernode: 1    Oversubscribe allowed: TRUE    CPU Lists: FALSE
>> |>>>>>     Num new daemons: 2    New daemon starting vpid 1
>> |>>>>>     Num nodes: 3
>> |>>>>> 
>> |>>>>> Data for node: Name: login001    Launch id: -1    Arch: 0    State: 2
>> |>>>>>     Num boards: 1    Num sockets/board: 2    Num cores/socket: 4
>> |>>>>>     Daemon: [[44812,0],1]    Daemon launched: False
>> |>>>>>     Num slots: 1    Slots in use: 2
>> |>>>>>     Num slots allocated: 1    Max slots: 0
>> |>>>>>     Username on node: NULL
>> |>>>>>     Num procs: 1    Next node_rank: 1
>> |>>>>>     Data for proc: [[44812,1],0]
>> |>>>>>         Pid: 0    Local rank: 0    Node rank: 0
>> |>>>>>         State: 0    App_context: 0    Slot list: NULL
>> |>>>>> 
>> |>>>>> Data for node: Name: login002    Launch id: -1    Arch: ffc91200    State: 2
>> |>>>>>     Num boards: 1    Num sockets/board: 2    Num cores/socket: 4
>> |>>>>>     Daemon: [[44812,0],0]    Daemon launched: True
>> |>>>>>     Num slots: 1    Slots in use: 2
>> |>>>>>     Num slots allocated: 1    Max slots: 0
>> |>>>>>     Username on node: NULL
>> |>>>>>     Num procs: 1    Next node_rank: 1
>> |>>>>>     Data for proc: [[44812,1],0]
>> |>>>>>         Pid: 0    Local rank: 0    Node rank: 0
>> |>>>>>         State: 0    App_context: 0    Slot list: NULL
>> |>>>>> 
>> |>>>>> Data for node: Name: login003    Launch id: -1    Arch: 0    State: 2
>> |>>>>>     Num boards: 1    Num sockets/board: 2    Num cores/socket: 4
>> |>>>>>     Daemon: [[44812,0],2]    Daemon launched: False
>> |>>>>>     Num slots: 1    Slots in use: 2
>> |>>>>>     Num slots allocated: 1    Max slots: 0
>> |>>>>>     Username on node: NULL
>> |>>>>>     Num procs: 1    Next node_rank: 1
>> |>>>>>     Data for proc: [[44812,1],0]
>> |>>>>>         Pid: 0    Local rank: 0    Node rank: 0
>> |>>>>>         State: 0    App_context: 0    Slot list: NULL
>> |>>>>> [login002:02079] *** Process received signal ***
>> |>>>>> [login002:02079] Signal: Segmentation fault (11)
>> |>>>>> [login002:02079] Signal code: Address not mapped (1)
>> |>>>>> [login002:02079] Failing at address: 0x50
>> |>>>>> [login002:02079] [ 0] /lib64/libpthread.so.0 [0x3569a0e7c0]
>> |>>>>> [login002:02079] [ 1] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_util_encode_pidmap+0xa7) [0x2afa70d25de7]
>> |>>>>> [login002:02079] [ 2] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x3b8) [0x2afa70d36088]
>> |>>>>> [login002:02079] [ 3] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0xd7) [0x2afa70d37fc7]
>> |>>>>> [login002:02079] [ 4] /g/software/openmpi-1.4.2/intel/lib/openmpi/mca_plm_rsh.so [0x2afa721085a1]
>> |>>>>> [login002:02079] [ 5] mpirun [0x404c27]
>> |>>>>> [login002:02079] [ 6] mpirun [0x403e38]
>> |>>>>> [login002:02079] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3568e1d994]
>> |>>>>> [login002:02079] [ 8] mpirun [0x403d69]
>> |>>>>> [login002:02079] *** End of error message ***
>> |>>>>> Segmentation fault
>> |>>>>> 
>> |>>>>> We tried version 1.4.1 and this problem did not emerge.
>> |>>>>> 
>> |>>>>> This option is necessary when our users launch hybrid MPI-OMP code, where they can request M nodes and n ppn in a PBS/Torque setup so that they get exactly the right number of MPI tasks. Unfortunately, as soon as we use the '-npernode N' option, mpirun crashes.
>> |>>>>> 
>> |>>>>> Is this a known issue? I found a related problem (from around May 2010) where people were using the same option, but in a SLURM environment.
>> |>>>>> 
>> |>>>>> regards
>> |>>>>> 
>> |>>>>> Michael
>> |>>>>> 
>> |>>>>> 
>> |>>>>> <config.log.gz><ompi_info-a.out.gz>
>> |> 
>> |> <config_1.4.3.log.gz><make_1.4.3.out.gz>

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/