I'm a little confused by your configure line:

./configure --prefix=/g/software/openmpi-1.4.3a1r23542/gcc-4.1.2 2 
--enable-cxx-exceptions CFLAGS=-O2 CXXFLAGS=-O2 FFLAGS=-O2 FCFLAGS=-O2

What's the lone "2" in the middle (after the prefix)?

With that extra "2", I'm not able to get configure to complete successfully 
(because it interprets that "2" as a platform name that does not exist).  If I 
remove that "2", configure completes properly and the build completes properly.

I'm afraid I no longer have any RH hosts to test on.  Can you do the following:

cd top_of_build_dir
cd ompi/debuggers
rm ompi_debuggers.lo
make

Then copy-n-paste the gcc command used to compile ompi_debuggers.c, remove 
"-o .libs/libdebuggers_la-ompi_debuggers.o", add "-E", and redirect the 
output to a file.  Then send me that file -- it should give more of a clue 
as to exactly what problem you're seeing.
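
For example, if make shows a compile command roughly like this (illustrative 
only -- the exact flags on your system will differ):

  gcc -DHAVE_CONFIG_H -I. -I../.. -O2 -c ompi_debuggers.c -fPIC -DPIC \
      -o .libs/libdebuggers_la-ompi_debuggers.o

then the preprocessing command would be something like:

  gcc -DHAVE_CONFIG_H -I. -I../.. -O2 -c ompi_debuggers.c -fPIC -DPIC \
      -E > ompi_debuggers.i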




On Aug 24, 2010, at 3:25 PM, Michael E. Thomadakis wrote:

> 
> On 08/24/10 14:22, Michael E. Thomadakis wrote:
>> Hi,
>> 
>> I used a 'tee' command to capture the output but I forgot to also redirect
>> stderr to the file.
>> 
>> This is what a fresh make gave (gcc 4.1.2 again) :
>> 
>> ------------------------------------------------------------------
>> ompi_debuggers.c:81: error: missing terminating " character
>> ompi_debuggers.c:81: error: expected expression before ‘;’ token
>> ompi_debuggers.c: In function ‘ompi_wait_for_debugger’:
>> ompi_debuggers.c:212: error: ‘mpidbg_dll_locations’ undeclared
>> (first use in this function)
>> ompi_debuggers.c:212: error: (Each undeclared identifier is reported only once
>> ompi_debuggers.c:212: error: for each function it appears in.)
>> ompi_debuggers.c:212: warning: passing argument 3 of ‘check’ from
>> incompatible pointer type
>> make[2]: *** [libdebuggers_la-ompi_debuggers.lo] Error 1
>> make[1]: *** [all-recursive] Error 1
>> make: *** [all-recursive] Error 1
>> 
>> ------------------------------------------------------------------
>> 
>> Is this component critical for running OMPI code?
>> 
>> Thanks for the quick reply Ralph,
>> 
>> Michael
>> 
>> On Tue, 24 Aug 2010, Ralph Castain wrote:
>> 
>> | Date: Tue, 24 Aug 2010 13:16:10 -0600
>> | From: Ralph Castain<r...@open-mpi.org>
>> | To: Michael E.Thomadakis<miket7...@gmail.com>
>> | Cc: Open MPI Users<us...@open-mpi.org>, mi...@sc.tamu.edu
>> | Subject: Re: [OMPI users] Open-MPI 1.4.2 : mpirun core-dumps when
>> |     "-npernode N" is used at command line
>> |
>> | Ummm....the configure log terminates normally, indicating it configured 
>> fine. The make log ends, but with no error shown - everything was building 
>> just fine.
>> |
>> | Did you maybe stop it before it was complete? Run out of disk quota? Or...?
>> |
>> |
>> | On Aug 24, 2010, at 1:06 PM, Michael E. Thomadakis wrote:
>> |
>> |>  Hi Ralph,
>> |>
>> |>  I tried to build 1.4.3a1r23542 (08/02/2010) with
>> |>
>> |>  ./configure --prefix="/g/software/openmpi-1.4.3a1r23542/gcc-4.1.2 2" 
>> --enable-cxx-exceptions  CFLAGS="-O2" CXXFLAGS="-O2"  FFLAGS="-O2" 
>> FCFLAGS="-O2"
>> |>  with the GCC 4.1.2
>> |>
>> |>  miket@login002[pts/26]openmpi-1.4.3a1r23542 $ gcc -v
>> |>  Using built-in specs.
>> |>  Target: x86_64-redhat-linux
>> |>  Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
>> --infodir=/usr/share/info --enable-shared --enable-threads=posix 
>> --enable-checking=release --with-system-zlib --enable-__cxa_atexit 
>> --disable-libunwind-exceptions --enable-libgcj-multifile 
>> --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk 
>> --disable-dssi --enable-plugin 
>> --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic 
>> --host=x86_64-redhat-linux
>> |>  Thread model: posix
>> |>  gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
>> |>
>> |>
>> |>  but it failed. I am attaching the configure and make logs.
>> |>
>> |>  regards
>> |>
>> |>  Michael
>> |>
>> |>
>> |>  On 08/23/10 20:53, Ralph Castain wrote:
>> |>>
>> |>>  Nope - none of them will work with 1.4.2. Sorry - bug not discovered 
>> until after release
>> |>>
>> |>>  On Aug 23, 2010, at 7:45 PM, Michael E. Thomadakis wrote:
>> |>>
>> |>>>  Hi Jeff,
>> |>>>  thanks for the quick reply.
>> |>>>
>> |>>>  Would using '--cpus-per-proc N' in place of '-npernode N' or just 
>> '-bynode' do the trick?
>> |>>>
>> |>>>  It seems that using '--loadbalance' also crashes mpirun.
>> |>>>
>> |>>>  best ...
>> |>>>
>> |>>>  Michael
>> |>>>
>> |>>>
>> |>>>  On 08/23/10 19:30, Jeff Squyres wrote:
>> |>>>>
>> |>>>>  Yes, the -npernode segv is a known issue.
>> |>>>>
>> |>>>>  We have it fixed in the 1.4.x nightly tarballs; can you give it a 
>> whirl and see if that fixes your problem?
>> |>>>>
>> |>>>>      http://www.open-mpi.org/nightly/v1.4/
>> |>>>>
>> |>>>>
>> |>>>>
>> |>>>>  On Aug 23, 2010, at 8:20 PM, Michael E. Thomadakis wrote:
>> |>>>>
>> |>>>>>  Hello OMPI:
>> |>>>>>
>> |>>>>>  We have installed OMPI V1.4.2 on a Nehalem cluster running 
>> CentOS 5.4. OMPI was built using Intel compilers 11.1.072. I am attaching 
>> the configuration log and the output of ompi_info -a.
>> |>>>>>
>> |>>>>>  The problem we are encountering is that whenever we use the option 
>> '-npernode N' on the mpirun command line, we get a segmentation fault, as 
>> shown below:
>> |>>>>>
>> |>>>>>
>> |>>>>>  miket@login002[pts/7]PS $ mpirun -npernode 1  --display-devel-map  
>> --tag-output -np 6 -cpus-per-proc 2 -H 'login001,login002,login003' hostname
>> |>>>>>
>> |>>>>>   Map generated by mapping policy: 0402
>> |>>>>>          Npernode: 1     Oversubscribe allowed: TRUE     CPU Lists: 
>> FALSE
>> |>>>>>          Num new daemons: 2      New daemon starting vpid 1
>> |>>>>>          Num nodes: 3
>> |>>>>>
>> |>>>>>   Data for node: Name: login001          Launch id: -1   Arch: 0 
>> State: 2
>> |>>>>>          Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
>> |>>>>>          Daemon: [[44812,0],1]   Daemon launched: False
>> |>>>>>          Num slots: 1    Slots in use: 2
>> |>>>>>          Num slots allocated: 1  Max slots: 0
>> |>>>>>          Username on node: NULL
>> |>>>>>          Num procs: 1    Next node_rank: 1
>> |>>>>>          Data for proc: [[44812,1],0]
>> |>>>>>                  Pid: 0  Local rank: 0   Node rank: 0
>> |>>>>>                  State: 0        App_context: 0  Slot list: NULL
>> |>>>>>
>> |>>>>>   Data for node: Name: login002          Launch id: -1   Arch: 
>> ffc91200  State: 2
>> |>>>>>          Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
>> |>>>>>          Daemon: [[44812,0],0]   Daemon launched: True
>> |>>>>>          Num slots: 1    Slots in use: 2
>> |>>>>>          Num slots allocated: 1  Max slots: 0
>> |>>>>>          Username on node: NULL
>> |>>>>>          Num procs: 1    Next node_rank: 1
>> |>>>>>          Data for proc: [[44812,1],0]
>> |>>>>>                  Pid: 0  Local rank: 0   Node rank: 0
>> |>>>>>                  State: 0        App_context: 0  Slot list: NULL
>> |>>>>>
>> |>>>>>   Data for node: Name: login003          Launch id: -1   Arch: 0 
>> State: 2
>> |>>>>>          Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
>> |>>>>>          Daemon: [[44812,0],2]   Daemon launched: False
>> |>>>>>          Num slots: 1    Slots in use: 2
>> |>>>>>          Num slots allocated: 1  Max slots: 0
>> |>>>>>          Username on node: NULL
>> |>>>>>          Num procs: 1    Next node_rank: 1
>> |>>>>>          Data for proc: [[44812,1],0]
>> |>>>>>                  Pid: 0  Local rank: 0   Node rank: 0
>> |>>>>>                  State: 0        App_context: 0  Slot list: NULL
>> |>>>>>  [login002:02079] *** Process received signal ***
>> |>>>>>  [login002:02079] Signal: Segmentation fault (11)
>> |>>>>>  [login002:02079] Signal code: Address not mapped (1)
>> |>>>>>  [login002:02079] Failing at address: 0x50
>> |>>>>>  [login002:02079] [ 0] /lib64/libpthread.so.0 [0x3569a0e7c0]
>> |>>>>>  [login002:02079] [ 1] 
>> /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_util_encode_pidmap+0xa7)
>>  [0x2afa70d25de7]
>> |>>>>>  [login002:02079] [ 2] 
>> /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x3b8)
>>  [0x2afa70d36088]
>> |>>>>>  [login002:02079] [ 3] 
>> /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0xd7)
>>  [0x2afa70d37fc7]
>> |>>>>>  [login002:02079] [ 4] 
>> /g/software/openmpi-1.4.2/intel/lib/openmpi/mca_plm_rsh.so [0x2afa721085a1]
>> |>>>>>  [login002:02079] [ 5] mpirun [0x404c27]
>> |>>>>>  [login002:02079] [ 6] mpirun [0x403e38]
>> |>>>>>  [login002:02079] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) 
>> [0x3568e1d994]
>> |>>>>>  [login002:02079] [ 8] mpirun [0x403d69]
>> |>>>>>  [login002:02079] *** End of error message ***
>> |>>>>>  Segmentation fault
>> |>>>>>
>> |>>>>>  We tried version 1.4.1 and this problem did not emerge.
>> |>>>>>
>> |>>>>>  This option is necessary when our users launch hybrid MPI/OpenMP 
>> code: they request M nodes and n ppn in a PBS/Torque setup, and '-npernode' 
>> is how they get the right number of MPI tasks per node. Unfortunately, as 
>> soon as we use the '-npernode N' option, mpirun crashes.
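>> |>>>>>
>> |>>>>>  For example, a typical job script for us looks roughly like this 
>> (illustrative -- the resource counts and program name are made up): 4 nodes 
>> with 8 cores each, 1 MPI task per node, 8 OpenMP threads per task:
>> |>>>>>
>> |>>>>>  #PBS -l nodes=4:ppn=8
>> |>>>>>  export OMP_NUM_THREADS=8
>> |>>>>>  mpirun -npernode 1 -np 4 ./hybrid_app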
>> |>>>>>
>> |>>>>>  Is this a known issue? I found a related problem (from around May 
>> 2010) where people were using the same option, but in a SLURM environment.
>> |>>>>>
>> |>>>>>  regards
>> |>>>>>
>> |>>>>>  Michael
>> |>>>>>
>> |>>>>>
>> |>>>>>  <config.log.gz><ompi_info-a.out.gz>
>> |>>>
>> |>>
>> |>>
>> |>
>> |>  <config_1.4.3.log.gz><make_1.4.3.out.gz>
>> |
>> |
>> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

