[OMPI users] bug report: wrong reference in mpi.h to mpicxx.h
Hi all,

I'm not sure whether this bug has already been reported/fixed (maybe in the v1.1.1 pre-release).

I've compiled and installed Open MPI version 1.1 (stable), which worked well. For configuring Open MPI I used the command line

    ./configure --prefix=/home/ph/local/openmpi --disable-mpi-f77 --disable-mpi-f99

since I don't need Fortran support. Compiling and executing a simple MPI test program (in C) with Open MPI also worked well.

After that I tried to compile VTK (http://www.vtk.org) with MPI support using Open MPI. The compilation process issued the following error message:

    /home/ph/local/openmpi/include/mpi.h:1757:33: ompi/mpi/cxx/mpicxx.h: No such file or directory

Indeed, the actual location of the file mpicxx.h is

    /home/ph/local/openmpi/include/openmpi/ompi/mpi/cxx/mpicxx.h

while mpi.h states

    #if !defined(OMPI_SKIP_MPICXX) && OMPI_WANT_CXX_BINDINGS && !OMPI_BUILDING
    #if defined(__cplusplus) || defined(c_plusplus)
    #include "ompi/mpi/cxx/mpicxx.h"
    #endif
    #endif

which, as I see it, refers to the file

    /home/ph/local/openmpi/include/ompi/mpi/cxx/mpicxx.h

So one subdirectory (openmpi) is missing in the reference within mpi.h.

Regards,
Paul Heinzlreiter
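As an aside, the preprocessor guard quoted above suggests one possible workaround when only the C API is needed: defining OMPI_SKIP_MPICXX before including mpi.h skips the mpicxx.h include entirely. The following is a minimal sketch of that idea (an illustration, not code from the report):

    /* Minimal sketch, not from the original report: skip the C++ bindings
     * via the OMPI_SKIP_MPICXX guard shown in the mpi.h snippet above,
     * assuming only the C API is needed. */
    #define OMPI_SKIP_MPICXX 1
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("rank %d up without pulling in mpicxx.h\n", rank);
        MPI_Finalize();
        return 0;
    }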
Re: [OMPI users] bug report: wrong reference in mpi.h to mpicxx.h
Dear Paul,

This previously posted "tutorial" on how to build ParaView may be useful to you:
http://www.open-mpi.org/community/lists/users/2006/05/1246.php

regards,
Sven

On Wednesday 19 July 2006 14:57, Paul Heinzlreiter wrote:
> After that I tried to compile VTK (http://www.vtk.org) with MPI support
> using OpenMPI.
>
> The compilation process issued the following error message:
>
> /home/ph/local/openmpi/include/mpi.h:1757:33: ompi/mpi/cxx/mpicxx.h: No
> such file or directory
Re: [OMPI users] bug report: wrong reference in mpi.h to mpicxx.h
On Wed, 2006-07-19 at 14:57 +0200, Paul Heinzlreiter wrote:
> After that I tried to compile VTK (http://www.vtk.org) with MPI support
> using OpenMPI.
>
> The compilation process issued the following error message:
>
> /home/ph/local/openmpi/include/mpi.h:1757:33: ompi/mpi/cxx/mpicxx.h: No
> such file or directory

Sven sent instructions on how to best build VTK, but I wanted to explain what you are seeing. Open MPI actually requires two -I options to use the C++ bindings: -I<prefix>/include and -I<prefix>/include/openmpi.

Generally, the wrapper compilers (mpicc, mpiCC, mpif77, etc.) are used to build Open MPI applications, and the -I flags are automatically added without any problem (a bunch of other flags that might be required on your system may also be added). You can use the "mpiCC -showme" option to the wrapper compiler to see exactly which flags it would add when compiling / linking / etc.

Hope this helps,

Brian
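To illustrate the wrapper-compiler point, here is a small sketch (not from the original message): any trivial MPI source builds with just the wrapper, e.g. "mpicc hello.c" for C or "mpiCC" for C++, because the wrapper supplies the include and library flags itself; "mpicc -showme" / "mpiCC -showme" prints exactly what it would pass to the underlying compiler.

    /* Sketch only: a minimal MPI program built via the wrapper compiler,
     * e.g.  mpicc hello.c -o hello
     * The wrapper adds the -I/-L/-l flags automatically; run
     * "mpicc -showme" to see which flags it would add. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        char name[MPI_MAX_PROCESSOR_NAME];
        int len;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &len);
        printf("rank %d of %d on %s\n", rank, size, name);
        MPI_Finalize();
        return 0;
    }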
Re: [OMPI users] bug report: wrong reference in mpi.h to mpicxx.h
I just copied .../openmpi/include/openmpi/ompi/... to .../openmpi/include/ompi/... and all went well: VTK was built together with its testing tree, including the MPI applications, using the g++/gcc compiler.

Maybe it works with the mpiCC/mpicc compilers without moving directories around, but you can only specify one compiler for the whole VTK source tree, and most of it is not MPI dependent.

Paul

Brian Barrett wrote:
> Sven sent instructions on how to best build VTK, but I wanted to explain
> what you are seeing. Open MPI actually requires two -I options to use
> the C++ bindings: -I<prefix>/include and -I<prefix>/include/openmpi.
[OMPI users] SEGV in libopal during MPI_Alltoall
Hi,

I'm running OFED 1.0 with Open MPI 1.1b1-1 compiled for Intel Compiler 9.1. I get this error message during an MPI_Alltoall call:

    Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
    Failing at addr:0x1cd04fe0
    [0] func:/usr/ofed/mpi/intel/openmpi-1.1b1-1/lib64/libopal.so.0 [0x2b56964acc75]
    [1] func:/lib64/libpthread.so.0 [0x2b569739b140]
    [2] func:/software/intel/fce/9.1.032/lib/libirc.so(__intel_new_memcpy+0x1540) [0x2b5697278cf0]
    *** End of error message ***

and have no idea about the problem. It arises if I exceed a specific number (10) of MPI nodes. The error occurs in this code:

    do i = 1,npuntos
      print *,'puntos',i
      tam = 2**(i-1)
      tmin = 1e5
      tavg = 0.0d0
      do j = 1,rep
        envio = 8.0d0*j
        call mpi_barrier(mpi_comm_world,ierr)
        time1 = mpi_wtime()
        do k = 1,rep2
          call mpi_alltoall(envio,tam,mpi_byte,recibe,tam,mpi_byte,mpi_comm_world,ierr)
        end do
        call mpi_barrier(mpi_comm_world,ierr)
        time2 = mpi_wtime()
        time = (time2 - time1)/(rep2)
        if (time < tmin) tmin = time
        tavg = tavg + time
      end do
      m_tmin(i) = tmin
      m_tavg(i) = tavg/rep
    end do

This code is said to be running on another system (running IBGD 1.8.x). I already tested mpich_mlx_intel-0.9.7_mlx2.1.0-1, but I get a similar error message when using 13 nodes:

    forrtl: severe (174): SIGSEGV, segmentation fault occurred
    Image            PC            Routine   Line      Source
    libpthread.so.0  2B65DA39B140  Unknown   Unknown   Unknown
    main.out         00448BDB      Unknown   Unknown   Unknown
    [9] Registration failed, file : intra_rdma_alltoall.c, line : 163
    [6] Registration failed, file : intra_rdma_alltoall.c, line : 163
    9 - MPI_ALLTOALL : Unknown error [9] [] Aborting Program!
    6 - MPI_ALLTOALL : Unknown error [6] [] Aborting Program!
    [2] Registration failed, file : intra_rdma_alltoall.c, line : 163
    [11] Registration failed, file : intra_rdma_alltoall.c, line : 163
    11 - MPI_ALLTOALL : Unknown error [11] [] Aborting Program!
    2 - MPI_ALLTOALL : Unknown error [2] [] Aborting Program!
    [10] Registration failed, file : intra_rdma_alltoall.c, line : 163
    10 - MPI_ALLTOALL : Unknown error [10] [] Aborting Program!
    [5] Registration failed, file : intra_rdma_alltoall.c, line : 163
    5 - MPI_ALLTOALL : Unknown error [5] [] Aborting Program!
    [3] Registration failed, file : intra_rdma_alltoall.c, line : 163
    [8] Registration failed, file : intra_rdma_alltoall.c, line : 163
    3 - MPI_ALLTOALL : Unknown error [3] [] Aborting Program!
    8 - MPI_ALLTOALL : Unknown error [8] [] Aborting Program!
    [4] Registration failed, file : intra_rdma_alltoall.c, line : 163
    4 - MPI_ALLTOALL : Unknown error [4] [] Aborting Program!
    [7] Registration failed, file : intra_rdma_alltoall.c, line : 163
    7 - MPI_ALLTOALL : Unknown error [7] [] Aborting Program!
    [0] Registration failed, file : intra_rdma_alltoall.c, line : 163
    0 - MPI_ALLTOALL : Unknown error [0] [] Aborting Program!
    [1] Registration failed, file : intra_rdma_alltoall.c, line : 163
    1 - MPI_ALLTOALL : Unknown error [1] [] Aborting Program!

I don't know whether this is a problem with MPI or the Intel Compiler. Please, can anybody point me in the right direction as to what I could have done wrong? This is my first post (so be gentle) and at this time I'm not very used to the verbosity of this list, so if you need any further information do not hesitate to request it.

Thanks in advance and kind regards,
--
Frank Gruellich
HPC-Techniker

Tel.:   +49 3722 528 42
Fax:    +49 3722 528 15
E-Mail: frank.gruell...@megware.com

MEGWARE Computer GmbH
Vertrieb und Service
Nordstrasse 19
09247 Chemnitz/Roehrsdorf
Germany
http://www.megware.com/
Re: [OMPI users] SEGV in libopal during MPI_Alltoall
Hi Frank,

I am not sure which alltoall you're using in 1.1, so can you please run the ompi_info utility, which is normally built and put into the same directory as mpirun? I.e.

    host% ompi_info

This provides lots of really useful info on everything before we dig deeper into your issue. Then, more specifically, run

    host% ompi_info --param coll all

thanks,
Graham

On Wed, 19 Jul 2006, Frank Gruellich wrote:
> I'm running OFED 1.0 with OpenMPI 1.1b1-1 compiled for Intel Compiler
> 9.1. I get this error message during an MPI_Alltoall call:
>
> Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
> Failing at addr:0x1cd04fe0
> [0] func:/usr/ofed/mpi/intel/openmpi-1.1b1-1/lib64/libopal.so.0 [0x2b56964acc75]
> [1] func:/lib64/libpthread.so.0 [0x2b569739b140]
> [2] func:/software/intel/fce/9.1.032/lib/libirc.so(__intel_new_memcpy+0x1540) [0x2b5697278cf0]
> *** End of error message ***
>
> and have no idea about the problem. It arises if I exceed a specific
> number (10) of MPI nodes.

Thanks,
Graham.
--
Dr Graham E. Fagg       | Distributed, Parallel and Meta-Computing
Innovative Computing Lab. PVM3.4, HARNESS, FT-MPI, SNIPE & Open MPI
Computer Science Dept   | Suite 203, 1122 Volunteer Blvd,
University of Tennessee | Knoxville, Tennessee, USA. TN 37996-3450
Email: f...@cs.utk.edu  | Phone:+1(865)974-5790 | Fax:+1(865)974-8296
Broken complex systems are always derived from working simple systems
--
Re: [OMPI users] SEGV in libopal during MPI_Alltoall
Frank,

On the all-to-all collective, the send and receive buffers have to be able to contain all the information you try to send. In this particular case, as you initialize the envio variable with a double, I suppose it is defined as a double. If that is the case, then the error is that the send operation will send more data than the amount available in the envio variable.

If you want to do the all-to-all correctly in your example, make sure the envio variable has a size at least equal to tam * sizeof(byte) * NPROCS, where NPROCS is the number of procs available in the mpi_comm_world communicator.

Moreover, the error messages seem to indicate that some memory registration failed. This can effectively be the send buffer.

Thanks,
George.

On Wed, 19 Jul 2006, Frank Gruellich wrote:
> The error occurs in this code:
>
> envio = 8.0d0*j
> [...]
> do k = 1,rep2
>   call mpi_alltoall(envio,tam,mpi_byte,recibe,tam,mpi_byte,mpi_comm_world,ierr)
> end do

"We must accept finite disappointment, but we must never lose infinite hope." Martin Luther King
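To make the buffer-size requirement concrete, here is a small sketch (in C rather than the original Fortran, and not taken from the thread): with MPI_Alltoall each rank sends tam bytes to every rank, so both the send and receive buffers must hold tam * nprocs bytes. The buffer names mirror those in the quoted code but the values are illustrative.

    /* Illustrative sketch only: correctly sized MPI_Alltoall buffers. */
    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        int nprocs;
        int tam = 1024;   /* bytes sent to each rank (example value) */

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Each buffer must hold tam bytes per rank in the communicator. */
        char *envio  = malloc((size_t)tam * nprocs);   /* send buffer    */
        char *recibe = malloc((size_t)tam * nprocs);   /* receive buffer */
        memset(envio, 0, (size_t)tam * nprocs);

        MPI_Alltoall(envio, tam, MPI_BYTE, recibe, tam, MPI_BYTE, MPI_COMM_WORLD);

        free(envio);
        free(recibe);
        MPI_Finalize();
        return 0;
    }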