[OMPI users] bug report: wrong reference in mpi.h to mpicxx.h

2006-07-19 Thread Paul Heinzlreiter
Hi all,

I'm not sure whether this bug has already been reported/fixed (maybe in
the v1.1.1 pre-release):

I've compiled and installed Open MPI Version 1.1 (stable), which worked
well.

For configuring Open MPI I used the command line

./configure --prefix=/home/ph/local/openmpi --disable-mpi-f77
--disable-mpi-f99

since I don't need Fortran support.

Compiling and executing a simple MPI test program (in C) with Open MPI
also worked well.

After that I tried to compile VTK (http://www.vtk.org) with MPI support
using OpenMPI.

The compilation process issued the following error message:

/home/ph/local/openmpi/include/mpi.h:1757:33: ompi/mpi/cxx/mpicxx.h: No
such file or directory

and indeed the location of the file mpicxx.h is
/home/ph/local/openmpi/include/openmpi/ompi/mpi/cxx/mpicxx.h

and in mpi.h it is stated:

#if !defined(OMPI_SKIP_MPICXX) && OMPI_WANT_CXX_BINDINGS && !OMPI_BUILDING
#if defined(__cplusplus) || defined(c_plusplus)
#include "ompi/mpi/cxx/mpicxx.h"
#endif
#endif

so this would refer to the file

/home/ph/local/openmpi/include/ompi/mpi/cxx/mpicxx.h

as I see it.

So there is one subdirectory (openmpi) missing from the reference within
mpi.h.

Regards,
Paul Heinzlreiter



Re: [OMPI users] bug report: wrong reference in mpi.h to mpicxx.h

2006-07-19 Thread Sven Stork
Dear Paul, 

this previously posted "tutorial" on how to build ParaView might be useful to
you:

http://www.open-mpi.org/community/lists/users/2006/05/1246.php

regards,
Sven

On Wednesday 19 July 2006 14:57, Paul Heinzlreiter wrote:
> After that I tried to compile VTK (http://www.vtk.org) with MPI support
> using OpenMPI.
> 
> The compilation process issued the following error message:
> 
> /home/ph/local/openmpi/include/mpi.h:1757:33: ompi/mpi/cxx/mpicxx.h: No
> such file or directory


Re: [OMPI users] bug report: wrong reference in mpi.h to mpicxx.h

2006-07-19 Thread Brian Barrett
On Wed, 2006-07-19 at 14:57 +0200, Paul Heinzlreiter wrote:

> After that I tried to compile VTK (http://www.vtk.org) with MPI support
> using OpenMPI.
> 
> The compilation process issued the following error message:
> 
> /home/ph/local/openmpi/include/mpi.h:1757:33: ompi/mpi/cxx/mpicxx.h: No
> such file or directory

Sven sent instructions on how best to build VTK, but I wanted to explain
what you are seeing.  Open MPI actually requires two -I options to use
the C++ bindings: -I<prefix>/include and -I<prefix>/include/openmpi.
Generally, the wrapper compilers (mpicc, mpiCC, mpif77, etc.) are used
to build Open MPI applications, and the -I flags are automatically added
without any problem (a bunch of other flags that might be required on
your system may also be added).

You can use the "mpiCC -showme" option to the wrapper compiler to see
exactly which flags it might add when compiling / linking / etc.
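
For example, a minimal sketch using the install prefix from the original
report (my_mpi_prog.cxx is just a placeholder name; the authoritative flag
list for a given install is whatever the wrapper itself prints):

  mpiCC -showme
  g++ -I/home/ph/local/openmpi/include \
      -I/home/ph/local/openmpi/include/openmpi -c my_mpi_prog.cxx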


Hope this helps,

Brian



Re: [OMPI users] bug report: wrong reference in mpi.h to mpicxx.h

2006-07-19 Thread Paul Heinzlreiter
I just copied .../openmpi/include/openmpi/ompi/... to
.../openmpi/include/ompi/...

and all went well: VTK was built together with its testing tree,
including the MPI applications, using the g++/gcc compiler.

Maybe it also works with the mpiCC/mpicc wrapper compilers without moving
directories around (see the sketch below). You can only specify one
compiler for the whole VTK source tree, and most of it is not MPI-dependent.
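
For illustration only, a minimal, untested sketch of that alternative,
assuming VTK's usual CMake-based build (CMAKE_C_COMPILER and
CMAKE_CXX_COMPILER are standard CMake variables; /path/to/VTK is a
placeholder for the source directory):

  cmake -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpiCC /path/to/VTK

That way the whole tree is built with the wrapper compilers, which pass
the extra -I flags on to the underlying gcc/g++.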


Paul

Brian Barrett wrote:
> Open MPI actually requires two -I options to use
> the C++ bindings: -I<prefix>/include and -I<prefix>/include/openmpi.
> Generally, the wrapper compilers (mpicc, mpiCC, mpif77, etc.) are used
> to build Open MPI applications, and the -I flags are automatically added
> without any problem (a bunch of other flags that might be required on
> your system may also be added).
> 
> You can use the "mpiCC -showme" option to the wrapper compiler to see
> exactly which flags it might add when compiling / linking / etc.



[OMPI users] SEGV in libopal during MPI_Alltoall

2006-07-19 Thread Frank Gruellich
Hi,

I'm running OFED 1.0 with OpenMPI 1.1b1-1 compiled for Intel Compiler
9.1.  I get this error message during an MPI_Alltoall call:

Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0x1cd04fe0
[0] func:/usr/ofed/mpi/intel/openmpi-1.1b1-1/lib64/libopal.so.0 [0x2b56964acc75]
[1] func:/lib64/libpthread.so.0 [0x2b569739b140]
[2] func:/software/intel/fce/9.1.032/lib/libirc.so(__intel_new_memcpy+0x1540) 
[0x2b5697278cf0]
*** End of error message ***

and have no idea about the problem.  It arises if I exceed a specific
number (10) of MPI nodes.  The error occurs in this code:

  do i = 1,npuntos
    print *,'puntos',i
    tam = 2**(i-1)
    tmin = 1e5
    tavg = 0.0d0
    do j = 1,rep
      envio = 8.0d0*j
      call mpi_barrier(mpi_comm_world,ierr)
      time1 = mpi_wtime()
      do k = 1,rep2
        call mpi_alltoall(envio,tam,mpi_byte,recibe,tam,mpi_byte,mpi_comm_world,ierr)
      end do
      call mpi_barrier(mpi_comm_world,ierr)
      time2 = mpi_wtime()
      time = (time2 - time1)/(rep2)
      if (time < tmin) tmin = time
      tavg = tavg + time
    end do
    m_tmin(i) = tmin
    m_tavg(i) = tavg/rep
  end do

This code is said to run fine on another system (running IBGD 1.8.x).
I already tested mpich_mlx_intel-0.9.7_mlx2.1.0-1, but got a similar
error message when using 13 nodes:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC             Routine   Line     Source
libpthread.so.0    2B65DA39B140   Unknown   Unknown  Unknown
main.out           00448BDB       Unknown   Unknown  Unknown
[9] Registration failed, file : intra_rdma_alltoall.c, line : 163
[6] Registration failed, file : intra_rdma_alltoall.c, line : 163
9 - MPI_ALLTOALL : Unknown error
[9] [] Aborting Program!
6 - MPI_ALLTOALL : Unknown error
[6] [] Aborting Program!
[2] Registration failed, file : intra_rdma_alltoall.c, line : 163
[11] Registration failed, file : intra_rdma_alltoall.c, line : 163
11 - MPI_ALLTOALL : Unknown error
[11] [] Aborting Program!
2 - MPI_ALLTOALL : Unknown error
[2] [] Aborting Program!
[10] Registration failed, file : intra_rdma_alltoall.c, line : 163
10 - MPI_ALLTOALL : Unknown error
[10] [] Aborting Program!
[5] Registration failed, file : intra_rdma_alltoall.c, line : 163
5 - MPI_ALLTOALL : Unknown error
[5] [] Aborting Program!
[3] Registration failed, file : intra_rdma_alltoall.c, line : 163
[8] Registration failed, file : intra_rdma_alltoall.c, line : 163
3 - MPI_ALLTOALL : Unknown error
[3] [] Aborting Program!
8 - MPI_ALLTOALL : Unknown error
[8] [] Aborting Program!
[4] Registration failed, file : intra_rdma_alltoall.c, line : 163
4 - MPI_ALLTOALL : Unknown error
[4] [] Aborting Program!
[7] Registration failed, file : intra_rdma_alltoall.c, line : 163
7 - MPI_ALLTOALL : Unknown error
[7] [] Aborting Program!
[0] Registration failed, file : intra_rdma_alltoall.c, line : 163
0 - MPI_ALLTOALL : Unknown error
[0] [] Aborting Program!
[1] Registration failed, file : intra_rdma_alltoall.c, line : 163
1 - MPI_ALLTOALL : Unknown error
[1] [] Aborting Program!

I don't know whether this is a problem with MPI or the Intel Compiler.
Please, can anybody point me in the right direction as to what I could have
done wrong?  This is my first post (so be gentle), and at this time I'm
not very used to the verbosity of this list, so if you need any further
information, do not hesitate to request it.

Thanks in advance and kind regards,
-- 
Frank Gruellich
HPC Technician

Tel.:   +49 3722 528 42
Fax:    +49 3722 528 15
E-Mail: frank.gruell...@megware.com

MEGWARE Computer GmbH
Sales and Service
Nordstrasse 19
09247 Chemnitz/Roehrsdorf
Germany
http://www.megware.com/


Re: [OMPI users] SEGV in libopal during MPI_Alltoall

2006-07-19 Thread Graham E Fagg

Hi Frank,
 I am not sure which alltoall you're using in 1.1, so can you please run
the ompi_info utility, which is normally built and put into the same
directory as mpirun?

i.e. host% ompi_info

This provides lots of really useful info on everything before we dig
deeper into your issue.

And then, more specifically, run:

host% ompi_info --param coll all

thanks
Graham



On Wed, 19 Jul 2006, Frank Gruellich wrote:


> I'm running OFED 1.0 with OpenMPI 1.1b1-1 compiled for Intel Compiler
> 9.1.  I get this error message during an MPI_Alltoall call:
> 
> Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
> Failing at addr:0x1cd04fe0
> [0] func:/usr/ofed/mpi/intel/openmpi-1.1b1-1/lib64/libopal.so.0 [0x2b56964acc75]
> [1] func:/lib64/libpthread.so.0 [0x2b569739b140]
> [2] func:/software/intel/fce/9.1.032/lib/libirc.so(__intel_new_memcpy+0x1540)
> [0x2b5697278cf0]
> *** End of error message ***




Thanks,
Graham.
--
Dr Graham E. Fagg   | Distributed, Parallel and Meta-Computing
Innovative Computing Lab. PVM3.4, HARNESS, FT-MPI, SNIPE & Open MPI
Computer Science Dept   | Suite 203, 1122 Volunteer Blvd,
University of Tennessee | Knoxville, Tennessee, USA. TN 37996-3450
Email: f...@cs.utk.edu  | Phone:+1(865)974-5790 | Fax:+1(865)974-8296
Broken complex systems are always derived from working simple systems
--


Re: [OMPI users] SEGV in libopal during MPI_Alltoall

2006-07-19 Thread George Bosilca

Frank,

For the all-to-all collective, the send and receive buffers have to be able
to contain all the information you try to send. In this particular case,
since you initialize the envio variable with a double, I suppose it is
defined as a double. If that is the case, then the error is that the send
operation will send more data than is available in the envio variable.


If you want to do the all-to-all correctly in your example, make sure the
envio variable has a size of at least tam * sizeof(byte) * NPROCS, where
NPROCS is the number of processes in the mpi_comm_world communicator.
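
For illustration, a minimal sketch of one way to size the buffers, assuming
it sits inside the existing benchmark program (so the MPI header/module and
the variables tam, j and ierr are already declared) and that tam is a
multiple of 8 bytes:

      double precision, allocatable :: envio(:), recibe(:)
      integer :: nprocs

      call mpi_comm_size(mpi_comm_world, nprocs, ierr)
      ! tam bytes go to every one of the nprocs ranks, so each buffer
      ! needs tam*nprocs bytes, i.e. tam*nprocs/8 double precision values
      allocate(envio(tam*nprocs/8), recibe(tam*nprocs/8))
      envio = 8.0d0*j
      call mpi_alltoall(envio, tam, mpi_byte, recibe, tam, mpi_byte, &
                        mpi_comm_world, ierr)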


Moreover, the error messages seem to indicate that some memory registration
failed. This could well be the send buffer.


  Thanks,
George.


On Wed, 19 Jul 2006, Frank Gruellich wrote:


> do j = 1,rep
>   envio = 8.0d0*j
>   call mpi_barrier(mpi_comm_world,ierr)
>   time1 = mpi_wtime()
>   do k = 1,rep2
>     call mpi_alltoall(envio,tam,mpi_byte,recibe,tam,mpi_byte,mpi_comm_world,ierr)
>   end do



"We must accept finite disappointment, but we must never lose infinite
hope."
  Martin Luther King