Hi Frank,

I am not sure which alltoall implementation you're using in 1.1, so could you please run the ompi_info utility? It is normally built and installed into the same directory as mpirun.

i.e. host% ompi_info

This provides a lot of really useful information about your installation before we dig deeper into your issue.


and then, more specifically, run:
host% ompi_info --param coll all
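
If the output is long, you can filter for just the alltoall-related entries (a plain grep; the exact parameter names vary between Open MPI versions):

host% ompi_info --param coll all | grep -i alltoall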

Thanks,
Graham



On Wed, 19 Jul 2006, Frank Gruellich wrote:

Hi,

I'm running OFED 1.0 with Open MPI 1.1b1-1 compiled with Intel Compiler
9.1.  I get this error message during an MPI_Alltoall call:

Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0x1cd04fe0
[0] func:/usr/ofed/mpi/intel/openmpi-1.1b1-1/lib64/libopal.so.0 [0x2b56964acc75]
[1] func:/lib64/libpthread.so.0 [0x2b569739b140]
[2] func:/software/intel/fce/9.1.032/lib/libirc.so(__intel_new_memcpy+0x1540) [0x2b5697278cf0]
*** End of error message ***

and have no idea what the problem is.  It arises only when I exceed a
specific number of MPI nodes (10).  The error occurs in this code:

 do i = 1,npuntos
   print *,'puntos',i
   tam = 2**(i-1)
   tmin = 1e5
   tavg = 0.0d0
   do j = 1,rep
     envio = 8.0d0*j
     call mpi_barrier(mpi_comm_world,ierr)
     time1 = mpi_wtime()
      do k = 1,rep2
        call mpi_alltoall(envio,tam,mpi_byte,recibe,tam,mpi_byte, &
                          mpi_comm_world,ierr)
      end do
     call mpi_barrier(mpi_comm_world,ierr)
     time2 = mpi_wtime()
     time = (time2 - time1)/(rep2)
     if (time < tmin) tmin = time
     tavg = tavg + time
   end do
   m_tmin(i) = tmin
   m_tavg(i) = tavg/rep
 end do
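
For reference, a minimal, self-contained sketch of the call pattern looks
like this (names and sizes here are illustrative, not taken from my
benchmark; note that mpi_alltoall delivers tam bytes to every rank, so
each buffer must hold tam*np bytes):

 program alltoall_sketch
   use mpi
   implicit none
   integer, parameter :: tam = 1024
   integer :: ierr, np
   character, allocatable :: envio(:), recibe(:)

   call mpi_init(ierr)
   call mpi_comm_size(mpi_comm_world, np, ierr)

   ! mpi_alltoall sends tam bytes to each of the np ranks, so the
   ! send and receive buffers each need tam*np bytes.
   allocate(envio(tam*np), recibe(tam*np))
   envio = 'x'

   call mpi_alltoall(envio, tam, mpi_byte, recibe, tam, mpi_byte, &
                     mpi_comm_world, ierr)

   deallocate(envio, recibe)
   call mpi_finalize(ierr)
 end program alltoall_sketch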

This code reportedly runs fine on another system (running IBGD 1.8.x).
I already tested mpich_mlx_intel-0.9.7_mlx2.1.0-1, but got a similar
error message when using 13 nodes:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
libpthread.so.0    00002B65DA39B140  Unknown               Unknown  Unknown
main.out           0000000000448BDB  Unknown               Unknown  Unknown
[9] Registration failed, file : intra_rdma_alltoall.c, line : 163
[6] Registration failed, file : intra_rdma_alltoall.c, line : 163
9 - MPI_ALLTOALL : Unknown error
[9] [] Aborting Program!
6 - MPI_ALLTOALL : Unknown error
[6] [] Aborting Program!
[2] Registration failed, file : intra_rdma_alltoall.c, line : 163
[11] Registration failed, file : intra_rdma_alltoall.c, line : 163
11 - MPI_ALLTOALL : Unknown error
[11] [] Aborting Program!
2 - MPI_ALLTOALL : Unknown error
[2] [] Aborting Program!
[10] Registration failed, file : intra_rdma_alltoall.c, line : 163
10 - MPI_ALLTOALL : Unknown error
[10] [] Aborting Program!
[5] Registration failed, file : intra_rdma_alltoall.c, line : 163
5 - MPI_ALLTOALL : Unknown error
[5] [] Aborting Program!
[3] Registration failed, file : intra_rdma_alltoall.c, line : 163
[8] Registration failed, file : intra_rdma_alltoall.c, line : 163
3 - MPI_ALLTOALL : Unknown error
[3] [] Aborting Program!
8 - MPI_ALLTOALL : Unknown error
[8] [] Aborting Program!
[4] Registration failed, file : intra_rdma_alltoall.c, line : 163
4 - MPI_ALLTOALL : Unknown error
[4] [] Aborting Program!
[7] Registration failed, file : intra_rdma_alltoall.c, line : 163
7 - MPI_ALLTOALL : Unknown error
[7] [] Aborting Program!
[0] Registration failed, file : intra_rdma_alltoall.c, line : 163
0 - MPI_ALLTOALL : Unknown error
[0] [] Aborting Program!
[1] Registration failed, file : intra_rdma_alltoall.c, line : 163
1 - MPI_ALLTOALL : Unknown error
[1] [] Aborting Program!

I don't know whether this is a problem with MPI or with the Intel
Compiler.  Can anybody please point me in the right direction as to what
I could have done wrong?  This is my first post (so be gentle), and I am
not yet used to the conventions of this list, so if you need any further
information, do not hesitate to request it.

Thanks in advance and kind regards,
--
Frank Gruellich
HPC-Techniker

Tel.:   +49 3722 528 42
Fax:    +49 3722 528 15
E-Mail: frank.gruell...@megware.com

MEGWARE Computer GmbH
Vertrieb und Service
Nordstrasse 19
09247 Chemnitz/Roehrsdorf
Germany
http://www.megware.com/



Thanks,
        Graham.
----------------------------------------------------------------------
Dr Graham E. Fagg       | Distributed, Parallel and Meta-Computing
Innovative Computing Lab. PVM3.4, HARNESS, FT-MPI, SNIPE & Open MPI
Computer Science Dept   | Suite 203, 1122 Volunteer Blvd,
University of Tennessee | Knoxville, Tennessee, USA. TN 37996-3450
Email: f...@cs.utk.edu  | Phone:+1(865)974-5790 | Fax:+1(865)974-8296
Broken complex systems are always derived from working simple systems
----------------------------------------------------------------------
