Re: [OMPI users] Summary of OMPI on OS X with Intel

2006-07-20 Thread Jeff Squyres
On 7/18/06 7:33 PM, "Warner Yuen"  wrote:

> USING GCC 4.0.1 (build 5341) with and without Intel Fortran (build
> 9.1.027):

What version of Open MPI were you working with?  If it was a developer/SVN
checkout, what version of the GNU Auto tools were you using?

> Config #2: ./configure --disable-shared --enable-static --with-rsh=/
> usr/bin/ssh
> Successful Build = NO
> Error:
> g++ -O3 -DNDEBUG -finline-functions -Wl,-u -Wl,_munmap -Wl,-
> multiply_defined -Wl,suppress -o ompi_info components.o ompi_info.o
> output.o param.o version.o -Wl,-bind_at_load  ../../../ompi/.libs/
> libmpi.a /Users/wyuen/mpi_src/openmpi-1.1/orte/.libs/liborte.a /Users/
> wyuen/mpi_src/openmpi-1.1/opal/.libs/libopal.a -ldl
> /usr/bin/ld: Undefined symbols:
> _mpi_fortran_status_ignore_
> _mpi_fortran_statuses_ignore_

Do you have a fortran compiler at all?  If so, which one?  Please send the
full output from configure, config.log, and the output from make (stdout and
stderr).
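
For example, something along these lines will capture everything we need (a
sketch -- substitute the configure arguments you actually used):

  ./configure --disable-shared --enable-static 2>&1 | tee configure.out
  make all 2>&1 | tee make.out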

> USING Intel C (build 9.1.027) and with and without Intel Fortran
> (build 9.1.027)
> 
> Config #4: ./configure --disable-mpi-f77 --disable-mpi-f90 --with-
> rsh=/usr/bin/ssh
> Successful Build = NO
> Error:
> IPO link: can not find "1"
> icc: error: problem during multi-file optimization compilation (code 1)
> make[2]: *** [libopal.la] Error 1
> make[1]: *** [all-recursive] Error 1
> make: *** [all-recursive] Error 1

This *looks* like a libtool problem.  Can you send the full configure
output, config.log, and full output from "make"?

> Config #6: ./configure --disable-shared --enable-static --with-rsh=/
> usr/bin/ssh
> Successful Build = NO
> Error:
> _mpi_fortran_statuses_ignore_

I suspect that this is the same problem as #2.

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


Re: [OMPI users] SEGV in libopal during MPI_Alltoall

2006-07-20 Thread shen T.T.
I get the same error message: "forrtl: severe (174): SIGSEGV, segmentation 
fault occurred".  Whether I run my parallel code on a single node or on multiple 
nodes, the error persists.  I have tried three Intel compilers (8.1.037, 9.0.032 
and 9.1.033), but the error remains.  However, my code works correctly on Windows 
XP with Visual Fortran 6.6, so I suspect the Intel compiler may have a bug.  I am 
also still trying to rule out a bug in my own code.

  Do you have another compiler available?  Could you check the error and report it?

  T.T. Shen


Frank Gruellich wrote:
  Hi,

I'm running OFED 1.0 with OpenMPI 1.1b1-1 compiled for Intel Compiler
9.1. I get this error message during an MPI_Alltoall call:

Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0x1cd04fe0
[0] func:/usr/ofed/mpi/intel/openmpi-1.1b1-1/lib64/libopal.so.0 [0x2b56964acc75]
[1] func:/lib64/libpthread.so.0 [0x2b569739b140]
[2] func:/software/intel/fce/9.1.032/lib/libirc.so(__intel_new_memcpy+0x1540) 
[0x2b5697278cf0]
*** End of error message ***

and have no idea about the problem.  It arises if I exceed a specific
number (10) of MPI nodes.  The error occurs in this code:

do i = 1,npuntos
print *,'puntos',i
tam = 2**(i-1)
tmin = 1e5
tavg = 0.0d0
do j = 1,rep
envio = 8.0d0*j
call mpi_barrier(mpi_comm_world,ierr)
time1 = mpi_wtime()
do k = 1,rep2
call mpi_alltoall(envio,tam,mpi_byte,recibe,tam,mpi_byte,mpi_comm_world,ierr)
end do
call mpi_barrier(mpi_comm_world,ierr)
time2 = mpi_wtime()
time = (time2 - time1)/(rep2)
if (time < tmin) tmin = time
tavg = tavg + time
end do
m_tmin(i) = tmin
m_tavg(i) = tavg/rep
end do

This code is said to run on another system (running IBGD 1.8.x).
I also tested mpich_mlx_intel-0.9.7_mlx2.1.0-1, but got a similar
error message when using 13 nodes:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libpthread.so.0 2B65DA39B140 Unknown Unknown Unknown
main.out 00448BDB Unknown Unknown Unknown
[9] Registration failed, file : intra_rdma_alltoall.c, line : 163
[6] Registration failed, file : intra_rdma_alltoall.c, line : 163
9 - MPI_ALLTOALL : Unknown error
[9] [] Aborting Program!
6 - MPI_ALLTOALL : Unknown error
[6] [] Aborting Program!
[2] Registration failed, file : intra_rdma_alltoall.c, line : 163
[11] Registration failed, file : intra_rdma_alltoall.c, line : 163
11 - MPI_ALLTOALL : Unknown error
[11] [] Aborting Program!
2 - MPI_ALLTOALL : Unknown error
[2] [] Aborting Program!
[10] Registration failed, file : intra_rdma_alltoall.c, line : 163
10 - MPI_ALLTOALL : Unknown error
[10] [] Aborting Program!
[5] Registration failed, file : intra_rdma_alltoall.c, line : 163
5 - MPI_ALLTOALL : Unknown error
[5] [] Aborting Program!
[3] Registration failed, file : intra_rdma_alltoall.c, line : 163
[8] Registration failed, file : intra_rdma_alltoall.c, line : 163
3 - MPI_ALLTOALL : Unknown error
[3] [] Aborting Program!
8 - MPI_ALLTOALL : Unknown error
[8] [] Aborting Program!
[4] Registration failed, file : intra_rdma_alltoall.c, line : 163
4 - MPI_ALLTOALL : Unknown error
[4] [] Aborting Program!
[7] Registration failed, file : intra_rdma_alltoall.c, line : 163
7 - MPI_ALLTOALL : Unknown error
[7] [] Aborting Program!
[0] Registration failed, file : intra_rdma_alltoall.c, line : 163
0 - MPI_ALLTOALL : Unknown error
[0] [] Aborting Program!
[1] Registration failed, file : intra_rdma_alltoall.c, line : 163
1 - MPI_ALLTOALL : Unknown error
[1] [] Aborting Program!

I don't know whether this is a problem with MPI or with the Intel compiler.
Please, can anybody point me in the right direction as to what I could have
done wrong?  This is my first post (so be gentle), and I'm not yet used to
the level of detail expected on this list, so if you need any further
information, do not hesitate to request it.

Thanks in advance and kind regards,
-- 
Frank Gruellich
HPC-Techniker

Tel.: +49 3722 528 42
Fax: +49 3722 528 15
E-Mail: frank.gruell...@megware.com

MEGWARE Computer GmbH
Vertrieb und Service
Nordstrasse 19
09247 Chemnitz/Roehrsdorf
Germany
http://www.megware.com/
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



[OMPI users] OPEN_MPI with Intel Compiler -regards

2006-07-20 Thread esaifu
Dear All,
  I was able to compile Open MPI and create the wrapper compilers (like 
mpicc, mpif77, etc.) on top of the GNU compilers.  But when I tried it with the 
Intel Fortran compiler (since I also need an f90 compiler), I ran into a 
configuration error, so the Makefile was never generated.  I am attaching the 
error output produced while configuring the source code.  What could be the 
problem?  Thanks in advance.
Regards
Saifudheen
esa...@hcl.in

openmpi_intel_error
Description: Binary data


[OMPI users] OpenMPI v/s( MPICH,LAM/MPI)

2006-07-20 Thread esaifu
Dear All,
 I have been using Open MPI for the last month, and I need some clarification 
regarding the following points:
  1). What is the advantage of Open MPI over MPICH2 and LAM/MPI?  In other words, 
is there any difference performance-wise?
  2). Is there any checkpointing mechanism in Open MPI, as there is in LAM/MPI?
  3). Can I port Open MPI to any platform (x86, x86-64, ia64)?

Regards
Saifu

Re: [OMPI users] SEGV in libopal during MPI_Alltoall

2006-07-20 Thread Frank Gruellich
Hi,

Graham E Fagg wrote:
>  I am not sure which alltoall you're using in 1.1, so can you please run
> the ompi_info utility, which is normally built and put into the same
> directory as mpirun?
>
> i.e. host% ompi_info
>
> This provides lots of really useful info on everything before we dig
> deeper into your issue.
>
>
> and then more specifically run
> host%  ompi_info --param coll all

Find attached ~/notes from

 $ ( ompi_info; echo '='; ompi_info --param coll all ) >~/notes

Thanks in advance and kind regards,
-- 
Frank Gruellich
HPC-Techniker

Tel.:   +49 3722 528 42
Fax:+49 3722 528 15
E-Mail: frank.gruell...@megware.com

MEGWARE Computer GmbH
Vertrieb und Service
Nordstrasse 19
09247 Chemnitz/Roehrsdorf
Germany
http://www.megware.com/
Open MPI: 1.1b1
   Open MPI SVN revision: r10217
Open RTE: 1.1b1
   Open RTE SVN revision: r10217
OPAL: 1.1b1
   OPAL SVN revision: r10217
  Prefix: /usr/ofed/mpi/intel/openmpi-1.1b1-1
 Configured architecture: x86_64-suse-linux-gnu
   Configured by: root
   Configured on: Wed Jul 19 20:51:46 CEST 2006
  Configure host: frontend
Built by: root
Built on: Wed Jul 19 21:04:47 CEST 2006
  Built host: frontend
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: yes
 Fortran90 bindings size: small
  C compiler: icc
 C compiler absolute: /software/intel/cce/9.1.038/bin/icc
C++ compiler: icpc
   C++ compiler absolute: /software/intel/cce/9.1.038/bin/icpc
  Fortran77 compiler: ifort
  Fortran77 compiler abs: /software/intel/fce/9.1.032/bin/ifort
  Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/bin/gfortran
 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: yes
 Fortran90 profiling: yes
  C++ exceptions: no
  Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
 MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
 libltdl support: yes
  MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1)
   MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1)
   MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1)
   MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.1)
   MCA timer: linux (MCA v1.0, API v1.0, Component v1.1)
   MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
   MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.1)
MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1)
MCA coll: self (MCA v1.0, API v1.0, Component v1.1)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.1)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1)
  MCA io: romio (MCA v1.0, API v1.0, Component v1.1)
   MCA mpool: openib (MCA v1.0, API v1.0, Component v1.1)
   MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1)
 MCA pml: dr (MCA v1.0, API v1.0, Component v1.1)
 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)
 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1)
  MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1)
 MCA btl: openib (MCA v1.0, API v1.0, Component v1.1)
 MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
 MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.1)
 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
 MCA gpr: null (MCA v1.0, API v1.0, Component v1.1)
 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1)
 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1)
 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1)
 MCA iof: svc (MCA v1.0, API v1.0, Component v1.1)
  MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1)
  MCA ns: replica (MCA v1.0, API v1.0, Component v1.1)
 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
 MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1)
 MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1)
 MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1)
 MCA ras: slurm (MCA v1.0, API v1.0, Component v1.1)
 MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1)
 MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1)
   MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1)
MCA rmgr: p

Re: [OMPI users] SEGV in libopal during MPI_Alltoall

2006-07-20 Thread Frank Gruellich
Hi,

shen T.T. wrote:
>   Do you have another compiler available?  Could you check the error and report it?

I'm not using other Intel compilers at the moment, but I'm going to give
gfortran a try today.

Kind regards,
-- 
Frank Gruellich
HPC-Techniker

Tel.:   +49 3722 528 42
Fax:+49 3722 528 15
E-Mail: frank.gruell...@megware.com

MEGWARE Computer GmbH
Vertrieb und Service
Nordstrasse 19
09247 Chemnitz/Roehrsdorf
Germany
http://www.megware.com/


Re: [OMPI users] SEGV in libopal during MPI_Alltoall

2006-07-20 Thread Frank Gruellich
Hi,

George Bosilca wrote:
> In the all-to-all collective, the send and receive buffers have to be able
> to contain all the information you try to send. In this particular case,
> as you initialize the envio variable to a double, I suppose it is defined
> as a double. If that is the case, then the error is that the send operation
> will send more data than is available in the envio variable.
>
> If you want to do the all-to-all correctly in your example,
> make sure the envio variable has a size at least equal to
> tam * sizeof(byte) * NPROCS, where NPROCS is the number of procs available
> on the mpi_comm_world communicator.

Unfortunately, I'm not much of a Fortran guy.  Maybe the best thing is to
post the whole function from the start; it's neither secret nor big:

module alltoall
  use globales
  implicit none

contains
subroutine All_to_all

  integer,parameter :: npuntos = 24
  integer,parameter :: t_max = 2**(npuntos-1)
  integer siguiente,anterior,tam,rep,p_1,p_2,i,j,ndatos,rep2,o,k
  double precision time1,time2,time,ov,tmin,tavg
  double precision,dimension(t_max)::envio
  double precision,dimension(:),allocatable::recibe
  double precision,dimension(npuntos)::m_tmin,m_tavg
  double precision,dimension(npuntos)::tams

  rep2 = 10
  tag1 = 1
  tag2 = 2
  rep = 3

  allocate(recibe(t_max*nproc))
  siguiente = my_id + 1
  if (my_id == nproc -1) siguiente = 0
  anterior = my_id - 1
  if (my_id == 0) anterior = nproc- 1

  do i = 1,npuntos
print *,'puntos',i
tam = 2**(i-1)
tmin = 1e5
tavg = 0.0d0
do j = 1,rep
  envio = 8.0d0*j
  call mpi_barrier(mpi_comm_world,ierr)
  time1 = mpi_wtime()
  do k = 1,rep2
    call mpi_alltoall(envio,tam,mpi_byte,recibe,tam,mpi_byte,mpi_comm_world,ierr)
  end do
  call mpi_barrier(mpi_comm_world,ierr)
  time2 = mpi_wtime()
  time = (time2 - time1)/(rep2)
  if (time < tmin) tmin = time
  tavg = tavg + time
end do
m_tmin(i) = tmin
m_tavg(i) = tavg/rep
  end do
  call mpi_barrier(mpi_comm_world,ierr)
  print *,"acaba"

  if (my_id == 0) then
open (1,file='Alltoall.dat')
write (1,*) "#Prueba All to all entre todos los procesadores(",nproc,")"
write (1,*) "#Precision del reloj:",mpi_wtick()*1.0d6,"(muS)"
do i =1,npuntos
  write(1,900) 2*nproc*2**(i-1),m_tmin(i),m_tavg(i)!,ov
end do
close(1)
  end if
  900 FORMAT(I10,F14.8,F14.8)
  800 FORMAT(I10,F14.8,F14.8)
end subroutine
end module

Can you read this?  (Sorry, I can't.)  But the size of envio seems to be
2**23 = 8388608 doubles, doesn't it?  I don't understand why it should
depend on the number of MPI nodes, as you said.

Thanks for your help.  Kind regards,
-- 
Frank Gruellich
HPC-Techniker

Tel.:   +49 3722 528 42
Fax:+49 3722 528 15
E-Mail: frank.gruell...@megware.com

MEGWARE Computer GmbH
Vertrieb und Service
Nordstrasse 19
09247 Chemnitz/Roehrsdorf
Germany
http://www.megware.com/


Re: [OMPI users] SEGV in libopal during MPI_Alltoall

2006-07-20 Thread George Bosilca
It is what I suspected.  You can see that the envio array is smaller than 
it should be.  It was created as an array of doubles of size t_max, when 
it should have been created as an array of doubles of size t_max * 
nprocs.  If you look at how the recibe array is created, you can see that 
its size is t_max * nproc (allocate(recibe(t_max*nproc))).  Since in the 
all-to-all operation everybody sends and receives exactly the same amount of 
data, the send and receive arrays should have the same size.


I propose the following fix:

- instead of

 double precision,dimension(t_max)::envio
 double precision,dimension(:),allocatable::recibe

use

 double precision,dimension(:),allocatable::envio
 double precision,dimension(:),allocatable::recibe


- then, where the recibe array is allocated, add the allocation for envio too:

 allocate(recibe(t_max*nproc))
 allocate(envio(t_max*nproc))


Now your program should work just fine.
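
For illustration, here is a minimal self-contained sketch along the same lines
(a hypothetical standalone program, not your original code -- the buffer sizes
are the point):

program alltoall_sketch
  implicit none
  include 'mpif.h'
  integer :: ierr, my_id, nproc, tam
  double precision, dimension(:), allocatable :: envio, recibe

  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, my_id, ierr)
  call mpi_comm_size(mpi_comm_world, nproc, ierr)

  tam = 8192                          ! bytes sent to each peer
  ! each rank sends and receives tam bytes (tam/8 doubles) per peer,
  ! so BOTH buffers must hold tam/8 * nproc doubles
  allocate(envio(tam/8 * nproc))
  allocate(recibe(tam/8 * nproc))
  envio = 8.0d0

  call mpi_alltoall(envio, tam, mpi_byte, recibe, tam, mpi_byte, &
                    mpi_comm_world, ierr)

  deallocate(envio, recibe)
  call mpi_finalize(ierr)
end program alltoall_sketch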

  george.

On Thu, 20 Jul 2006, Frank Gruellich wrote:


Hi,

George Bosilca wrote:

In the all-to-all collective, the send and receive buffers have to be able
to contain all the information you try to send. In this particular case,
as you initialize the envio variable to a double, I suppose it is defined
as a double. If that is the case, then the error is that the send operation
will send more data than is available in the envio variable.

If you want to do the all-to-all correctly in your example,
make sure the envio variable has a size at least equal to
tam * sizeof(byte) * NPROCS, where NPROCS is the number of procs available
on the mpi_comm_world communicator.


Unfortunately, I'm not much of a Fortran guy.  Maybe the best thing is to
post the whole function from the start; it's neither secret nor big:

module alltoall
 use globales
 implicit none

contains
subroutine All_to_all

 integer,parameter :: npuntos = 24
 integer,parameter :: t_max = 2**(npuntos-1)
 integer siguiente,anterior,tam,rep,p_1,p_2,i,j,ndatos,rep2,o,k
 double precision time1,time2,time,ov,tmin,tavg
 double precision,dimension(t_max)::envio
 double precision,dimension(:),allocatable::recibe
 double precision,dimension(npuntos)::m_tmin,m_tavg
 double precision,dimension(npuntos)::tams

 rep2 = 10
 tag1 = 1
 tag2 = 2
 rep = 3

 allocate(recibe(t_max*nproc))
 siguiente = my_id + 1
 if (my_id == nproc -1) siguiente = 0
 anterior = my_id - 1
 if (my_id == 0) anterior = nproc- 1

 do i = 1,npuntos
   print *,'puntos',i
   tam = 2**(i-1)
   tmin = 1e5
   tavg = 0.0d0
   do j = 1,rep
 envio = 8.0d0*j
 call mpi_barrier(mpi_comm_world,ierr)
 time1 = mpi_wtime()
 do k = 1,rep2
    call mpi_alltoall(envio,tam,mpi_byte,recibe,tam,mpi_byte,mpi_comm_world,ierr)
 end do
 call mpi_barrier(mpi_comm_world,ierr)
 time2 = mpi_wtime()
 time = (time2 - time1)/(rep2)
 if (time < tmin) tmin = time
 tavg = tavg + time
   end do
   m_tmin(i) = tmin
   m_tavg(i) = tavg/rep
 end do
 call mpi_barrier(mpi_comm_world,ierr)
 print *,"acaba"

 if (my_id == 0) then
   open (1,file='Alltoall.dat')
   write (1,*) "#Prueba All to all entre todos los procesadores(",nproc,")"
   write (1,*) "#Precision del reloj:",mpi_wtick()*1.0d6,"(muS)"
   do i =1,npuntos
 write(1,900) 2*nproc*2**(i-1),m_tmin(i),m_tavg(i)!,ov
   end do
   close(1)
 end if
 900 FORMAT(I10,F14.8,F14.8)
 800 FORMAT(I10,F14.8,F14.8)
end subroutine
end module

Can you read this?  (Sorry, I can't.)  But the size of envio seems to be
2**23 = 8388608 doubles, doesn't it?  I don't understand why it should
depend on the number of MPI nodes, as you said.

Thanks for your help.  Kind regards,



"We must accept finite disappointment, but we must never lose infinite
hope."
  Martin Luther King



Re: [OMPI users] Summary of OMPI on OS X with Intel

2006-07-20 Thread Jeff Squyres
On 7/20/06 12:04 AM, "Jeff Squyres"  wrote:

>> Config #2: ./configure --disable-shared --enable-static --with-rsh=/
>> usr/bin/ssh
>> Successful Build = NO
>> Error:
>> g++ -O3 -DNDEBUG -finline-functions -Wl,-u -Wl,_munmap -Wl,-
>> multiply_defined -Wl,suppress -o ompi_info components.o ompi_info.o
>> output.o param.o version.o -Wl,-bind_at_load  ../../../ompi/.libs/
>> libmpi.a /Users/wyuen/mpi_src/openmpi-1.1/orte/.libs/liborte.a /Users/
>> wyuen/mpi_src/openmpi-1.1/opal/.libs/libopal.a -ldl
>> /usr/bin/ld: Undefined symbols:
>> _mpi_fortran_status_ignore_
>> _mpi_fortran_statuses_ignore_
> 
> Do you have a fortran compiler at all?  If so, which one?  Please send the
> full output from configure, config.log, and the output from make (stdout and
> stderr).

I was able to replicate this one (which, even though I don't have the Intel
compilers for OSX/intel, I'm pretty sure is the same issue as #6).  I
believe that this error will occur regardless of whether you include F77
support or not.

I'm pretty sure that I have a fix for this; I think it's an
OSX-linker-specific problem.  It'll probably hit the trunk and the v1.1
branch later today.

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


Re: [OMPI users] OPEN_MPI with Intel Compiler -regards

2006-07-20 Thread Jeff Squyres
Could you re-send that?  The attachment that I got was an Excel spreadsheet
with the output from configure that did not show any errors -- it just
stopped in the middle of the check for "bool" in the C++ compiler.

Two notes:

1. One common mistake that people make is to use the "icc" compiler for the
C++ compiler.  Recent versions of the Intel compiler renamed the C++
compiler to be "icpc".  If your version of the Intel compiler has an "icpc",
you need to use that for the C++ compiler (see the example after these notes).

2. We had some problems with the Intel 8.1 compiler at one point -- it would
seg fault while compiling legal C code.  I think that later builds of the
Intel 8.1 compiler fixed the problem, however.  You might want to check that
you have the latest build of the 8.1 compiler.
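
As a concrete example of note 1, a configure invocation that points Open MPI
at the Intel compilers might look like this (the prefix is just an example;
adjust paths and options to your installation):

  ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=/opt/openmpi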

If these two suggestions don't help, please see the "Getting help" web page
to see what information we need to help with compile problems.  Thanks!

http://www.open-mpi.org/community/help/


On 7/20/06 2:00 AM, "esaifu"  wrote:

> Dear All,
>   I was able to compile Open MPI and create the wrapper compilers (like
> mpicc, mpif77, etc.) on top of the GNU compilers.  But when I tried it with the
> Intel Fortran compiler (since I also need an f90 compiler), I ran into a
> configuration error, so the Makefile was never generated.  I am attaching the
> error output produced while configuring the source code.  What could be the
> problem?  Thanks in advance.
> Regards
> Saifudheen
> esa...@hcl.in
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


Re: [OMPI users] OpenMPI v/s( MPICH,LAM/MPI)

2006-07-20 Thread Jeff Squyres
On 7/20/06 2:06 AM, "esaifu"  wrote:

>  I have been using Open MPI for the last month, and I need some clarification
> regarding the following points:
>   1). What is the advantage of Open MPI over MPICH2 and LAM/MPI?  In other
> words, is there any difference performance-wise?

Open MPI's TCP performance is still a bit sub-par (because until only
recently, none of us had gotten around to optimizing it).  It is probably
below MPICH's TCP performance and definitely below LAM's TCP performance.

There are still a few features that have not yet been ported to OMPI from
LAM (we're working on them -- TCP performance is one of them).  But even
with those missing features, I consider OMPI to be a superior product to
LAM/MPI -- I have switched all my day-to-day MPI applications to use Open
MPI (instead of LAM).  Indeed, since Open MPI was designed and built by --
among many others -- the LAM/MPI crew, it contains most of the great ideas
from LAM and is therefore (in my mind, at least ;-) ) a worthy successor.

I can say all these things about LAM because I was the technical lead for it
for many years, and therefore have pretty good insight in the comparison of
the two.

One of the main advantages of Open MPI is that it has different goals than
other MPI implementations.  Open MPI aims to be production-quality
software, is fully open source, strikes a good balance of cutting-edge
research and stability, and actively invites others to join in the process.
While there will always be bugs and "but with specific metric ABC, MPI
implementation XYZ performs better than Open MPI!", we feel that the
above are critical characteristics that distinguish Open MPI from other
projects.

>  2). Is there any checkpointing mechanism in Open MPI, as there is in
> LAM/MPI?

Not yet.  Work is actively progressing on this front.  Search the mailing
list archives for mails from Josh Hursey for more details on this.  The
short version is that we will have a demonstratable version of
checkpoint/restart at SC'06 (although it is highly unlikely that it will be
included in a stable release by then).  The checkpoint/restart work that we
are doing in OMPI will far surpass what we did in LAM/MPI.

>  3). Can I port Open MPI to any platform (x86, x86-64, ia64)?

Yes.

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


[OMPI users] MPI applicability

2006-07-20 Thread Vladimir Sipos
Hi, 

Is the MPI paradigm applicable to a cluster of regular networked machines?
That is, does the cost of network I/O offset the benefits of parallelization?
My guess is that this really depends on the application itself; however,
I'm wondering if you know of any success stories which involve MPI
running on a set of networked machines (not a Beowulf cluster or any supercomputer).

Thanks,

Vladimir Sipos
Software Engineer
Advertising Technology 
CNET Networks, Inc.







Re: [OMPI users] MPI applicability

2006-07-20 Thread Brock Palen
It's doable, but the scaling will not be as good, because a network is a
network.  If you are using just regular 100 Mbit, you will not scale
as far as really good 1 Gig Ethernet, but we are still talking about
TCP, which incurs a penalty compared to networks like InfiniBand and Myrinet.
TCP is the largest issue; as you say, it's going to be really application
dependent.
On another note, though, many of the older clusters that are now out of
service used just 100 Mbit Ethernet and worked.

Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985


On Jul 20, 2006, at 9:27 AM, Vladimir Sipos wrote:


Hi,

Is the MPI paradigm applicable to a cluster of regular networked
machines?  That is, does the cost of network I/O offset the benefits of
parallelization?  My guess is that this really depends on the application
itself; however, I'm wondering if you know of any success stories which
involve MPI running on a set of networked machines (not a Beowulf
cluster or any supercomputer).


Thanks,

Vladimir Sipos
Software Engineer
Advertising Technology
CNET Networks, Inc.





___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users






Re: [OMPI users] MPI applicability

2006-07-20 Thread Jeff Squyres
I think there are two questions here:

1. Running MPI applications on "slow" networks (e.g., 100 Mbps).  This is
very much application-dependent.  If your MPI app doesn't communicate with
other processes much, then it probably won't matter.  If you have
latency/bandwidth-sensitive applications, then using a "slow" network can
definitely have a negative impact on performance.

2. Running MPI applications on resources that are being used by others.  In
this case, your MPI processes will be competing with other processes for
CPU, RAM, and other resources -- just like any other process.  Hence, your
overall performance will depend not only on the application, but also on the
usage patterns of the other resources (e.g., the workstations and the people
that use them).

I have certainly heard of bunches of success stories in this kind of
environment -- small numbers of relatively lightly-loaded workstations
(typically <= 16) running small to mid-sized MPI applications, etc.  A
common case for such scenarios is for development and debugging, or even
running small versions of jobs when you can't get time on larger resources,
etc.  Specifically: sometimes running a smaller version of your job is
better than not running anything at all.

Hope that helps.


On 7/20/06 10:04 AM, "Brock Palen"  wrote:

> It's doable, but the scaling will not be as good, because a network is a
> network.  If you are using just regular 100 Mbit, you will not scale
> as far as really good 1 Gig Ethernet, but we are still talking about
> TCP, which incurs a penalty compared to networks like InfiniBand and Myrinet.
> TCP is the largest issue; as you say, it's going to be really application
> dependent.
> On another note, though, many of the older clusters that are now out of
> service used just 100 Mbit Ethernet and worked.
> 
> Brock Palen
> Center for Advanced Computing
> bro...@umich.edu
> (734)936-1985
> 
> 
> On Jul 20, 2006, at 9:27 AM, Vladimir Sipos wrote:
> 
>> Hi,
>> 
>> Is the MPI paradigm applicable to a cluster of regular networked
>> machines?  That is, does the cost of network I/O offset the benefits of
>> parallelization?  My guess is that this really depends on the application
>> itself; however, I'm wondering if you know of any success stories which
>> involve MPI running on a set of networked machines (not a Beowulf
>> cluster or any supercomputer).
>> 
>> Thanks,
>> 
>> Vladimir Sipos
>> Software Engineer
>> Advertising Technology
>> CNET Networks, Inc.
>> 
>> 
>> 
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


Re: [OMPI users] SEGV in libopal during MPI_Alltoall

2006-07-20 Thread Frank Gruellich
Hi George,

George Bosilca wrote:
> It is what I suspected. You can see that the envio array is smaller than
> it should. It was created as an array of doubles with the size t_max, when
> it should have been created as an array of double with the size t_max *
> nprocs.

Ah, yes, I see (and even understand).  Great, thank you very much, it
works now.

Kind regards,
-- 
Frank Gruellich
HPC-Techniker

Tel.:   +49 3722 528 42
Fax:+49 3722 528 15
E-Mail: frank.gruell...@megware.com

MEGWARE Computer GmbH
Vertrieb und Service
Nordstrasse 19
09247 Chemnitz/Roehrsdorf
Germany
http://www.megware.com/


Re: [OMPI users] MPI_Finalize runtime error

2006-07-20 Thread Jeff Squyres
What version of Open MPI are you using?

Can you run your application through a memory-checking debugger such as
Valgrind to see if it gives any more information about where the original
problem occurs?
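
For example, something like this should work (assuming Valgrind is installed;
substitute your own application name):

  mpirun -np 2 valgrind --leak-check=full ./your_mpi_app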


On 7/17/06 10:14 PM, "Manal Helal"  wrote:

> Hi
> 
> after I finish execution, and all results are reported, and both
> processes are about to call MPI_Finalize, I get this runtime error:
> 
> any help is appreciated, thanks
> 
> Manal
> 
> 
> Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
> Failing at addr:0xa
> [0] func:/usr/local/bin/openmpi/lib/libopal.so.0 [0x3e526c]
> [1] func:[0x4bfc7440]
> [2] func:/usr/local/bin/openmpi/lib/libopal.so.0(free+0xb4) [0x3e9ff4]
> [3] func:/usr/local/bin/openmpi/lib/libmpi.so.0 [0x70484e]
> [4]
> func:/usr/local/bin/openmpi//lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_
> close+0x278) [0xc78a58]
> [5]
> func:/usr/local/bin/openmpi/lib/libopal.so.0(mca_base_components_close
> +0x6a) [0x3d93fa]
> [6] func:/usr/local/bin/openmpi/lib/libmpi.so.0(mca_btl_base_close+0xbd)
> [0x75154d]
> [7] func:/usr/local/bin/openmpi/lib/libmpi.so.0(mca_bml_base_close+0x17)
> [0x751427]
> [8]
> func:/usr/local/bin/openmpi//lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_component_
> close+0x3a) [0x625a0a]
> [9]
> func:/usr/local/bin/openmpi/lib/libopal.so.0(mca_base_components_close
> +0x6a) [0x3d93fa]
> [10] func:/usr/local/bin/openmpi/lib/libmpi.so.0(mca_pml_base_close
> +0x65) [0x7580e5]
> [11] func:/usr/local/bin/openmpi/lib/libmpi.so.0(ompi_mpi_finalize
> +0x1b4) [0x71e984]
> [12] func:/usr/local/bin/openmpi/lib/libmpi.so.0(MPI_Finalize+0x4b)
> [0x73cb5b]
> [13] func:master/mmMaster(main+0x3cc) [0x804b2dc]
> [14] func:/lib/libc.so.6(__libc_start_main+0xdc) [0x4bffa724]
> [15] func:master/mmMaster [0x8049b91]
> *** End of error message ***
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


Re: [OMPI users] BTL devices

2006-07-20 Thread Jeff Squyres
On 7/14/06 10:40 AM, "Michael Kluskens"  wrote:

> I've looked through the documentation but I haven't found the
> discussion about what each BTL device is, for example, I have:
> 
> MCA btl: self (MCA v1.0, API v1.0, Component v1.2)

This is the "loopback" Open MPI device.  It is used exclusively for sending
and receiving from one process to the same process.  I.e., message passing
is effected by memcpy's in the same process -- no network is involved (not
even shared memory, because it's within a single process).

We do this not for optimization, but rather for software engineering reasons
-- by having a "self" BTL, all the other BTLs can assume that they never
have to handle the special case of "sending/receiving to self".

> MCA btl: sm (MCA v1.0, API v1.0, Component v1.2)

This is shared memory.  It is used to communicate between processes on the
same node.

> MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)

I think this one is pretty obvious.  ;-)

> I found a PDF presentation that describes a few:
> 
> * tcp - TCP/IP
> * openib - InfiniBand OpenIB Stack
> * gm/mx - Myrinet GM/MX
> * mvapi - InfiniBand Mellanox Verbs
> * sm - Shared Memory
> 
> Are there any others I may see when interacting with other people's
> computers?

These are the main ones for now.  There may be more in the future.

> I assume that if a machine has Myrinet and I don't see MCA btl: gm or
> MCA btl: mx then I have to explain the problem to the sysadm's.

Correct.

> The second question is should I see both gm & mx, or only one or the
> other.

Probably just one or the other; I *believe* that you cannot have both
installed on the same node.  That being said, you can have the *support
libraries* for both installed on the same node and therefore Open MPI can
build support for it and show that those btl's exist in the output of
ompi_info.  But only one will *run* at a time.
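
If you want to see which BTL components your installation has, and/or restrict
which ones are used at run time, something like the following should work (a
sketch; adjust the BTL list and application name to your setup):

  ompi_info --param btl all
  mpirun --mca btl self,sm,tcp -np 4 ./your_app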

Sorry for the delay on the answer -- hope this helps!

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems



Re: [OMPI users] What Really Happens During OpenMPI MPI_INIT?

2006-07-20 Thread Jeff Squyres
On 7/17/06 12:37 AM, "Mahesh Barve"  wrote:

>   Can anyone please enlighten us about what really
> happens in MPI_init() in openMPI?

This is quite a complicated question.  :-)

>   More specifically, I am interested in knowing:
> 1. The functions that need to be accomplished during
> MPI_Init()
> 2. What has already been implemented in Open MPI's
> MPI_Init
> 3. The routines called/invoked that perform these
> functions

Many, many things happen in MPI_INIT.  Here's a sample:

- setup the lowest layer of the system (OPAL)
- setup the run-time environment (ORTE)
  - find out our rank in MPI_COMM_WORLD
  - find out how many peers we have and who they are
  - find out how to contact our peers
- setup the progression engine
- setup processor affinity (if desired)
- setup all the various component frameworks to implement much of the MPI
functionality
  - setup our MPI point-to-point channels
- publish information on how peer processes can contact me
- receive information on how to contact peer processes
  - setup MPI collectives
  - setup MPI topologies
  - ...etc.
- setup all the MPI handle processing (MPI_Comm, MPI_Datatype, etc.)
  - initialize pre-defined handles
  - create fortran translation tables
- ...etc.

I would suggest that you look through ompi/runtime/ompi_mpi_init.c.  It's
basically a big dispatch function of all the events that occur during
MPI_INIT (i.e., both MPI_INIT and MPI_INIT_THREAD -- ompi/mpi/c/init.c and
ompi/mpi/c/init_thread.c, respectively -- call this function to do all the
work).  

The list of things that it does is quite explicit.  Note that the ordering
of functions in this function is extremely important -- almost all the
functions are strictly ordered because of explicit or implicit dependencies.
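
From the user's side, both MPI_INIT and MPI_INIT_THREAD funnel into that same
initialization path; here is a minimal Fortran sketch exercising the threaded
entry point (a hypothetical example, not code from Open MPI itself):

program init_sketch
  implicit none
  include 'mpif.h'
  integer :: ierr, provided, rank

  ! MPI_INIT_THREAD goes through the same setup as MPI_INIT,
  ! additionally negotiating the thread support level
  call mpi_init_thread(MPI_THREAD_SINGLE, provided, ierr)
  call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)
  print *, 'rank', rank, 'provided thread level', provided
  call mpi_finalize(ierr)
end program init_sketch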

Does that help?

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems