Re: [OMPI users] Summary of OMPI on OS X with Intel
On 7/18/06 7:33 PM, "Warner Yuen" wrote:

> USING GCC 4.0.1 (build 5341) with and without Intel Fortran (build 9.1.027):

What version of Open MPI were you working with? If it was a developer/SVN checkout, what version of the GNU Auto tools were you using?

> Config #2: ./configure --disable-shared --enable-static --with-rsh=/usr/bin/ssh
> Successful Build = NO
> Error:
> g++ -O3 -DNDEBUG -finline-functions -Wl,-u -Wl,_munmap -Wl,-multiply_defined -Wl,suppress -o ompi_info components.o ompi_info.o output.o param.o version.o -Wl,-bind_at_load ../../../ompi/.libs/libmpi.a /Users/wyuen/mpi_src/openmpi-1.1/orte/.libs/liborte.a /Users/wyuen/mpi_src/openmpi-1.1/opal/.libs/libopal.a -ldl
> /usr/bin/ld: Undefined symbols:
> _mpi_fortran_status_ignore_
> _mpi_fortran_statuses_ignore_

Do you have a Fortran compiler at all? If so, which one? Please send the full output from configure, config.log, and the output from make (stdout and stderr).

> USING Intel C (build 9.1.027) and with and without Intel Fortran (build 9.1.027)
>
> Config #4: ./configure --disable-mpi-f77 --disable-mpi-f90 --with-rsh=/usr/bin/ssh
> Successful Build = NO
> Error:
> IPO link: can not find "1"
> icc: error: problem during multi-file optimization compilation (code 1)
> make[2]: *** [libopal.la] Error 1
> make[1]: *** [all-recursive] Error 1
> make: *** [all-recursive] Error 1

This *looks* like a libtool problem. Can you send the full configure output, config.log, and full output from "make"?

> Config #6: ./configure --disable-shared --enable-static --with-rsh=/usr/bin/ssh
> Successful Build = NO
> Error:
> _mpi_fortran_statuses_ignore_

I suspect that this is the same problem as #2.

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
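[Editor's note: for what it's worth, a common way to capture the full configure and make output being requested above is to tee both into files. This is a sketch only; substitute whatever configure arguments were actually used:

    ./configure --disable-shared --enable-static --with-rsh=/usr/bin/ssh 2>&1 | tee configure.out
    make 2>&1 | tee make.out

The resulting configure.out, make.out, and config.log are what the developers typically ask for.]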
Re: [OMPI users] SEGV in libopal during MPI_Alltoall
I have the same error message: "forrtl: severe (174): SIGSEGV, segmentation fault occurred". Whether I run my parallel code on a single node or on multiple nodes, the error persists. I then tried three Intel compilers (8.1.037, 9.0.032, and 9.1.033), but the error still occurs. However, my code works correctly on Windows XP with Visual Fortran 6.6. I suspect the Intel compiler may have a bug, but I am also trying to rule out a bug in my own code. Do you have another compiler available? Could you check the error and report back?

T.T. Shen

Frank Gruellich wrote:

Hi,

I'm running OFED 1.0 with OpenMPI 1.1b1-1 compiled for Intel Compiler 9.1. I get this error message during an MPI_Alltoall call:

    Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
    Failing at addr:0x1cd04fe0
    [0] func:/usr/ofed/mpi/intel/openmpi-1.1b1-1/lib64/libopal.so.0 [0x2b56964acc75]
    [1] func:/lib64/libpthread.so.0 [0x2b569739b140]
    [2] func:/software/intel/fce/9.1.032/lib/libirc.so(__intel_new_memcpy+0x1540) [0x2b5697278cf0]
    *** End of error message ***

and have no idea about the problem. It arises if I exceed a specific number (10) of MPI nodes. The error occurs in this code:

    do i = 1,npuntos
      print *,'puntos',i
      tam = 2**(i-1)
      tmin = 1e5
      tavg = 0.0d0
      do j = 1,rep
        envio = 8.0d0*j
        call mpi_barrier(mpi_comm_world,ierr)
        time1 = mpi_wtime()
        do k = 1,rep2
          call mpi_alltoall(envio,tam,mpi_byte,recibe,tam,mpi_byte,mpi_comm_world,ierr)
        end do
        call mpi_barrier(mpi_comm_world,ierr)
        time2 = mpi_wtime()
        time = (time2 - time1)/(rep2)
        if (time < tmin) tmin = time
        tavg = tavg + time
      end do
      m_tmin(i) = tmin
      m_tavg(i) = tavg/rep
    end do

This code is said to be running on another system (running IBGD 1.8.x). I already tested mpich_mlx_intel-0.9.7_mlx2.1.0-1, but get a similar error message when using 13 nodes:

    forrtl: severe (174): SIGSEGV, segmentation fault occurred
    Image            PC               Routine   Line      Source
    libpthread.so.0  2B65DA39B140     Unknown   Unknown   Unknown
    main.out         00448BDB         Unknown   Unknown   Unknown
    [9] Registration failed, file : intra_rdma_alltoall.c, line : 163
    [6] Registration failed, file : intra_rdma_alltoall.c, line : 163
    9 - MPI_ALLTOALL : Unknown error [9] [] Aborting Program!
    6 - MPI_ALLTOALL : Unknown error [6] [] Aborting Program!
    [2] Registration failed, file : intra_rdma_alltoall.c, line : 163
    [11] Registration failed, file : intra_rdma_alltoall.c, line : 163
    11 - MPI_ALLTOALL : Unknown error [11] [] Aborting Program!
    2 - MPI_ALLTOALL : Unknown error [2] [] Aborting Program!
    [10] Registration failed, file : intra_rdma_alltoall.c, line : 163
    10 - MPI_ALLTOALL : Unknown error [10] [] Aborting Program!
    [5] Registration failed, file : intra_rdma_alltoall.c, line : 163
    5 - MPI_ALLTOALL : Unknown error [5] [] Aborting Program!
    [3] Registration failed, file : intra_rdma_alltoall.c, line : 163
    [8] Registration failed, file : intra_rdma_alltoall.c, line : 163
    3 - MPI_ALLTOALL : Unknown error [3] [] Aborting Program!
    8 - MPI_ALLTOALL : Unknown error [8] [] Aborting Program!
    [4] Registration failed, file : intra_rdma_alltoall.c, line : 163
    4 - MPI_ALLTOALL : Unknown error [4] [] Aborting Program!
    [7] Registration failed, file : intra_rdma_alltoall.c, line : 163
    7 - MPI_ALLTOALL : Unknown error [7] [] Aborting Program!
    [0] Registration failed, file : intra_rdma_alltoall.c, line : 163
    0 - MPI_ALLTOALL : Unknown error [0] [] Aborting Program!
    [1] Registration failed, file : intra_rdma_alltoall.c, line : 163
    1 - MPI_ALLTOALL : Unknown error [1] [] Aborting Program!

I don't know whether this is a problem with MPI or the Intel Compiler. Please, can anybody point me in the right direction as to what I could have done wrong? This is my first post (so be gentle), and at this time I'm not very used to the verbosity of this list, so if you need any further information do not hesitate to request it.

Thanks in advance and kind regards,
-- 
Frank Gruellich
HPC-Techniker
Tel.:   +49 3722 528 42
Fax:    +49 3722 528 15
E-Mail: frank.gruell...@megware.com
MEGWARE Computer GmbH
Vertrieb und Service
Nordstrasse 19
09247 Chemnitz/Roehrsdorf
Germany
http://www.megware.com/
[OMPI users] OPEN_MPI with Intel Compiler -regards
Dear All,

I was able to compile Open MPI and create the wrapper compilers (mpicc, mpif77, etc.) on top of the GNU compilers. But when I tried it with the Intel Fortran compiler (since I also need an f90 compiler), I ran into a configuration error (and hence didn't get a Makefile). I am attaching the error file that I get while configuring the source code. What could be the problem? Thanks in advance.

Regards
Saifudheen
esa...@hcl.in

[Attachment: openmpi_intel_error -- binary data]
[OMPI users] OpenMPI v/s( MPICH,LAM/MPI)
Dear All,

I have been using Open MPI for the last month, and I need some clarification regarding the following points:

1) What is the advantage of Open MPI over MPICH2 and LAM/MPI? That is, is there any difference performance-wise?
2) Is there any checkpointing mechanism in Open MPI, like there is in LAM/MPI?
3) Can I port Open MPI to any of these platforms (x86, x86-64, IA-64)?

Regards
Saifu
Re: [OMPI users] SEGV in libopal during MPI_Alltoall
Hi,

Graham E Fagg wrote:
> I am not sure which alltoall you're using in 1.1, so can you please run
> the ompi_info utility, which is normally built and put into the same
> directory as mpirun?
>
> i.e. host% ompi_info
>
> This provides lots of really useful info on everything before we dig
> deeper into your issue
>
> and then more specifically run
> host% ompi_info --param coll all

Find attached ~/notes from

    $ ( ompi_info; echo '='; ompi_info --param coll all ) >~/notes

Thanks in advance and kind regards,
-- 
Frank Gruellich
HPC-Techniker
Tel.:   +49 3722 528 42
Fax:    +49 3722 528 15
E-Mail: frank.gruell...@megware.com
MEGWARE Computer GmbH
Vertrieb und Service
Nordstrasse 19
09247 Chemnitz/Roehrsdorf
Germany
http://www.megware.com/

                 Open MPI: 1.1b1
    Open MPI SVN revision: r10217
                 Open RTE: 1.1b1
    Open RTE SVN revision: r10217
                     OPAL: 1.1b1
        OPAL SVN revision: r10217
                   Prefix: /usr/ofed/mpi/intel/openmpi-1.1b1-1
  Configured architecture: x86_64-suse-linux-gnu
            Configured by: root
            Configured on: Wed Jul 19 20:51:46 CEST 2006
           Configure host: frontend
                 Built by: root
                 Built on: Wed Jul 19 21:04:47 CEST 2006
               Built host: frontend
               C bindings: yes
             C++ bindings: yes
       Fortran77 bindings: yes (all)
       Fortran90 bindings: yes
  Fortran90 bindings size: small
               C compiler: icc
      C compiler absolute: /software/intel/cce/9.1.038/bin/icc
             C++ compiler: icpc
    C++ compiler absolute: /software/intel/cce/9.1.038/bin/icpc
       Fortran77 compiler: ifort
   Fortran77 compiler abs: /software/intel/fce/9.1.032/bin/ifort
       Fortran90 compiler: gfortran
   Fortran90 compiler abs: /usr/bin/gfortran
              C profiling: yes
            C++ profiling: yes
      Fortran77 profiling: yes
      Fortran90 profiling: yes
           C++ exceptions: no
           Thread support: posix (mpi: no, progress: no)
   Internal debug support: no
      MPI parameter check: runtime
 Memory profiling support: no
 Memory debugging support: no
          libltdl support: yes
               MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1)
            MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1)
            MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1)
            MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.1)
                MCA timer: linux (MCA v1.0, API v1.0, Component v1.1)
            MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
            MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                 MCA coll: basic (MCA v1.0, API v1.0, Component v1.1)
                 MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1)
                 MCA coll: self (MCA v1.0, API v1.0, Component v1.1)
                 MCA coll: sm (MCA v1.0, API v1.0, Component v1.1)
                 MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1)
                   MCA io: romio (MCA v1.0, API v1.0, Component v1.1)
                MCA mpool: openib (MCA v1.0, API v1.0, Component v1.1)
                MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1)
                  MCA pml: dr (MCA v1.0, API v1.0, Component v1.1)
                  MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)
                  MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1)
               MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1)
                  MCA btl: openib (MCA v1.0, API v1.0, Component v1.1)
                  MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
                  MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
                  MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA topo: unity (MCA v1.0, API v1.0, Component v1.1)
                  MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
                  MCA gpr: null (MCA v1.0, API v1.0, Component v1.1)
                  MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1)
                  MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1)
                  MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1)
                  MCA iof: svc (MCA v1.0, API v1.0, Component v1.1)
                   MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1)
                   MCA ns: replica (MCA v1.0, API v1.0, Component v1.1)
                  MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                  MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1)
                  MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1)
                  MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1)
                  MCA ras: slurm (MCA v1.0, API v1.0, Component v1.1)
                  MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1)
                  MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1)
                MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1)
                 MCA rmgr: p
Re: [OMPI users] SEGV in libopal during MPI_Alltoall
Hi,

shen T.T. wrote:
> Do you have another compiler available? Could you check the error and report back?

I don't use other Intel compilers at the moment, but I'm going to give gfortran a try today.

Kind regards,
-- 
Frank Gruellich
HPC-Techniker
Tel.:   +49 3722 528 42
Fax:    +49 3722 528 15
E-Mail: frank.gruell...@megware.com
MEGWARE Computer GmbH
Vertrieb und Service
Nordstrasse 19
09247 Chemnitz/Roehrsdorf
Germany
http://www.megware.com/
Re: [OMPI users] SEGV in libopal during MPI_Alltoall
Hi,

George Bosilca wrote:
> On the all-to-all collective, the send and receive buffers have to be able
> to contain all the information you try to send. In this particular case,
> as you initialize the envio variable to a double, I suppose it is defined
> as a double. If that's the case, then the error is that the send operation
> will send more data than the amount available in the envio variable.
>
> If you want to do the all-to-all correctly in your example,
> make sure the envio variable has a size at least equal to:
> tam * sizeof(byte) * NPROCS, where NPROCS is the number of procs available
> on the mpi_comm_world communicator.

I'm unfortunately not much of a Fortran guy. Maybe it's best to just post the whole function from the beginning; it's neither secret nor big:

    module alltoall
      use globales
      implicit none
    contains
      subroutine All_to_all
        integer,parameter :: npuntos = 24
        integer,parameter :: t_max = 2**(npuntos-1)
        integer siguiente,anterior,tam,rep,p_1,p_2,i,j,ndatos,rep2,o,k
        double precision time1,time2,time,ov,tmin,tavg
        double precision,dimension(t_max)::envio
        double precision,dimension(:),allocatable::recibe
        double precision,dimension(npuntos)::m_tmin,m_tavg
        double precision,dimension(npuntos)::tams

        rep2 = 10
        tag1 = 1
        tag2 = 2
        rep = 3
        allocate(recibe(t_max*nproc))
        siguiente = my_id + 1
        if (my_id == nproc - 1) siguiente = 0
        anterior = my_id - 1
        if (my_id == 0) anterior = nproc - 1
        do i = 1,npuntos
          print *,'puntos',i
          tam = 2**(i-1)
          tmin = 1e5
          tavg = 0.0d0
          do j = 1,rep
            envio = 8.0d0*j
            call mpi_barrier(mpi_comm_world,ierr)
            time1 = mpi_wtime()
            do k = 1,rep2
              call mpi_alltoall(envio,tam,mpi_byte,recibe,tam,mpi_byte,mpi_comm_world,ierr)
            end do
            call mpi_barrier(mpi_comm_world,ierr)
            time2 = mpi_wtime()
            time = (time2 - time1)/(rep2)
            if (time < tmin) tmin = time
            tavg = tavg + time
          end do
          m_tmin(i) = tmin
          m_tavg(i) = tavg/rep
        end do
        call mpi_barrier(mpi_comm_world,ierr)
        print *,"acaba"
        if (my_id == 0) then
          open (1,file='Alltoall.dat')
          write (1,*) "#Prueba All to all entre todos los procesadores(",nproc,")"
          write (1,*) "#Precision del reloj:",mpi_wtick()*1.0d6,"(muS)"
          do i = 1,npuntos
            write(1,900) 2*nproc*2**(i-1),m_tmin(i),m_tavg(i)!,ov
          end do
          close(1)
        end if
    900 FORMAT(I10,F14.8,F14.8)
    800 FORMAT(I10,F14.8,F14.8)
      end subroutine
    end module

Can you read this? (Sorry, I can't.) But the size of envio seems to be 2**23 = 8388608 doubles, isn't it? I don't understand why it should depend on the number of MPI nodes, as you said.

Thanks for your help. Kind regards,
-- 
Frank Gruellich
HPC-Techniker
Tel.:   +49 3722 528 42
Fax:    +49 3722 528 15
E-Mail: frank.gruell...@megware.com
MEGWARE Computer GmbH
Vertrieb und Service
Nordstrasse 19
09247 Chemnitz/Roehrsdorf
Germany
http://www.megware.com/
Re: [OMPI users] SEGV in libopal during MPI_Alltoall
It is what I suspected. You can see that the envio array is smaller than it should be. It was created as an array of doubles of size t_max, when it should have been created as an array of doubles of size t_max * nprocs. If you look at how the recibe array is created, you can notice that its size is t_max * nprocs (allocate(recibe(t_max*nproc))). As in the all-to-all operation everybody sends and receives exactly the same amount of data, both the send and receive arrays should have the same size.

I propose the following fix:

- instead of

    double precision,dimension(t_max)::envio
    double precision,dimension(:),allocatable::recibe

  do

    double precision,dimension(:),allocatable::envio
    double precision,dimension(:),allocatable::recibe

- then, when the recibe array is created, add the allocation for envio too:

    allocate(recibe(t_max*nproc))
    allocate(envio(t_max*nproc))

Now your program should work just fine.

  george.

On Thu, 20 Jul 2006, Frank Gruellich wrote:
> Hi,
> George Bosilca wrote:
> [...]
> I'm unfortunately not much of a Fortran guy. Maybe it's best to just post
> the whole function from the beginning; it's neither secret nor big:
> [full subroutine quoted in the previous message]
> Can you read this? (Sorry, I can't.) But the size of envio seems to be
> 2**23 = 8388608 doubles, isn't it? I don't understand why it should depend
> on the number of MPI nodes, as you said.
> Thanks for your help. Kind regards,

"We must accept finite disappointment, but we must never lose infinite hope."  Martin Luther King
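[Editor's note: to make the size requirement concrete, here is a minimal, self-contained sketch of the corrected buffer setup. It is not from the original thread; variable names follow Frank's code, npuntos is deliberately reduced to keep memory use small, and the timing/output logic is omitted.

    program alltoall_sketch
      implicit none
      include 'mpif.h'
      ! npuntos is reduced from the original 24 to keep this sketch's memory use small.
      integer, parameter :: npuntos = 16
      integer, parameter :: t_max = 2**(npuntos-1)
      integer :: ierr, my_id, nproc, tam, i
      double precision, dimension(:), allocatable :: envio, recibe

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, my_id, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)

      ! Both buffers are sized t_max*nproc elements, as in the proposed fix:
      ! MPI_Alltoall reads tam bytes per destination rank from envio (tam*nproc
      ! bytes in total), so the send buffer must scale with nproc just like recibe.
      allocate(envio(t_max*nproc))
      allocate(recibe(t_max*nproc))

      do i = 1, npuntos
         tam = 2**(i-1)
         envio = 8.0d0*i
         call MPI_ALLTOALL(envio, tam, MPI_BYTE, recibe, tam, MPI_BYTE, &
                           MPI_COMM_WORLD, ierr)
      end do

      deallocate(envio, recibe)
      call MPI_FINALIZE(ierr)
    end program alltoall_sketch

This also explains why the crash only appeared beyond a certain number of nodes: the amount of data read from the send buffer grows with the number of processes, so a fixed-size envio is only overrun once enough ranks take part.]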
Re: [OMPI users] Summary of OMPI on OS X with Intel
On 7/20/06 12:04 AM, "Jeff Squyres" wrote:

>> Config #2: ./configure --disable-shared --enable-static --with-rsh=/usr/bin/ssh
>> Successful Build = NO
>> Error:
>> g++ -O3 -DNDEBUG -finline-functions -Wl,-u -Wl,_munmap -Wl,-multiply_defined -Wl,suppress -o ompi_info components.o ompi_info.o output.o param.o version.o -Wl,-bind_at_load ../../../ompi/.libs/libmpi.a /Users/wyuen/mpi_src/openmpi-1.1/orte/.libs/liborte.a /Users/wyuen/mpi_src/openmpi-1.1/opal/.libs/libopal.a -ldl
>> /usr/bin/ld: Undefined symbols:
>> _mpi_fortran_status_ignore_
>> _mpi_fortran_statuses_ignore_
>
> Do you have a Fortran compiler at all? If so, which one? Please send the
> full output from configure, config.log, and the output from make (stdout
> and stderr).

I was able to replicate this one (which, even though I don't have the Intel compilers for OSX/intel, I'm pretty sure is the same issue as #6). I believe that this error will occur regardless of whether you include F77 support or not.

I'm pretty sure that I have a fix for this; I think it's an OSX-linker-specific problem. It'll probably hit the trunk and the v1.1 branch later today.

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
Re: [OMPI users] OPEN_MPI with Intel Compiler -regards
Could you re-send that? The attachment that I got was an Excel spreadsheet with the output from configure that did not show any errors -- it just stopped in the middle of the check for "bool" in the C++ compiler.

Two notes:

1. One common mistake that people make is to use the "icc" compiler for the C++ compiler. Recent versions of the Intel compiler renamed the C++ compiler to be "icpc". If your version of the Intel compiler has an "icpc", you need to use that for the C++ compiler.

2. We had some problems with the Intel 8.1 compiler at one point -- it would seg fault while compiling legal C code. I think that later builds of the Intel 8.1 compiler fixed the problem, however. You might want to check that you have the latest build of the 8.1 compiler.

If these two suggestions don't help, please see the "Getting help" web page to see what information we need to help with compile problems. Thanks!

http://www.open-mpi.org/community/help/

On 7/20/06 2:00 AM, "esaifu" wrote:

> Dear All,
> I was able to compile Open MPI and create the wrapper compilers (mpicc,
> mpif77, etc.) on top of the GNU compilers. But when I tried it with the
> Intel Fortran compiler (since I also need an f90 compiler), I ran into a
> configuration error (and hence didn't get a Makefile). I am attaching the
> error file that I get while configuring the source code. What could be
> the problem? Thanks in advance.
> Regards
> Saifudheen
> esa...@hcl.in

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
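[Editor's note: for reference, point 1 above corresponds to a configure invocation along these lines. This is a sketch only; the install prefix is a placeholder:

    ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=/opt/openmpi-intel

Passing the compilers explicitly via CC/CXX/F77/FC avoids configure picking up a mixed GNU/Intel toolchain by accident.]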
Re: [OMPI users] OpenMPI v/s( MPICH,LAM/MPI)
On 7/20/06 2:06 AM, "esaifu" wrote:

> I have been using Open MPI for the last month, and I need some clarification
> regarding the following points.
> 1) What is the advantage of Open MPI over MPICH2 and LAM/MPI? That is, is
> there any difference performance-wise?

Open MPI's TCP performance is still a bit sub-par (because until only recently, none of us had gotten around to optimizing it). It is probably below MPICH's TCP performance and definitely below LAM's TCP performance. There are still a few features that have not yet been ported to OMPI from LAM (we're working on them -- TCP performance is one of them).

But even with those missing features, I consider OMPI to be a superior product to LAM/MPI -- I have switched all my day-to-day MPI applications to use Open MPI (instead of LAM). Indeed, since Open MPI was designed and built by -- among many others -- the LAM/MPI crew, it contains most of the great ideas from LAM and is therefore (in my mind, at least ;-) ) a worthy successor. I can say all these things about LAM because I was the technical lead for it for many years, and therefore have pretty good insight into the comparison of the two.

One of the main advantages of Open MPI is that it has different goals than other MPI implementations. Open MPI aims to be production-quality software, is fully open source, strikes a good balance of cutting-edge research and stability, and actively invites others to join in the process. While there will always be bugs and claims of the form "with specific metric ABC, implementation XYZ performs better than Open MPI!", we feel that the above are critical characteristics that distinguish Open MPI from other projects.

> 2) Is there any checkpointing mechanism in Open MPI, like there is in
> LAM/MPI?

Not yet. Work is actively progressing on this front. Search the mailing list archives for mails from Josh Hursey for more details on this. The short version is that we will have a demonstrable version of checkpoint/restart at SC'06 (although it is highly unlikely that it will be included in a stable release by then). The checkpoint/restart work that we are doing in OMPI will far surpass what we did in LAM/MPI.

> 3) Can I port Open MPI to any of these platforms (x86, x86-64, IA-64)?

Yes.

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
[OMPI users] MPI applicability
Hi,

Is the MPI paradigm applicable to a cluster of regular networked machines? That is, does the cost of network I/O offset the benefits of parallelization? My guess is that this really depends on the application itself; however, I'm wondering if you guys know of any success stories which involve MPI running on a set of networked machines (not a Beowulf cluster or any SC).

Thanks,

Vladimir Sipos
Software Engineer
Advertising Technology
CNET Networks, Inc.
Re: [OMPI users] MPI applicability
It's doable; the scaling will just not be as good, because a network is a network. If you are using regular 100 Mbit, you will not scale as far as with really good 1 Gig Ethernet, but we are still talking about TCP, which incurs a penalty compared to networks like InfiniBand and Myrinet. TCP is the largest issue; it's going to be really application dependent -- you are right. On another note, though, many of the older clusters that are now out of service used just 100 Mbit Ethernet and worked.

Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734) 936-1985

On Jul 20, 2006, at 9:27 AM, Vladimir Sipos wrote:

> Hi,
>
> Is the MPI paradigm applicable to a cluster of regular networked machines?
> That is, does the cost of network I/O offset the benefits of parallelization?
> My guess is that this really depends on the application itself; however,
> I'm wondering if you guys know of any success stories which involve MPI
> running on a set of networked machines (not a Beowulf cluster or any SC).
>
> Thanks,
>
> Vladimir Sipos
> Software Engineer
> Advertising Technology
> CNET Networks, Inc.
Re: [OMPI users] MPI applicability
I think there are two questions here:

1. Running MPI applications on "slow" networks (e.g., 100 Mbps). This is very much application-dependent. If your MPI app doesn't communicate with other processes much, then it probably won't matter. If you have latency/bandwidth-sensitive applications, then using a "slow" network can definitely have a negative impact on performance.

2. Running MPI applications on resources that are being used by others. In this case, your MPI processes will be competing with other processes for CPU, RAM, and other resources -- just like any other process. Hence, your overall performance will depend not only on the application, but also on the usage patterns of the other resources (e.g., the workstations and the people that use them).

I have certainly heard of bunches of success stories in this kind of environment -- small numbers of relatively lightly-loaded workstations (typically <= 16) running small to mid-sized MPI applications, etc. A common case for such scenarios is development and debugging, or even running small versions of jobs when you can't get time on larger resources, etc. Specifically: sometimes running a smaller version of your job is better than not running anything at all.

Hope that helps.

On 7/20/06 10:04 AM, "Brock Palen" wrote:

> It's doable; the scaling will just not be as good, because a network is a
> network. If you are using regular 100 Mbit, you will not scale as far as
> with really good 1 Gig Ethernet, but we are still talking about TCP, which
> incurs a penalty compared to networks like InfiniBand and Myrinet. TCP is
> the largest issue; it's going to be really application dependent -- you are
> right. On another note, though, many of the older clusters that are now out
> of service used just 100 Mbit Ethernet and worked.
>
> Brock Palen
> Center for Advanced Computing
> bro...@umich.edu
> (734) 936-1985
>
> On Jul 20, 2006, at 9:27 AM, Vladimir Sipos wrote:
>
>> [quoted text of the original question snipped]

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
Re: [OMPI users] SEGV in libopal during MPI_Alltoall
Hi George,

George Bosilca wrote:
> It is what I suspected. You can see that the envio array is smaller than
> it should be. It was created as an array of doubles of size t_max, when
> it should have been created as an array of doubles of size t_max * nprocs.

Ah, yes, I see (and even understand). Great, thank you very much, it works now.

Kind regards,
-- 
Frank Gruellich
HPC-Techniker
Tel.:   +49 3722 528 42
Fax:    +49 3722 528 15
E-Mail: frank.gruell...@megware.com
MEGWARE Computer GmbH
Vertrieb und Service
Nordstrasse 19
09247 Chemnitz/Roehrsdorf
Germany
http://www.megware.com/
Re: [OMPI users] MPI_Finalize runtime error
What version of Open MPI are you using? Can you run your application through a memory-checking debugger such as Valgrind to see if it gives any more information about where the original problem occurs?

On 7/17/06 10:14 PM, "Manal Helal" wrote:

> Hi
>
> after I finish execution, and all results are reported, and both
> processes are about to call MPI_Finalize, I get this runtime error:
>
> any help is appreciated, thanks
>
> Manal
>
> Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
> Failing at addr:0xa
> [0] func:/usr/local/bin/openmpi/lib/libopal.so.0 [0x3e526c]
> [1] func:[0x4bfc7440]
> [2] func:/usr/local/bin/openmpi/lib/libopal.so.0(free+0xb4) [0x3e9ff4]
> [3] func:/usr/local/bin/openmpi/lib/libmpi.so.0 [0x70484e]
> [4] func:/usr/local/bin/openmpi//lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_close+0x278) [0xc78a58]
> [5] func:/usr/local/bin/openmpi/lib/libopal.so.0(mca_base_components_close+0x6a) [0x3d93fa]
> [6] func:/usr/local/bin/openmpi/lib/libmpi.so.0(mca_btl_base_close+0xbd) [0x75154d]
> [7] func:/usr/local/bin/openmpi/lib/libmpi.so.0(mca_bml_base_close+0x17) [0x751427]
> [8] func:/usr/local/bin/openmpi//lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_component_close+0x3a) [0x625a0a]
> [9] func:/usr/local/bin/openmpi/lib/libopal.so.0(mca_base_components_close+0x6a) [0x3d93fa]
> [10] func:/usr/local/bin/openmpi/lib/libmpi.so.0(mca_pml_base_close+0x65) [0x7580e5]
> [11] func:/usr/local/bin/openmpi/lib/libmpi.so.0(ompi_mpi_finalize+0x1b4) [0x71e984]
> [12] func:/usr/local/bin/openmpi/lib/libmpi.so.0(MPI_Finalize+0x4b) [0x73cb5b]
> [13] func:master/mmMaster(main+0x3cc) [0x804b2dc]
> [14] func:/lib/libc.so.6(__libc_start_main+0xdc) [0x4bffa724]
> [15] func:master/mmMaster [0x8049b91]
> *** End of error message ***

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
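[Editor's note: for what it's worth, one common way to do that is to interpose valgrind between mpirun and the application. This is a sketch only; the binary name is a placeholder and the process count should match your normal run:

    mpirun -np 2 valgrind ./your_mpi_app

Each rank then runs under its own valgrind instance, which will report invalid reads/writes near the point where the memory corruption first happens rather than only at the crash in MPI_Finalize.]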
Re: [OMPI users] BTL devices
On 7/14/06 10:40 AM, "Michael Kluskens" wrote:

> I've looked through the documentation but I haven't found the
> discussion about what each BTL device is, for example, I have:
>
> MCA btl: self (MCA v1.0, API v1.0, Component v1.2)

This is the "loopback" Open MPI device. It is used exclusively for sending and receiving from one process to the same process. I.e., message passing is effected by memcpy's in the same process -- no network is involved (not even shared memory, because it's within a single process). We do this not for optimization, but rather for software engineering reasons -- by having a "self" BTL, all the other BTLs can assume that they never have to handle the special case of "sending/receiving to self".

> MCA btl: sm (MCA v1.0, API v1.0, Component v1.2)

This is shared memory. It is used to communicate between processes on the same node.

> MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)

I think this one is pretty obvious. ;-)

> I found a PDF presentation that describes a few:
>
> tcp    - TCP/IP
> openib - InfiniBand OpenIB stack
> gm/mx  - Myrinet GM/MX
> mvapi  - InfiniBand Mellanox Verbs
> sm     - Shared memory
>
> Are there any others I may see when interacting with other people's
> computers?

These are the main ones for now. There may be more in the future.

> I assume that if a machine has Myrinet and I don't see MCA btl: gm or
> MCA btl: mx then I have to explain the problem to the sysadmins.

Correct.

> The second question is: should I see both gm & mx, or only one or the
> other?

Probably just one or the other; I *believe* that you cannot have both installed on the same node. That being said, you can have the *support libraries* for both installed on the same node, and therefore Open MPI can build support for both and show that those BTLs exist in the output of ompi_info. But only one will *run* at a time.

Sorry for the delay on the answer -- hope this helps!

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
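[Editor's note: as a side illustration (not part of the original exchange), ompi_info lists the BTL components that were built, and the "btl" MCA parameter restricts which ones a run may use; the component list below is just an example:

    ompi_info | grep "MCA btl"
    mpirun --mca btl self,sm,tcp -np 4 ./a.out

Note that "self" should always be included in such a list, for the loopback reasons described above.]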
Re: [OMPI users] What Really Happens During OpenMPI MPI_INIT?
On 7/17/06 12:37 AM, "Mahesh Barve" wrote:

> Can anyone please enlighten us about what really
> happens in MPI_Init() in Open MPI?

This is quite a complicated question. :-)

> More specifically, I am interested in knowing:
> 1. The functions that need to be accomplished during MPI_Init()
> 2. What has already been implemented in Open MPI's MPI_Init
> 3. The routines called/invoked that perform these functions

Many, many things happen in MPI_INIT. Here's a sample:

- setup the lowest layer of the system (OPAL)
- setup the run-time environment (ORTE)
- find out our rank in MPI_COMM_WORLD
- find out how many peers we have and who they are
- find out how to contact our peers
- setup the progression engine
- setup processor affinity (if desired)
- setup all the various component frameworks to implement much of the MPI functionality
- setup our MPI point-to-point channels
- publish information on how peer processes can contact me
- receive information on how to contact peer processes
- setup MPI collectives
- setup MPI topologies
- ...etc.
- setup all the MPI handle processing (MPI_Comm, MPI_Datatype, etc.)
- initialize pre-defined handles
- create fortran translation tables
- ...etc.

I would suggest that you look through ompi/runtime/ompi_mpi_init.c. It's basically a big dispatch function for all the events that occur during MPI_INIT (i.e., both MPI_INIT and MPI_INIT_THREAD -- ompi/mpi/c/init.c and ompi/mpi/c/init_thread.c, respectively -- call this function to do all the work). The list of things that it does is quite explicit.

Note that the ordering of functions in this function is extremely important -- almost all the functions are strictly ordered because of explicit or implicit dependencies.

Does that help?

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
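[Editor's note: for orientation only -- this example is not from the original mail. From the application's point of view, all of the setup listed above is hidden behind a single call; once MPI_INIT returns, the rank and size of MPI_COMM_WORLD and the predefined handles are ready to use. A minimal Fortran sketch:

    program init_demo
      implicit none
      include 'mpif.h'
      integer :: ierr, my_rank, nprocs

      ! All of the OPAL/ORTE/MPI setup described above happens inside MPI_INIT.
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
      print *, 'rank ', my_rank, ' of ', nprocs
      ! MPI_FINALIZE tears the same infrastructure back down.
      call MPI_FINALIZE(ierr)
    end program init_demo
]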