Re: [OMPI users] Windows: MPI_Allreduce() crashes when using MPI_DOUBLE_PRECISION
Hi Jeff,

> You didn't answer my prior questions.  :-)

I am observing this crash using MPI_ALLREDUCE() in a test program which does not have any memory corruption issue. ;)

> I ran your test program with -np 2 and -np 4 and it seemed to work ok.

Can you please let me know what environment (including OS and compilers) you are using?

I am able to reproduce the crash using the attached simplified test program with a 5-element array. Please note that I am doing these experiments on Windows 7 using the msys/mingw console; see the attached makefile for more information.

When running this program as "C:\>mpirun mar_f_dp2.exe" it works fine; but when running it as "C:\>mpirun -np 2 mar_f_dp2.exe" it generates the following error on the console...

C:\>mpirun -np 2 mar_f_dp2.exe
           0
           0
           0
 size=           2 , rank=           0
 start --
           0
           0
           0
 size=           2 , rank=           1
 start --
forrtl: severe (157): Program Exception - access violation
Image              PC        Routine            Line        Source
[vibgyor:09168] [[28311,0],0]-[[28311,1],0] mca_oob_tcp_msg_recv: readv failed: Unknown error (108)
--------------------------------------------------------------------------
WARNING: A process refused to die!

Host: vibgyor
PID:  512

This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 476 on
node vibgyor exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

Another observation with the attached test program is that it crashes in MPI_Finalize() when running "C:\>mpirun mar_f_dp2.exe" if we un-comment the following lines (lines 27 and 35)...

write(*,*) "start --, rcvbuf=", rcvbuf
...
write(*,*) "end --, rcvbuf=", rcvbuf

Thank you in advance.
-Hiral

makefile
Description: Binary data

mar_f_dp2.f
Description: Binary data
[OMPI users] openmpi (1.2.8 or above) and Intel composer XE 2011 (aka 12.0)
Dear all,

We succeeded in building several versions of openmpi, from 1.2.8 to 1.4.3, with Intel composer XE 2011 (aka 12.0). However, we found a threshold in the number of cores (depending on the application: IMB, xhpl or user applications, and on the number of required cores) above which the application hangs (a sort of deadlock). Builds of openmpi with 'gcc' and 'pgi' do not show the same limits.

Are there any known incompatibilities of openmpi with this version of the Intel compilers?

The characteristics of our computational infrastructure are:

Intel processors E7330, E5345, E5530 and E5620
CentOS 5.3, CentOS 5.5
Intel composer XE 2011
gcc 4.1.2
pgi 10.2-1

Regards

Salvatore Podda

ENEA UTICT-HPC
Department for Computer Science Development and ICT Facilities
Laboratory for Science and High Performance Computing
C.R. Frascati
Via E. Fermi, 45
PoBox 65
00044 Frascati (Rome)
Italy

Tel: +39 06 9400 5342
Fax: +39 06 9400 5551
Fax: +39 06 9400 5735
E-mail: salvatore.po...@enea.it
Home Page: www.cresco.enea.it
[OMPI users] MPI_COMM_DUP freeze with OpenMPI 1.4.1
Hi,

I compiled a parallel program with OpenMPI 1.4.1 (built with Intel compilers 12 from the composerxe package). This program is linked to the MUMPS library 4.9.2, compiled with the same compilers and linked with Intel MKL. The OS is Linux Debian.

There is no error in compiling or running the job, but the program freezes inside a call to the "zmumps" routine, when the slave processes call the MPI_COMM_DUP routine.

The program is executed on 2 nodes of 12 cores each (Westmere processors) with the following command:

mpirun -np 24 --machinefile $OAR_NODE_FILE -mca plm_rsh_agent "oarsh" --mca btl self,openib -x LD_LIBRARY_PATH ./prog

We have 12 processes running on each node. We submit the job with the OAR batch scheduler (the $OAR_NODE_FILE variable and the "oarsh" command are specific to this scheduler and usually work well with openmpi).

Via gdb, on the slaves, we can see that they are blocked in MPI_COMM_DUP:

(gdb) where
#0  0x2b32c1533113 in poll () from /lib/libc.so.6
#1  0x00adf52c in poll_dispatch ()
#2  0x00adcea3 in opal_event_loop ()
#3  0x00ad69f9 in opal_progress ()
#4  0x00a34b4e in mca_pml_ob1_recv ()
#5  0x009b0768 in ompi_coll_tuned_allreduce_intra_recursivedoubling ()
#6  0x009ac829 in ompi_coll_tuned_allreduce_intra_dec_fixed ()
#7  0x0097e271 in ompi_comm_allreduce_intra ()
#8  0x0097dd06 in ompi_comm_nextcid ()
#9  0x0097be01 in ompi_comm_dup ()
#10 0x009a0785 in PMPI_Comm_dup ()
#11 0x0097931d in pmpi_comm_dup__ ()
#12 0x00644251 in zmumps (id=...) at zmumps_part1.F:144
#13 0x004c0d03 in sub_pbdirect_init (id=..., matrix_build=...) at sub_pbdirect_init.f90:44
#14 0x00628706 in fwt2d_elas_v2 () at fwt2d_elas.f90:1048

The master waits further on:

(gdb) where
#0  0x2b9dc9f3e113 in poll () from /lib/libc.so.6
#1  0x00adf52c in poll_dispatch ()
#2  0x00adcea3 in opal_event_loop ()
#3  0x00ad69f9 in opal_progress ()
#4  0x0098f294 in ompi_request_default_wait_all ()
#5  0x00a06e56 in ompi_coll_tuned_sendrecv_actual ()
#6  0x009ab8e3 in ompi_coll_tuned_barrier_intra_bruck ()
#7  0x009ac926 in ompi_coll_tuned_barrier_intra_dec_fixed ()
#8  0x009a0b20 in PMPI_Barrier ()
#9  0x00978c93 in pmpi_barrier__ ()
#10 0x004c0dc4 in sub_pbdirect_init (id=..., matrix_build=...) at sub_pbdirect_init.f90:62
#11 0x00628706 in fwt2d_elas_v2 () at fwt2d_elas.f90:1048

Remark:
The same code compiles and runs well with the Intel MPI library, from the same Intel package, on the same nodes.

Thanks for any help

Françoise Roch
Re: [OMPI users] Windows: MPI_Allreduce() crashes when using MPI_DOUBLE_PRECISION
On May 10, 2011, at 2:30 AM, hi wrote:

>> You didn't answer my prior questions.  :-)
> I am observing this crash using MPI_ALLREDUCE() in test program; and
> which does not have any memory corruption issue. ;)

Can you send the info listed on the help page?

>> I ran your test program with -np 2 and -np 4 and it seemed to work ok.
> Can you please let me know what environment (including os, compilers)
> are you using?

RHEL 5.4, gcc 4.5.

This could be a Windows-specific thing, but I would find that unlikely (but heck, I don't know much about Windows...).

> I am able to reproduce the crash using attached simplified test
> program with 5 element array.
> Please note that these experiments I am doing on Windows7 using
> msys/mingw console; see attached makefile for more information.
>
> When running this program as "C:\>mpirun mar_f_dp2.exe" it works fine;
> but when running it as "C:\>mpirun -np 2 mar_f_dp2.exe" it generates
> following error on console...
>
> C:\>mpirun -np 2 mar_f_dp2.exe
>            0
>            0
>            0
>  size=           2 , rank=           0
>  start --
>            0
>            0
>            0
>  size=           2 , rank=           1
>  start --
> forrtl: severe (157): Program Exception - access violation
> Image              PC        Routine            Line        Source
> [vibgyor:09168] [[28311,0],0]-[[28311,1],0] mca_oob_tcp_msg_recv:
> readv failed: Unknown error (108)

You forgot ierr in the call to MPI_Finalize.  You also paired DOUBLE_PRECISION data with MPI_INTEGER in the call to allreduce.  And you mixed up sndbuf and rcvbuf in the call to allreduce, meaning that when you print rcvbuf afterwards, it'll always still be 0.

I pared your sample program down to the following:

program Test_MPI
    use mpi
    implicit none

    DOUBLE PRECISION rcvbuf(5), sndbuf(5)
    INTEGER nproc, rank, ierr, n, i, ret

    n = 5
    do i = 1, n
       sndbuf(i) = 2.0
       rcvbuf(i) = 0.0
    end do

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
    write(*,*) "size=", nproc, ", rank=", rank
    write(*,*) "start --, rcvbuf=", rcvbuf
    CALL MPI_ALLREDUCE(sndbuf, rcvbuf, n, &
         MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD, ierr)
    write(*,*) "end --, rcvbuf=", rcvbuf
    CALL MPI_Finalize(ierr)
end

(you could use "include 'mpif.h'", too -- I tried both)

This program works fine for me.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
[OMPI users] Issue with Open MPI 1.5.3 Windows binary builds
Good day,

I am new to the Open MPI package, and so am starting at the beginning. I have little if any desire to build the binaries, so I was glad to see a Windows binary release.

I started with what I think is the minimum program:

#include "mpi.h"

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    MPI_Finalize();
    return 0;
}

But when I build and run this (with MS Visual C++ 2010 Express, running on Windows 7 x64), I get this error:

[Tyler-Quad:06832] [[2206,0],0] ORTE_ERROR_LOG: Value out of bounds in file ..\..\..\openmpi-1.5.3\orte\mca\oob\tcp\oob_tcp.c at line 1193

And it hangs there.

As I mentioned, I am new to this project. Perhaps there is some simple configuration I failed to do after the install. Any clues welcome.

Thank you,
Tyler
Re: [OMPI users] MPI_COMM_DUP freeze with OpenMPI 1.4.1
On 5/10/2011 6:43 AM, francoise.r...@obs.ujf-grenoble.fr wrote:

> Hi,
>
> I compile a parallel program with OpenMPI 1.4.1 (compiled with intel
> compilers 12 from composerxe package). This program is linked to MUMPS
> library 4.9.2, compiled with the same compilers and link with intel MKL.
> The OS is linux debian.
> No error in compiling or running the job, but the program freeze inside
> a call to "zmumps" routine, when the slaves process call MPI_COMM_DUP routine.
>
> The program is executed on 2 nodes of 12 cores each (westmere
> processors) with the following command :
>
> mpirun -np 24 --machinefile $OAR_NODE_FILE -mca plm_rsh_agent "oarsh"
> --mca btl self,openib -x LD_LIBRARY_PATH ./prog
>
> We have 12 process running on each node. We submit the job with OAR
> batch scheduler (the $OAR_NODE_FILE variable and "oarsh" command are
> specific to this scheduler and are usually working well with openmpi)
>
> via gdb, on the slaves, we can see that they are blocked in MPI_COMM_DUP :
>
> (gdb) where
> #0  0x2b32c1533113 in poll () from /lib/libc.so.6
> #1  0x00adf52c in poll_dispatch ()
> #2  0x00adcea3 in opal_event_loop ()
> #3  0x00ad69f9 in opal_progress ()
> #4  0x00a34b4e in mca_pml_ob1_recv ()
> #5  0x009b0768 in ompi_coll_tuned_allreduce_intra_recursivedoubling ()
> #6  0x009ac829 in ompi_coll_tuned_allreduce_intra_dec_fixed ()
> #7  0x0097e271 in ompi_comm_allreduce_intra ()
> #8  0x0097dd06 in ompi_comm_nextcid ()
> #9  0x0097be01 in ompi_comm_dup ()
> #10 0x009a0785 in PMPI_Comm_dup ()
> #11 0x0097931d in pmpi_comm_dup__ ()
> #12 0x00644251 in zmumps (id=...) at zmumps_part1.F:144
> #13 0x004c0d03 in sub_pbdirect_init (id=..., matrix_build=...) at sub_pbdirect_init.f90:44
> #14 0x00628706 in fwt2d_elas_v2 () at fwt2d_elas.f90:1048
>
> the master wait further :
>
> (gdb) where
> #0  0x2b9dc9f3e113 in poll () from /lib/libc.so.6
> #1  0x00adf52c in poll_dispatch ()
> #2  0x00adcea3 in opal_event_loop ()
> #3  0x00ad69f9 in opal_progress ()
> #4  0x0098f294 in ompi_request_default_wait_all ()
> #5  0x00a06e56 in ompi_coll_tuned_sendrecv_actual ()
> #6  0x009ab8e3 in ompi_coll_tuned_barrier_intra_bruck ()
> #7  0x009ac926 in ompi_coll_tuned_barrier_intra_dec_fixed ()
> #8  0x009a0b20 in PMPI_Barrier ()
> #9  0x00978c93 in pmpi_barrier__ ()
> #10 0x004c0dc4 in sub_pbdirect_init (id=..., matrix_build=...) at sub_pbdirect_init.f90:62
> #11 0x00628706 in fwt2d_elas_v2 () at fwt2d_elas.f90:1048
>
> Remark :
> The same code compiled and run well with intel MPI library, from the
> same intel package, on the same nodes.

Did you try compiling with equivalent options in each compiler? For example (supposing you had gcc 4.6),

gcc -O3 -funroll-loops --param max-unroll-times=2 -march=corei7

would be equivalent (as closely as I know) to

icc -fp-model source -msse4.2 -ansi-alias

As you should be aware, default settings in icc are more closely equivalent to

gcc -O3 -ffast-math -fno-cx-limited-range -funroll-loops --param max-unroll-times=2 -fno-strict-aliasing

The options I suggest as an upper limit are probably more aggressive than most people have used successfully with OpenMPI.

As to run-time MPI options, Intel MPI has affinity with Westmere awareness turned on by default. I suppose testing without affinity settings, particularly when banging against all hyperthreads, is a more severe test of your application. Don't you get better results at 1 rank per core?

--
Tim Prince
Re: [OMPI users] MPI_COMM_DUP freeze with OpenMPI 1.4.1
On May 10, 2011, at 08:10, Tim Prince wrote:

> On 5/10/2011 6:43 AM, francoise.r...@obs.ujf-grenoble.fr wrote:
>>
>> Hi,
>>
>> I compile a parallel program with OpenMPI 1.4.1 (compiled with intel
>> compilers 12 from composerxe package). This program is linked to MUMPS
>> library 4.9.2, compiled with the same compilers and link with intel MKL.
>> The OS is linux debian.
>> No error in compiling or running the job, but the program freeze inside
>> a call to "zmumps" routine, when the slaves process call MPI_COMM_DUP
>> routine.
>>
>> The program is executed on 2 nodes of 12 cores each (westmere
>> processors) with the following command :
>>
>> mpirun -np 24 --machinefile $OAR_NODE_FILE -mca plm_rsh_agent "oarsh"
>> --mca btl self,openib -x LD_LIBRARY_PATH ./prog
>>
>> We have 12 process running on each node. We submit the job with OAR
>> batch scheduler (the $OAR_NODE_FILE variable and "oarsh" command are
>> specific to this scheduler and are usually working well with openmpi)
>>
>> via gdb, on the slaves, we can see that they are blocked in MPI_COMM_DUP :

Francoise,

Based on your traces the workers and the master are not doing the same MPI call. The workers are blocked in an MPI_Comm_dup in sub_pbdirect_init.f90:44, while the master is blocked in an MPI_Barrier in sub_pbdirect_init.f90:62. Can you verify that the slaves and the master are calling the MPI_Barrier and the MPI_Comm_dup in the same logical order?

  george.

>>
>> (gdb) where
>> #0  0x2b32c1533113 in poll () from /lib/libc.so.6
>> #1  0x00adf52c in poll_dispatch ()
>> #2  0x00adcea3 in opal_event_loop ()
>> #3  0x00ad69f9 in opal_progress ()
>> #4  0x00a34b4e in mca_pml_ob1_recv ()
>> #5  0x009b0768 in ompi_coll_tuned_allreduce_intra_recursivedoubling ()
>> #6  0x009ac829 in ompi_coll_tuned_allreduce_intra_dec_fixed ()
>> #7  0x0097e271 in ompi_comm_allreduce_intra ()
>> #8  0x0097dd06 in ompi_comm_nextcid ()
>> #9  0x0097be01 in ompi_comm_dup ()
>> #10 0x009a0785 in PMPI_Comm_dup ()
>> #11 0x0097931d in pmpi_comm_dup__ ()
>> #12 0x00644251 in zmumps (id=...) at zmumps_part1.F:144
>> #13 0x004c0d03 in sub_pbdirect_init (id=..., matrix_build=...)
>> at sub_pbdirect_init.f90:44
>> #14 0x00628706 in fwt2d_elas_v2 () at fwt2d_elas.f90:1048
>>
>>
>> the master wait further :
>>
>> (gdb) where
>> #0  0x2b9dc9f3e113 in poll () from /lib/libc.so.6
>> #1  0x00adf52c in poll_dispatch ()
>> #2  0x00adcea3 in opal_event_loop ()
>> #3  0x00ad69f9 in opal_progress ()
>> #4  0x0098f294 in ompi_request_default_wait_all ()
>> #5  0x00a06e56 in ompi_coll_tuned_sendrecv_actual ()
>> #6  0x009ab8e3 in ompi_coll_tuned_barrier_intra_bruck ()
>> #7  0x009ac926 in ompi_coll_tuned_barrier_intra_dec_fixed ()
>> #8  0x009a0b20 in PMPI_Barrier ()
>> #9  0x00978c93 in pmpi_barrier__ ()
>> #10 0x004c0dc4 in sub_pbdirect_init (id=..., matrix_build=...)
>> at sub_pbdirect_init.f90:62
>> #11 0x00628706 in fwt2d_elas_v2 () at fwt2d_elas.f90:1048
>>
>>
>> Remark :
>> The same code compiled and run well with intel MPI library, from the
>> same intel package, on the same nodes.
>>
> Did you try compiling with equivalent options in each compiler?
> For example (supposing you had gcc 4.6),
> gcc -O3 -funroll-loops --param max-unroll-times=2 -march=corei7
> would be equivalent (as closely as I know) to
> icc -fp-model source -msse4.2 -ansi-alias
>
> As you should be aware, default settings in icc are more closely equivalent to
> gcc -O3 -ffast-math -fno-cx-limited-range -funroll-loops --param
> max-unroll-times=2 -fno-strict-aliasing
>
> The options I suggest as an upper limit are probably more aggressive than
> most people have used successfully with OpenMPI.
>
> As to run-time MPI options, Intel MPI has affinity with Westmere awareness
> turned on by default. I suppose testing without affinity settings,
> particularly when banging against all hyperthreads, is a more severe test of
> your application. Don't you get better results at 1 rank per core?
> --
> Tim Prince
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

"To preserve the freedom of the human mind then and freedom of the press, every spirit should be ready to devote itself to martyrdom; for as long as we may think as we will, and speak as we think, the condition of man will proceed in improvement."
  -- Thomas Jefferson, 1799
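To make George's question concrete, here is a minimal sketch (hypothetical code, not taken from sub_pbdirect_init.f90) of how a master/slave split that reaches MPI_COMM_DUP and MPI_BARRIER in different orders produces exactly the two backtraces quoted above, and of the fix:

program dup_order_sketch
  use mpi
  implicit none
  integer :: ierr, rank, newcomm

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

  ! Deadlock pattern: the master and the other ranks reach the two
  ! collectives in a different order, so rank 0 sits in MPI_BARRIER
  ! while the rest sit in MPI_COMM_DUP -- the same picture as the
  ! two backtraces above.
  !
  !   if (rank == 0) then
  !      call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  !      call MPI_COMM_DUP(MPI_COMM_WORLD, newcomm, ierr)
  !   else
  !      call MPI_COMM_DUP(MPI_COMM_WORLD, newcomm, ierr)
  !      call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  !   end if

  ! Correct pattern: every rank calls the same collectives in the
  ! same logical order.
  call MPI_COMM_DUP(MPI_COMM_WORLD, newcomm, ierr)
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)

  call MPI_COMM_FREE(newcomm, ierr)
  call MPI_FINALIZE(ierr)
end program dup_order_sketch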
[OMPI users] Trouble with MPI-IO
I would appreciate it if someone with experience with MPI-IO would look at the simple Fortran program gzipped and attached to this note. It is embedded in a script, so all that is necessary to run it is to do 'testio' from the command line.

The program generates a small 2-D input array, sets up an MPI-IO environment, and writes a 2-D output array twice, with the only difference being the displacement arrays used to construct the indexed datatype. For the first write, simple monotonically increasing displacements are used; for the second, the displacements are 'shuffled' in one dimension. They are printed during the run. For the first case the file is written properly, but for the second the program hangs on MPI_FILE_WRITE_AT_ALL and must be aborted manually.

Although the program is compiled as an MPI program, I am running on a single processor, which makes the problem more puzzling.

The program should be relatively self-explanatory, but if more information is needed, please ask. I am on an 8-core Xeon based Dell workstation running Scientific Linux 5.5, Intel Fortran 12.0.3, and OpenMPI 1.5.3. I have also attached output from 'ompi_info'.

T. Rosmond

testio.gz
Description: GNU Zip compressed data

info_ompi.gz
Description: GNU Zip compressed data
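For anyone reading without the attachment, the pattern described above looks roughly like the sketch below: an indexed datatype built from a displacement array, set as the file view, then written with a collective call. All names, sizes, and values here are assumed for illustration; this is not the attached testio program.

program indexed_write_sketch
  use mpi
  implicit none
  integer, parameter :: n = 4
  integer :: blocklens(n), disps(n)
  integer :: ierr, rank, filetype, fh, i
  integer(kind=MPI_OFFSET_KIND) :: offset
  double precision :: buf(n)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

  ! One element per block; each rank describes its own region of the file.
  ! These displacements are monotonically increasing; the hang reported
  ! above appears when they are shuffled.
  do i = 1, n
     blocklens(i) = 1
     disps(i) = rank*n + (i - 1)
     buf(i) = rank*10.0d0 + i
  end do

  call MPI_TYPE_INDEXED(n, blocklens, disps, MPI_DOUBLE_PRECISION, filetype, ierr)
  call MPI_TYPE_COMMIT(filetype, ierr)

  call MPI_FILE_OPEN(MPI_COMM_WORLD, 'out.dat', &
       MPI_MODE_WRONLY + MPI_MODE_CREATE, MPI_INFO_NULL, fh, ierr)
  offset = 0
  call MPI_FILE_SET_VIEW(fh, offset, MPI_DOUBLE_PRECISION, filetype, &
       'native', MPI_INFO_NULL, ierr)
  call MPI_FILE_WRITE_AT_ALL(fh, offset, buf, n, MPI_DOUBLE_PRECISION, &
       MPI_STATUS_IGNORE, ierr)
  call MPI_FILE_CLOSE(fh, ierr)

  call MPI_TYPE_FREE(filetype, ierr)
  call MPI_FINALIZE(ierr)
end program indexed_write_sketch

Shuffling the entries of disps would mimic the second, hanging case described above.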
Re: [OMPI users] is there an equiv of iprove for bcast?
Thanks,

The messages are small and frequent (they flash metadata across the cluster). The current approach works fine for small to medium clusters, but I want it to be able to go big -- maybe up to several hundred or even a thousand nodes. It's these larger deployments that concern me: the current scheme may see the clearinghouse become overloaded in a very large cluster.

From what you have said, a possible strategy may be to combine the listener and worker into a single process, using the non-blocking bcast just for that group, while each worker scans its own port for an incoming request, which it would in turn bcast to its peers.

As you have indicated though, this would depend on the load the non-blocking bcast would cause. At least the load would be fairly even over the cluster.

--- On Mon, 9/5/11, Jeff Squyres wrote:

From: Jeff Squyres
Subject: Re: [OMPI users] is there an equiv of iprove for bcast?
To: randolph_pul...@yahoo.com.au
Cc: "Open MPI Users"
Received: Monday, 9 May, 2011, 11:27 PM

On May 3, 2011, at 8:20 PM, Randolph Pullen wrote:

> Sorry, I meant to say:
> - on each node there is 1 listener and 1 worker.
> - all workers act together when any of the listeners send them a request.
> - currently I must use an extra clearinghouse process to receive from any of
> the listeners and bcast to workers, this is unfortunate because of the
> potential scaling issues
>
> I think you have answered this in that I must wait for MPI-3's non-blocking
> collectives.

Yes and no. If each worker starts N non-blocking broadcasts just to be able to test for completion of any of them, you might end up consuming a bunch of resources for them (I'm *anticipating* that pending non-blocking collective requests may be more heavyweight than pending non-blocking point-to-point requests). But then again, if N is small, it might not matter.

> Can anyone suggest another way? I don't like the serial clearinghouse
> approach.

If you only have a few workers and/or the broadcast message is small and/or the broadcasts aren't frequent, then MPI's built-in broadcast algorithms might not offer much more optimization than doing your own with point-to-point mechanisms. I don't usually recommend this, but it may be possible for your case.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
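For what it's worth, a rough sketch of the kind of polling workaround available before MPI-3's non-blocking collectives: the root announces that a broadcast is coming with a cheap point-to-point message, the workers test for that wake-up with MPI_IPROBE between slices of work, and only then does everyone enter MPI_BCAST. The tags, message layout, and names below are all assumed for illustration; this is not code from this thread.

program iprobe_bcast_sketch
  use mpi
  implicit none
  integer, parameter :: WAKE_TAG = 100, META_LEN = 64
  integer :: ierr, rank, nproc, i, token
  logical :: flag
  integer :: status(MPI_STATUS_SIZE)
  character(len=META_LEN) :: meta

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)

  if (rank == 0) then
     ! Listener/clearinghouse side: in real code this is where an
     ! incoming request would be picked up off the network.
     meta = 'hypothetical metadata record'
     token = 1
     ! Wake the workers with a tiny point-to-point message so that no
     ! one has to sit blocked in MPI_BCAST while nothing is pending.
     do i = 1, nproc - 1
        call MPI_SEND(token, 1, MPI_INTEGER, i, WAKE_TAG, MPI_COMM_WORLD, ierr)
     end do
     call MPI_BCAST(meta, META_LEN, MPI_CHARACTER, 0, MPI_COMM_WORLD, ierr)
  else
     ! Worker side: between slices of local work, test cheaply for the
     ! wake-up instead of blocking in the collective.
     flag = .false.
     do while (.not. flag)
        call MPI_IPROBE(0, WAKE_TAG, MPI_COMM_WORLD, flag, status, ierr)
        ! ... do a slice of local work here ...
     end do
     call MPI_RECV(token, 1, MPI_INTEGER, 0, WAKE_TAG, MPI_COMM_WORLD, status, ierr)
     call MPI_BCAST(meta, META_LEN, MPI_CHARACTER, 0, MPI_COMM_WORLD, ierr)
     write(*,*) 'rank ', rank, ' received: ', trim(meta)
  end if

  call MPI_FINALIZE(ierr)
end program iprobe_bcast_sketch

The cost is one extra small message per worker per broadcast, so whether this scales better than the serial clearinghouse depends on how frequent the requests really are.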