Re: [OMPI users] Error compiling openmpi-1.6.4a1r27766 on Solaris 10
Hello Siegmar,

this problem is already fixed in the OMPI trunk (r27770), but has not yet been moved to the v1.6 branch. To make it work for the v1.6 branch, just copy the following files from the trunk to your v1.6 checkout and re-run autogen.sh:

ompi/contrib/vt/vt/config/m4/acinclude.execwrap.m4
ompi/contrib/vt/vt/vtlib/vt_execwrap.c

Regards,
Matthias Jurenz

> Hi
> I tried to build openmpi-1.6.4a1r27766 on Solaris 10 Sparc and x86_64 with Sun C 5.12 and gcc-4.7.1 and got the following error for all combinations.
> ...
>   CC     vt_execwrap.lo
> "../../../../../../openmpi-1.6.4a1r27766/ompi/contrib/vt/vt/vtlib/vt_execwrap.c", line 187: warning: implicit function declaration: VTTHRD_MALLOC_TRACING_ENABLED
> "../../../../../../openmpi-1.6.4a1r27766/ompi/contrib/vt/vt/vtlib/vt_execwrap.c", line 358: undefined symbol: environ
> "../../../../../../openmpi-1.6.4a1r27766/ompi/contrib/vt/vt/vtlib/vt_execwrap.c", line 358: warning: improper pointer/integer combination: op "="
> "../../../../../../openmpi-1.6.4a1r27766/ompi/contrib/vt/vt/vtlib/vt_execwrap.c", line 410: undefined symbol: environ
> "../../../../../../openmpi-1.6.4a1r27766/ompi/contrib/vt/vt/vtlib/vt_execwrap.c", line 410: warning: improper pointer/integer combination: op "="
> cc: acomp failed for .../openmpi-1.6.4a1r27766/ompi/contrib/vt/vt/vtlib/vt_execwrap.c
> make[5]: *** [vt_execwrap.lo] Error 1
> make[5]: Leaving directory `.../ompi/contrib/vt/vt/vtlib'
> make[4]: *** [all-recursive] Error 1
> ...
> I would be grateful if you could solve the problem. Thank you very much for your help in advance.
> Kind regards
> Siegmar
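For reference, the copy Matthias describes might look roughly like this, assuming a trunk checkout and the 1.6.4a1r27766 tree sit side by side (the directory names are only illustrative):

    cp ompi-trunk/ompi/contrib/vt/vt/config/m4/acinclude.execwrap.m4 \
       openmpi-1.6.4a1r27766/ompi/contrib/vt/vt/config/m4/
    cp ompi-trunk/ompi/contrib/vt/vt/vtlib/vt_execwrap.c \
       openmpi-1.6.4a1r27766/ompi/contrib/vt/vt/vtlib/
    cd openmpi-1.6.4a1r27766 && ./autogen.sh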
[OMPI users] Error running program : mca_oob_tcp_msg_send_handler: writev:failed: Bad file descriptor
Hello all.

I want to learn MPI and I've been trying to set up OMPI for the first time on three nodes. My configuration is as follows:

Ubuntu Server - master node: pruebaborja
2x Ubuntu Desktop - slave nodes:
  clienteprueba
  clientepruebados (4 slots)

I'm running NFSv4 to share /home/mpiuser. I want to test a plain "Hello world" but I can't make it work on node clienteprueba. This is the problem:

mpiuser@pruebaborja:~$ mpirun -np 6 --hostfile .mpi_hostfile ./holamundo
[clienteprueba:01993] [[64434,0], 2] -> [[64434,0],0] mca_oob_tcp_msg_send_handler: writev:failed: Bad file descriptor (9) [sd = 9]
[clienteprueba:01993] [[64434,0], 2] routed:binomial: Connection to lifeline [[64434,0],0] lost

However, with only clientepruebados and pruebaborja in my hostfile, it works:

pruebaborja slots=1
clientepruebados slots=4
#clienteprueba slots=1

mpiuser@pruebaborja:~$ mpirun -np 6 --hostfile .mpi_hostfile ./holamundo
Hola, mundo, soy pruebaborja: 0 de 6
Hola, mundo, soy pruebaborja: 5 de 6
Hola, mundo, soy clientepruebados: 1 de 6
Hola, mundo, soy clientepruebados: 2 de 6
Hola, mundo, soy clientepruebados: 3 de 6
Hola, mundo, soy clientepruebados: 4 de 6

I've checked the OMPI versions on the machines and they are the same. I can't understand why I'm getting this error on clienteprueba; I've done the same configuration on clientepruebados and clienteprueba. Could anyone help me solve this?

Sorry for my English.
Thanks in advance
Re: [OMPI users] Initializing OMPI with invoking the array constructor on Fortran derived types causes the executable to crash
FWIW, I can replicate the behavior with gfortran 4.7.2:

- the program runs "fine" with no MPI_Init/MPI_Finalize
- the program dumps core when MPI_Init/MPI_Finalize are called (at the "conc = [ xx, yy ]" statement)

I notice that even if I disable Open MPI's memory hooking, the core dump still occurs.

Sidenote: there are a few ways to disable OMPI's memory hooking; one of the easiest is to set the environment variable FAKEROOTKEY to any value, because OMPI disables its memory hooking in Debian fakeroot environments. For example:

    % setenv FAKEROOTKEY 0
    % mpifort -g arrays.f90 && ./a.out
    *** glibc detected *** ./a.out: free(): invalid pointer: 0x00369ef9cf48 ***
    ...etc.

Specifically: with OMPI's memory hooking disabled, we don't modify the behavior of malloc/free/memalign/realloc.

I'm not sure what Open MPI is doing to anger the gfortran gods, but I did note that when I run the program without MPI_Init/MPI_Finalize, valgrind complains:

    ==7269== Conditional jump or move depends on uninitialised value(s)
    ==7269==    at 0x4015B0: MAIN__ (arrays.f90:20)
    ==7269==    by 0x401795: main (arrays.f90:26)

Line 20 is the "conc = [ xx, yy ]" statement. I'm not enough of a Fortran guru to know what that means (to my eyes, xx and yy were just initialized above that -- perhaps it's complaining about conc?), but there you go. :-)

On Jan 14, 2013, at 6:08 AM, Stefan Mauerberger wrote:

> Well, I failed to emphasize one thing: it is my intention to exploit
> F2003's lhs-(re)allocate feature. Meaning, it is totally legal in F03 to
> write something like this:
>    integer, allocatable :: array(:)
>    array = [ 1,2,3,4 ]
>    array = [ 1 ]
> where 'array' gets automatically (re)allocated. One more thing I should
> mention: in case 'array' is manually allocated, everything is fine.
>
> Ok, let's do a little case study and make my suggested minimal example a
> little more exhaustive:
>
>    PROGRAM main
>
>       IMPLICIT NONE
>       !INCLUDE 'mpif.h'
>
>       INTEGER :: ierr
>
>       TYPE :: test_typ
>          REAL, ALLOCATABLE :: a(:)
>       END TYPE
>
>       TYPE(test_typ) :: xx, yy
>       TYPE(test_typ), ALLOCATABLE :: conc(:)
>
>       !CALL mpi_init( ierr )
>
>       xx = test_typ( a=[1.0] )
>       yy = test_typ( a=[2.0,1.0] )
>
>       conc = [ xx, yy ]
>
>       WRITE(*,*) SIZE(conc)
>
>       !CALL mpi_finalize( ierr )
>
>    END PROGRAM main
>
> Note: for the beginning, all MPI stuff is commented out; xx and yy are
> initialized and their member variable 'a' is allocated.
>
> For now, assume it is purely serial. That piece of code compiles and
> runs properly with:
>  * gfortran 4.7.1, 4.7.2 and 4.8.0 (experimental)
>  * ifort 12.1 and 13.0 (-assume realloc_lhs)
>  * nagfor 5.3
> On the contrary, it terminates with a segfault with:
>  * pgfortran 12.9
> Well, for the following let's simply drop PGI. In addition, according to
> 'The Fortran 2003 Handbook' published by Springer in 2009, the
> usage of the array constructor [...] is appropriate and valid.
>
> As a second step, let's try to compile and run it invoking OMPI, just
> considering INCLUDE 'mpif.h':
>  * gfortran: all right
>  * ifort: all right
>  * nagfor: all right
>
> Finally, let's initialize MPI by calling MPI_Init() and MPI_Finalize():
>  * gfortran + OMPI: *** glibc detected *** ./a.out: free(): invalid pointer ...
>  * gfortran + Intel-MPI: *** glibc detected *** ./a.out: free(): invalid pointer ...
>  * ifort + OMPI: all right
>  * nagfor + OMPI: all right (-thread_safe)
>
> Well, you are right, this is a very strong indication to blame gfortran
> for that! However, it gets even more confusing. Instead of linking
> against OMPI, the following results are obtained by invoking IBM's MPI
> implementation:
>  * gfortran + IBM-MPI: all right
>  * ifort + IBM-MPI: all right
> Isn't that weird?
>
> Any suggestions? Might it be useful to submit a bug report to GCC
> developers?
>
> Cheers,
> Stefan
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
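For anyone trying to reproduce this, a minimal sketch of a valgrind run like the one described above (arrays.f90 is the test program from the thread; --track-origins=yes is optional but usually makes the "uninitialised value" reports easier to trace):

    % mpifort -g arrays.f90
    % valgrind --track-origins=yes ./a.out                 # serial case, MPI calls commented out
    % mpirun -np 1 valgrind --track-origins=yes ./a.out    # with MPI_Init/MPI_Finalize enabled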
Re: [OMPI users] Error running program : mca_oob_tcp_msg_send_handler: writev:failed: Bad file descriptor
Try disabling firewalling between your nodes. The easiest way is "sudo service iptables stop". On Jan 16, 2013, at 7:46 AM, borja mf wrote: > Hello all. > I want to learn MPI and I've trying to setting up OMPI for first time on > three nodes. My config above: > Ubuntu server - master node: pruebaborja > 2x Ubuntu Desktop - slaves node: > clienteprueba > clientepruebados 4 slots > > Im running NFSv4 for sharing /home/mpiuser. > I want to test a plain "Hello world"but I can't make it working successfully > on node clienteprueba. This is the problem: > > mpiuser@pruebaborja:~$ mpirun -np 6 --hostfile .mpi_hostfile ./holamundo > [clienteprueba:01993] [[64434,0], 2] -> [[64434,0],0] > mca_oob_tcp_msg_send_handler: writev:failed: Bad file descriptor (9) [sd = 9] > [clienteprueba:01993] [[64434,0], 2] routed:binomial: Connection to lifeline > [[64434,0],0] lost > > However, with clientepruebados and pruebaborja only on my hostfile, it works: > > pruebaborja slots=1 > clientepruebados slots=4 > #clienteprueba slots=1 > > mpiuser@pruebaborja:~$ mpirun -np 6 --hostfile .mpi_hostfile ./holamundo > Hola, mundo, soy pruebaborja: 0 de 6 > Hola, mundo, soy pruebaborja: 5 de 6 > Hola, mundo, soy clientepruebados: 1 de 6 > Hola, mundo, soy clientepruebados: 2 de 6 > Hola, mundo, soy clientepruebados: 3 de 6 > Hola, mundo, soy clientepruebados: 4 de 6 > > I've checked the OMPI versions on the machines and it's the same. I can't > understand why Im getting this error on clienteprueba; i've done the same > config on clientepruebados and clienteprueba. Anyone could help me to solve > this? > > Sorry for my english. > Thanks in advance > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
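Note that on Ubuntu (which these nodes are) the firewall is usually managed through ufw rather than an iptables init script, so the equivalent checks would presumably be something like:

    sudo ufw status      # is the firewall active at all?
    sudo ufw disable     # temporarily switch it off while testing
    sudo iptables -L -n  # double-check that no filtering rules remain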
Re: [OMPI users] Windows MPI with Fortran calling programs
On Jan 14, 2013, at 8:57 AM, Said Elnoshokaty wrote: > Parallel processing is needed to speed up processing of large-scale master > and sub-problems. 32-bit Microsoft Access 2007 is used to capture data and > then calls a DLL program written in 32-bit Microsoft Fortran 90 for > processing (to be distributed in parallel among master and sub-problems). > Operating system is 64-bit Windows 7. Hardware is PCs core i3 and i5. Network > is Ethernet 5. Please advise on the possibility of having MPI installed on > this platform and how to install, if possible. Open MPI recently lost its Windows developer, and Windows support has been removed from the upcoming v1.7 release. Your best bet is likely to use Microsoft MPI. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
[OMPI users] help me understand these error msgs
Hello,

I have a large Fortran code processing data (weather forecast). It runs OK with a smaller dataset, but with a larger dataset I get some errors I've never seen before:

node061:05144] [[55141,0],11]->[[55141,0],0] mca_oob_tcp_msg_send_handler: writev failed: Bad file descriptor (9) [sd = 9]
[node061:05144] [[55141,0],11] routed:binomial: Connection to lifeline [[55141,0],0] lost

and

node084:7.0.Non-fatal temporary exhaustion of send tid dma descriptors (elapsed=43.788s, source LID=0x49/context=11, count=1) (err=0)

I'm using QLogic software version 7.1.0.0.58 (OFED 1.5.4.1, Open MPI 1.4.3). I'm starting this program with mpirun -mca btl openib,sm,self, so I don't really understand what TCP has to do with the first error message.

Also, I traced the second error message to PSM code, but it appears even if I add -mca mtl ^psm to my mpirun arguments. Why?

Any help appreciated.

-- 
Jure Pečar
http://jure.pecar.org
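One suggestion not made in the thread, offered as a hedged sketch: the psm MTL is only used through the cm PML, so explicitly forcing the ob1 PML should keep PSM out of the MPI transfer path altogether (whether that also silences the TID-descriptor warning is a separate question). The executable name below is just a placeholder:

    ompi_info | grep -i mtl                          # is the psm MTL built into this install?
    mpirun -mca pml ob1 -mca btl openib,sm,self ./forecast.exe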
Re: [OMPI users] help me understand these error msgs
On Jan 16, 2013, at 7:41 AM, Jure Pečar wrote: > > Hello, > > I have a large fortran code processing data (weather forecast). It runs ok > with smaller dataset, but on larger dataset I get some errors I've never seen > before: > > node061:05144] [[55141,0],11]->[[55141,0],0] mca_oob_tcp_msg_send_handler: > writev failed: Bad file descriptor (9) [sd = 9] > [node061:05144] [[55141,0],11] routed:binomial: Connection to lifeline > [[55141,0],0] lost This one means that a backend node lost its connection to mpirun. We use a TCP socket between the daemon on a node and mpirun to launch the processes and to detect if/when that node fails for some reason. > > and > > node084:7.0.Non-fatal temporary exhaustion of send tid dma descriptors > (elapsed=43.788s, source LID=0x49/context=11, count=1) (err=0) > > I'm using QLogic software version 7.1.0.0.58 (ofed 1.5.4.1, open-mpi 1.4.3). > > I'm starting this program with mpirun -mca btl openib,sm,self so I don't > really understand what tcp has to do in the first error message. > > Also I traced second error message to psm code, but it appears even if i add > -mca mtl ^psm to my mpirun arguments. Why? > > Any help appreciated. > > > -- > > Jure Pečar > http://jure.pecar.org > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
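If the lifeline drop turns out to be a networking problem rather than the node itself dying, one thing worth trying (a hedged suggestion, not something from this thread) is pinning the out-of-band channel to a known-good interface; eth0 below is only an example, use whichever interface actually reaches the mpirun node:

    mpirun -mca oob_tcp_if_include eth0 -mca btl openib,sm,self ./your_app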
Re: [OMPI users] Error running program : mca_oob_tcp_msg_send_handler: writev:failed: Bad file descriptor
Getting the same error...

I forgot to say that I have to use Ubuntu and I'm compiling with mpicc. My code is written in C.

Thanks for the answer. I'm going crazy with this problem; there's not much info about it.

2013/1/16 Jeff Squyres (jsquyres)

> Try disabling firewalling between your nodes. The easiest way is "sudo
> service iptables stop".
>
> On Jan 16, 2013, at 7:46 AM, borja mf wrote:
>
> > Hello all.
> > I want to learn MPI and I've trying to setting up OMPI for first time on three nodes. My config above:
> > Ubuntu server - master node: pruebaborja
> > 2x Ubuntu Desktop - slaves node:
> > clienteprueba
> > clientepruebados 4 slots
> >
> > Im running NFSv4 for sharing /home/mpiuser.
> > I want to test a plain "Hello world" but I can't make it working successfully on node clienteprueba. This is the problem:
> >
> > mpiuser@pruebaborja:~$ mpirun -np 6 --hostfile .mpi_hostfile ./holamundo
> > [clienteprueba:01993] [[64434,0], 2] -> [[64434,0],0] mca_oob_tcp_msg_send_handler: writev:failed: Bad file descriptor (9) [sd = 9]
> > [clienteprueba:01993] [[64434,0], 2] routed:binomial: Connection to lifeline [[64434,0],0] lost
> >
> > However, with clientepruebados and pruebaborja only on my hostfile, it works:
> >
> > pruebaborja slots=1
> > clientepruebados slots=4
> > #clienteprueba slots=1
> >
> > mpiuser@pruebaborja:~$ mpirun -np 6 --hostfile .mpi_hostfile ./holamundo
> > Hola, mundo, soy pruebaborja: 0 de 6
> > Hola, mundo, soy pruebaborja: 5 de 6
> > Hola, mundo, soy clientepruebados: 1 de 6
> > Hola, mundo, soy clientepruebados: 2 de 6
> > Hola, mundo, soy clientepruebados: 3 de 6
> > Hola, mundo, soy clientepruebados: 4 de 6
> >
> > I've checked the OMPI versions on the machines and it's the same. I can't understand why Im getting this error on clienteprueba; i've done the same config on clientepruebados and clienteprueba. Anyone could help me to solve this?
> >
> > Sorry for my english.
> > Thanks in advance
> >
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
Re: [OMPI users] Error running program : mca_oob_tcp_msg_send_handler: writev:failed: Bad file descriptor
If you login to eprueba and try to ping pruebaborja, can you do it? What network is it using? Sometimes the problem is that you have multiple ethernet interfaces on the machines and we pick the wrong one - i.e., one that cannot connect to the other machine. There are ways to help resolve the problem if that's the case, but first check to see. Also, if you configure OMPI --enable-debug, there are diagnostics you can enable that will help debug the problem. On Jan 16, 2013, at 7:59 AM, borja mf wrote: > Getting the same error... > I forgot to say that I must to use Ubuntu and Im compiling with mpicc. My > code is written on C. > > Thank for answer. > > Im going crazy with this problem. There's not much info about. > > 2013/1/16 Jeff Squyres (jsquyres) > Try disabling firewalling between your nodes. The easiest way is "sudo > service iptables stop". > > > On Jan 16, 2013, at 7:46 AM, borja mf > wrote: > > > Hello all. > > I want to learn MPI and I've trying to setting up OMPI for first time on > > three nodes. My config above: > > Ubuntu server - master node: pruebaborja > > 2x Ubuntu Desktop - slaves node: > > clienteprueba > > clientepruebados 4 slots > > > > Im running NFSv4 for sharing /home/mpiuser. > > I want to test a plain "Hello world"but I can't make it working > > successfully on node clienteprueba. This is the problem: > > > > mpiuser@pruebaborja:~$ mpirun -np 6 --hostfile .mpi_hostfile ./holamundo > > [clienteprueba:01993] [[64434,0], 2] -> [[64434,0],0] > > mca_oob_tcp_msg_send_handler: writev:failed: Bad file descriptor (9) [sd = > > 9] > > [clienteprueba:01993] [[64434,0], 2] routed:binomial: Connection to > > lifeline [[64434,0],0] lost > > > > However, with clientepruebados and pruebaborja only on my hostfile, it > > works: > > > > pruebaborja slots=1 > > clientepruebados slots=4 > > #clienteprueba slots=1 > > > > mpiuser@pruebaborja:~$ mpirun -np 6 --hostfile .mpi_hostfile ./holamundo > > Hola, mundo, soy pruebaborja: 0 de 6 > > Hola, mundo, soy pruebaborja: 5 de 6 > > Hola, mundo, soy clientepruebados: 1 de 6 > > Hola, mundo, soy clientepruebados: 2 de 6 > > Hola, mundo, soy clientepruebados: 3 de 6 > > Hola, mundo, soy clientepruebados: 4 de 6 > > > > I've checked the OMPI versions on the machines and it's the same. I can't > > understand why Im getting this error on clienteprueba; i've done the same > > config on clientepruebados and clienteprueba. Anyone could help me to > > solve this? > > > > Sorry for my english. > > Thanks in advance > > > > > > ___ > > users mailing list > > us...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
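A quick sketch of that check, run from clienteprueba (interface names will differ from machine to machine):

    ping -c 3 pruebaborja     # can the slave reach the master by the name in the hostfile?
    ip addr show              # which interfaces/addresses are configured on this node?
    getent hosts pruebaborja  # does the name resolve to the address you expect?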
Re: [OMPI users] Error running program : mca_oob_tcp_msg_send_handler: writev:failed: Bad file descriptor
Make sure you disable iptables on all the nodes. Also, check that all your IP interfaces are configured correctly. Do you have IP interfaces for only real ethernet connections and loopback? Or do you have other interfaces (e.g., for virtual machines)? If you have interfaces for virtual machines, you'll need to exclude them from Open MPI -- see http://www.open-mpi.org/faq/?category=tcp#tcp-selection. On Jan 16, 2013, at 10:59 AM, borja mf wrote: > Getting the same error... > I forgot to say that I must to use Ubuntu and Im compiling with mpicc. My > code is written on C. > > Thank for answer. > > Im going crazy with this problem. There's not much info about. > > 2013/1/16 Jeff Squyres (jsquyres) > Try disabling firewalling between your nodes. The easiest way is "sudo > service iptables stop". > > > On Jan 16, 2013, at 7:46 AM, borja mf > wrote: > > > Hello all. > > I want to learn MPI and I've trying to setting up OMPI for first time on > > three nodes. My config above: > > Ubuntu server - master node: pruebaborja > > 2x Ubuntu Desktop - slaves node: > > clienteprueba > > clientepruebados 4 slots > > > > Im running NFSv4 for sharing /home/mpiuser. > > I want to test a plain "Hello world"but I can't make it working > > successfully on node clienteprueba. This is the problem: > > > > mpiuser@pruebaborja:~$ mpirun -np 6 --hostfile .mpi_hostfile ./holamundo > > [clienteprueba:01993] [[64434,0], 2] -> [[64434,0],0] > > mca_oob_tcp_msg_send_handler: writev:failed: Bad file descriptor (9) [sd = > > 9] > > [clienteprueba:01993] [[64434,0], 2] routed:binomial: Connection to > > lifeline [[64434,0],0] lost > > > > However, with clientepruebados and pruebaborja only on my hostfile, it > > works: > > > > pruebaborja slots=1 > > clientepruebados slots=4 > > #clienteprueba slots=1 > > > > mpiuser@pruebaborja:~$ mpirun -np 6 --hostfile .mpi_hostfile ./holamundo > > Hola, mundo, soy pruebaborja: 0 de 6 > > Hola, mundo, soy pruebaborja: 5 de 6 > > Hola, mundo, soy clientepruebados: 1 de 6 > > Hola, mundo, soy clientepruebados: 2 de 6 > > Hola, mundo, soy clientepruebados: 3 de 6 > > Hola, mundo, soy clientepruebados: 4 de 6 > > > > I've checked the OMPI versions on the machines and it's the same. I can't > > understand why Im getting this error on clienteprueba; i've done the same > > config on clientepruebados and clienteprueba. Anyone could help me to > > solve this? > > > > Sorry for my english. > > Thanks in advance > > > > > > ___ > > users mailing list > > us...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
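For example, something along these lines (virbr0 and vboxnet0 are just the usual names libvirt and VirtualBox create; substitute whatever extra interfaces actually show up on your machines, and note that lo has to be listed explicitly once you override the exclude list):

    mpirun --mca btl_tcp_if_exclude lo,virbr0,vboxnet0 \
           --mca oob_tcp_if_exclude lo,virbr0,vboxnet0 \
           -np 6 --hostfile .mpi_hostfile ./holamundo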
Re: [OMPI users] libmpi_f90 shared lib version number change in 1.6.3
On Jan 12, 2013, at 5:06 AM, Ake Sandgren wrote:

> Was the change for libmpi_f90 in VERSION intentional or a typo?
> This is from openmpi 1.6.3
> libmpi_f90_so_version=4:0:1
> 1.6.1 had
> libmpi_f90_so_version=2:0:1

It was both intentional and a typo. Specifically, it really should have been 4:0:3. :-(

Meaning: we unintentionally broke the F90 ABI for 1.6.3 (specifically: OMPI applications compiled to utilize "use mpi"). :-( :-( :-( This ABI compatibility will be restored in 1.6.4.

See these commit messages for a fuller explanation:

https://svn.open-mpi.org/trac/ompi/changeset/27471
https://svn.open-mpi.org/trac/ompi/changeset/27558

The short explanation is that, in terms of the "use mpi" interface, all Open MPI 1.6.x versions are ABI compatible except 1.6.3.

These work:
- compile a "use mpi" OMPI application with 1.6.x (where x!=3), then change your LD_LIBRARY_PATH to point to a different OMPI 1.6.x installation (where x!=3)
- compile a "use mpi" OMPI application with 1.6.3, use it with an OMPI 1.6.3 installation

These do not:
- compile a "use mpi" OMPI application with 1.6.x (where x!=3), then change your LD_LIBRARY_PATH to point to an OMPI 1.6.3 installation
- compile a "use mpi" OMPI application with 1.6.3, use it with an OMPI 1.6.x installation (where x!=3)

I will make an FAQ item about this so that the result is google-able.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
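A quick way to see which libmpi_f90 an existing binary and a given installation actually provide (the application name and install path are only examples):

    ldd ./my_f90_app | grep libmpi_f90
    ls -l /opt/openmpi-1.6.3/lib/libmpi_f90.so*

If the library version the application was linked against does not match the one in the installation LD_LIBRARY_PATH points at, you are in one of the "these do not" cases above.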
[OMPI users] openmpi 1.4 vs. 1.6 internals
There is a significant improvement in non-blocking MPI calls (over InfiniBand) from version 1.4 to version 1.6. I am comparing two methods to exchange messages between two nodes. The message size varies from 1 MB to 1 GB.

The first method sends using MPI_Isend() and receives using MPI_Irecv(). The same buffers are used repeatedly to exchange messages between the two nodes, and the buffers are allocated using malloc(). In the second method, the buffers are allocated using MPI_Alloc_mem(), the sends and receives are initialized using MPI_Send_init() and MPI_Recv_init(), and the sends and recvs are posted using MPI_Start().

In version 1.4, the first method has a peak bidirectional bandwidth of 5.3 GB/s and the second method has a peak of 6.2 GB/s. In version 1.6, both methods have a peak bandwidth of 6.2 GB/s. The peak bandwidths are pretty close to the numbers reported by the ib_read_bw and ib_write_bw commands for InfiniBand.

1. The first question: why does version 1.6 handle non-blocking Isend/Irecv better than version 1.4? I would assume that in the second method, memory is pinned and registered during MPI_Alloc_mem() and the transfers use RDMA directly. In the first method, where the buffers are allocated using malloc(), I would assume that RDMA pipelining is used. I emphasize that the mpi_leave_pinned parameter has its default value of -1 and is turned off in all the runs. I would expect some overhead due to registering and unregistering memory during each Isend/Irecv, even though pipelining tries to amortize the costs. The numbers for version 1.4 are in line with this expectation. However, in version 1.6 there seems to be no overhead at all due to registering/unregistering memory. What is going on? Do large messages still use RDMA pipelining? How has the RDMA pipeline been improved?

2. To send and receive a large message, Open MPI may choose between RDMA write and RDMA read. If RDMA pipelining is used, it seems advantageous to use RDMA write because some fragments use send/recv semantics. If the memory is registered and the send/recv results in a single RDMA operation, there seems to be nothing to choose between the two. Is that correct? If so, does Open MPI use RDMA write or RDMA read?

Thanks!

Divakar Viswanath
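Since much of this hinges on registration caching, one hedged experiment (not something the poster reports running) is to take the -1 default out of the picture and set mpi_leave_pinned explicitly both ways, then compare the malloc()-buffer benchmark under 1.4 and 1.6; the host names and benchmark name here are placeholders:

    mpirun --mca mpi_leave_pinned 0 -np 2 --host node1,node2 ./bw_isend
    mpirun --mca mpi_leave_pinned 1 -np 2 --host node1,node2 ./bw_isend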
[OMPI users] Problem with mpirun for java codes
Hi,

I am still struggling with the installation problems! I get very strange errors. Everything is fine when I run Open MPI with C codes, but when I try to run a simple Java code I get a very strange error. The code is as simple as the following and I cannot get it running:

import mpi.*;

class JavaMPI {
    public static void main(String[] args) throws MPIException {
        MPI.Init(args);
        System.out.println("Hello world from rank " +
            MPI.COMM_WORLD.Rank() + " of " +
            MPI.COMM_WORLD.Size() );
        MPI.Finalize();
    }
}

Everything is OK with mpijavac, my Java code, etc. When I try to run the code with the following commands:

/usr/local/bin/mpijavac -d classes JavaMPI.java          --> FINE
/usr/local/bin/mpirun -np 2 java -cp ./classes JavaMPI   --> *ERROR*

I get the following error. Could you please help me with this? (As I mentioned, I can run C MPI codes without any problem.) The system specifications are:

JRE version: 6.0_30-b12 (java-sun-6)
OS: Linux 3.0.0-30-generic-pae #47-Ubuntu
CPU: total 4 (2 cores per cpu, 2 threads per core) family 6 model 42 stepping 7, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, ht

##
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x70e1dd12, pid=28616, tid=3063311216
#
 (0xb) at pc=0x70f61d12, pid=28615, tid=3063343984
#
# JRE version: 6.0_30-b12
# JRE version: 6.0_30-b12
# Java VM: Java HotSpot(TM) Server VM (20.5-b03 mixed mode linux-x86 )
# Problematic frame:
# C  [libmpi.so.1+0x20d12]  unsigned __int128+0xa2
#
# An error report file with more information is saved as:
# /home/karos/hs_err_pid28616.log
# Java VM: Java HotSpot(TM) Server VM (20.5-b03 mixed mode linux-x86 )
# Problematic frame:
# C  [libmpi.so.1+0x20d12]  unsigned __int128+0xa2
#
# An error report file with more information is saved as:
# /home/karos/hs_err_pid28615.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
[tulips:28616] *** Process received signal ***
[tulips:28616] Signal: Aborted (6)
[tulips:28616] Signal code: (-6)
[tulips:28616] [ 0] [0xb777840c]
[tulips:28616] [ 1] [0xb7778424]
[tulips:28616] [ 2] /lib/i386-linux-gnu/libc.so.6(gsignal+0x4f) [0xb75e3cff]
[tulips:28616] [ 3] /lib/i386-linux-gnu/libc.so.6(abort+0x175) [0xb75e7325]
[tulips:28616] [ 4] /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dcf7f) [0xb6f6df7f]
[tulips:28616] [ 5] /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x724897) [0xb70b5897]
[tulips:28616] [ 6] /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(JVM_handle_linux_signal+0x21c) [0xb6f7529c]
[tulips:28616] [ 7] /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dff64) [0xb6f70f64]
[tulips:28616] [ 8] [0xb777840c]
[tulips:28616] [ 9] [0xb3891548]
[tulips:28616] *** End of error message ***
[tulips:28615] *** Process received signal ***
[tulips:28615] Signal: Aborted (6)
[tulips:28615] Signal code: (-6)
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
[tulips:28615] [ 0] [0xb778040c]
[tulips:28615] [ 1] [0xb7780424]
[tulips:28615] [ 2] /lib/i386-linux-gnu/libc.so.6(gsignal+0x4f) [0xb75ebcff]
[tulips:28615] [ 3] /lib/i386-linux-gnu/libc.so.6(abort+0x175) [0xb75ef325]
[tulips:28615] [ 4] /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dcf7f) [0xb6f75f7f]
[tulips:28615] [ 5] /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x724897) [0xb70bd897]
[tulips:28615] [ 6] /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(JVM_handle_linux_signal+0x21c) [0xb6f7d29c]
[tulips:28615] [ 7] /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dff64) [0xb6f78f64]
[tulips:28615] [ 8] [0xb778040c]
[tulips:28615] [ 9] [0xb3899548]
[tulips:28615] *** End of error message ***
--
mpirun noticed that process rank 1 with PID 28616 on node tulips exited on signal 6 (Aborted).
--
##

--
Regards,
Karos Lotfifar
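Since the crash is inside libmpi.so.1 native code, one hedged thing to check (not a suggestion from the thread) is that the JVM loads the same Open MPI library that mpijavac and mpirun come from, and that the Java bindings were actually built into that install; /usr/local/lib is assumed here to match the /usr/local/bin install shown above:

    ompi_info | grep -i java                                 # were the Java bindings built in?
    export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
    /usr/local/bin/mpirun -np 1 java -cp ./classes JavaMPI   # try a single rank first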
Re: [OMPI users] Problem with mpirun for java codes
Which version of OMPI are you using? On Jan 16, 2013, at 11:43 AM, Karos Lotfifar wrote: > Hi, > > I am still struggling with the installation problems! I get very strange > errors. everything is fine when I run OpenMPI for C codes, but when I try to > run a simple java code I get very strange error. The code is as simple as the > following and I can not get it running: > > import mpi.*; > > class JavaMPI { > public static void main(String[] args) throws MPIException { > MPI.Init(args); > System.out.println("Hello world from rank " + > MPI.COMM_WORLD.Rank() + " of " + > MPI.COMM_WORLD.Size() ); > MPI.Finalize(); > } > } > > everything is ok with mpijavac, my java code, etc. when I try to run the code > with the following command: > > /usr/local/bin/mpijavac -d classes JavaMPI.java --> FINE > /usr/local/bin/mpirun -np 2 java -cp ./classes JavaMPI --> *ERROR* > > I'll the following error. Could you please help me about this (As I mentioned > the I can run C MPI codes without any problem ). The system specifications > are: > > JRE version: 6.0_30-b12 (java-sun-6) > OS: Linux 3.0.0-30-generic-pae #47-Ubuntu > CPU:total 4 (2 cores per cpu, 2 threads per core) family 6 model 42 stepping > 7, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, ht > > > > > ## > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV# > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x70e1dd12, pid=28616, tid=3063311216 > # > (0xb) at pc=0x70f61d12, pid=28615, tid=3063343984 > # > # JRE version: 6.0_30-b12 > # JRE version: 6.0_30-b12 > # Java VM: Java HotSpot(TM) Server VM (20.5-b03 mixed mode linux-x86 ) > # Problematic frame: > # C [libmpi.so.1+0x20d12] unsigned __int128+0xa2 > # > # An error report file with more information is saved as: > # /home/karos/hs_err_pid28616.log > # Java VM: Java HotSpot(TM) Server VM (20.5-b03 mixed mode linux-x86 ) > # Problematic frame: > # C [libmpi.so.1+0x20d12] unsigned __int128+0xa2 > # > # An error report file with more information is saved as: > # /home/karos/hs_err_pid28615.log > # > # If you would like to submit a bug report, please visit: > # http://java.sun.com/webapps/bugreport/crash.jsp > # The crash happened outside the Java Virtual Machine in native code. > # See problematic frame for where to report the bug. 
> # > [tulips:28616] *** Process received signal *** > [tulips:28616] Signal: Aborted (6) > [tulips:28616] Signal code: (-6) > [tulips:28616] [ 0] [0xb777840c] > [tulips:28616] [ 1] [0xb7778424] > [tulips:28616] [ 2] /lib/i386-linux-gnu/libc.so.6(gsignal+0x4f) [0xb75e3cff] > [tulips:28616] [ 3] /lib/i386-linux-gnu/libc.so.6(abort+0x175) [0xb75e7325] > [tulips:28616] [ 4] > /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dcf7f) > [0xb6f6df7f] > [tulips:28616] [ 5] > /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x724897) > [0xb70b5897] > [tulips:28616] [ 6] > /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(JVM_handle_linux_signal+0x21c) > [0xb6f7529c] > [tulips:28616] [ 7] > /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dff64) > [0xb6f70f64] > [tulips:28616] [ 8] [0xb777840c] > [tulips:28616] [ 9] [0xb3891548] > [tulips:28616] *** End of error message *** > [tulips:28615] *** Process received signal *** > [tulips:28615] Signal: Aborted (6) > [tulips:28615] Signal code: (-6) > # > # If you would like to submit a bug report, please visit: > # http://java.sun.com/webapps/bugreport/crash.jsp > # The crash happened outside the Java Virtual Machine in native code. > # See problematic frame for where to report the bug. > # > [tulips:28615] [ 0] [0xb778040c] > [tulips:28615] [ 1] [0xb7780424] > [tulips:28615] [ 2] /lib/i386-linux-gnu/libc.so.6(gsignal+0x4f) [0xb75ebcff] > [tulips:28615] [ 3] /lib/i386-linux-gnu/libc.so.6(abort+0x175) [0xb75ef325] > [tulips:28615] [ 4] > /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dcf7f) > [0xb6f75f7f] > [tulips:28615] [ 5] > /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x724897) > [0xb70bd897] > [tulips:28615] [ 6] > /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(JVM_handle_linux_signal+0x21c) > [0xb6f7d29c] > [tulips:28615] [ 7] > /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dff64) > [0xb6f78f64] > [tulips:28615] [ 8] [0xb778040c] > [tulips:28615] [ 9] [0xb3899548] > [tulips:28615] *** End of error message *** > -- > mpirun noticed that process rank 1 with PID 28616 on node tulips exited on > signal 6 (Aborted). > -- > >
Re: [OMPI users] Problem with mpirun for java codes
Hi, The version that I am using is 1.7rc6 (pre-release) Regards, Karos On 16 Jan 2013, at 21:07, Ralph Castain wrote: > Which version of OMPI are you using? > > > On Jan 16, 2013, at 11:43 AM, Karos Lotfifar wrote: > >> Hi, >> >> I am still struggling with the installation problems! I get very strange >> errors. everything is fine when I run OpenMPI for C codes, but when I try to >> run a simple java code I get very strange error. The code is as simple as >> the following and I can not get it running: >> >> import mpi.*; >> >> class JavaMPI { >> public static void main(String[] args) throws MPIException { >> MPI.Init(args); >> System.out.println("Hello world from rank " + >> MPI.COMM_WORLD.Rank() + " of " + >> MPI.COMM_WORLD.Size() ); >> MPI.Finalize(); >> } >> } >> >> everything is ok with mpijavac, my java code, etc. when I try to run the >> code with the following command: >> >> /usr/local/bin/mpijavac -d classes JavaMPI.java --> FINE >> /usr/local/bin/mpirun -np 2 java -cp ./classes JavaMPI --> *ERROR* >> >> I'll the following error. Could you please help me about this (As I >> mentioned the I can run C MPI codes without any problem ). The system >> specifications are: >> >> JRE version: 6.0_30-b12 (java-sun-6) >> OS: Linux 3.0.0-30-generic-pae #47-Ubuntu >> CPU:total 4 (2 cores per cpu, 2 threads per core) family 6 model 42 stepping >> 7, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, ht >> >> >> >> >> ## >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # SIGSEGV# >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # SIGSEGV (0xb) at pc=0x70e1dd12, pid=28616, tid=3063311216 >> # >> (0xb) at pc=0x70f61d12, pid=28615, tid=3063343984 >> # >> # JRE version: 6.0_30-b12 >> # JRE version: 6.0_30-b12 >> # Java VM: Java HotSpot(TM) Server VM (20.5-b03 mixed mode linux-x86 ) >> # Problematic frame: >> # C [libmpi.so.1+0x20d12] unsigned __int128+0xa2 >> # >> # An error report file with more information is saved as: >> # /home/karos/hs_err_pid28616.log >> # Java VM: Java HotSpot(TM) Server VM (20.5-b03 mixed mode linux-x86 ) >> # Problematic frame: >> # C [libmpi.so.1+0x20d12] unsigned __int128+0xa2 >> # >> # An error report file with more information is saved as: >> # /home/karos/hs_err_pid28615.log >> # >> # If you would like to submit a bug report, please visit: >> # http://java.sun.com/webapps/bugreport/crash.jsp >> # The crash happened outside the Java Virtual Machine in native code. >> # See problematic frame for where to report the bug. 
>> # >> [tulips:28616] *** Process received signal *** >> [tulips:28616] Signal: Aborted (6) >> [tulips:28616] Signal code: (-6) >> [tulips:28616] [ 0] [0xb777840c] >> [tulips:28616] [ 1] [0xb7778424] >> [tulips:28616] [ 2] /lib/i386-linux-gnu/libc.so.6(gsignal+0x4f) [0xb75e3cff] >> [tulips:28616] [ 3] /lib/i386-linux-gnu/libc.so.6(abort+0x175) [0xb75e7325] >> [tulips:28616] [ 4] >> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dcf7f) >> [0xb6f6df7f] >> [tulips:28616] [ 5] >> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x724897) >> [0xb70b5897] >> [tulips:28616] [ 6] >> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(JVM_handle_linux_signal+0x21c) >> [0xb6f7529c] >> [tulips:28616] [ 7] >> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dff64) >> [0xb6f70f64] >> [tulips:28616] [ 8] [0xb777840c] >> [tulips:28616] [ 9] [0xb3891548] >> [tulips:28616] *** End of error message *** >> [tulips:28615] *** Process received signal *** >> [tulips:28615] Signal: Aborted (6) >> [tulips:28615] Signal code: (-6) >> # >> # If you would like to submit a bug report, please visit: >> # http://java.sun.com/webapps/bugreport/crash.jsp >> # The crash happened outside the Java Virtual Machine in native code. >> # See problematic frame for where to report the bug. >> # >> [tulips:28615] [ 0] [0xb778040c] >> [tulips:28615] [ 1] [0xb7780424] >> [tulips:28615] [ 2] /lib/i386-linux-gnu/libc.so.6(gsignal+0x4f) [0xb75ebcff] >> [tulips:28615] [ 3] /lib/i386-linux-gnu/libc.so.6(abort+0x175) [0xb75ef325] >> [tulips:28615] [ 4] >> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dcf7f) >> [0xb6f75f7f] >> [tulips:28615] [ 5] >> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x724897) >> [0xb70bd897] >> [tulips:28615] [ 6] >> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(JVM_handle_linux_signal+0x21c) >> [0xb6f7d29c] >> [tulips:28615] [ 7] >> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dff64) >> [0xb6f78f64] >> [tulips:28615] [ 8] [0xb778040c] >> [tulips:28615] [ 9] [0xb3899548] >> [tulips:28615] *** End of error message *** >> --