Re: [OMPI users] openmpi linking problem
This doesn't sound like a linking problem; it sounds like there's an error in your application that is causing it to abort before completing.

On Jun 25, 2014, at 12:19 PM, Sergii Veremieiev wrote:

> Dear Sir/Madam,
>
> I'm trying to run a 64-bit parallel finite element analysis code on my desktop with Windows 7, Cygwin, Open MPI 1.7.5, 64GB RAM, and a 6-core Intel Core i7-3930K CPU via the "mpirun -np 6 executable" command. The code runs fine, but if I increase the number of elements past a critical point (roughly more than 100k), the built-in MUMPS library returns an error message (please see below). Can you possibly advise me what the problem might be? I have checked in Task Manager that the code uses about 3-6GB per process, or about 20GB in total, which is much smaller than the 55GB of physical memory available on the system. Is there perhaps a per-process memory limit in Windows? Thank you.
>
> Best regards,
>
> Sergii
>
> mpirun has exited due to process rank 1 with PID 6028 on node exiting improperly. There are three reasons this could occur:
>
> 1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefinitely while it waits for all processes to call "init". By rule, if one process calls "init", then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize". By rule, all processes that call "init" MUST call "finalize" prior to exiting or it will be considered an "abnormal termination".
>
> 3. this process called "MPI_Abort" or "orte_abort" and the mca parameter orte_create_session_dirs is set to false. In this case, the run-time cannot detect that the abort call was an abnormal termination. Hence, the only error message you will receive is this one.
>
> This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
>
> You can avoid this message by specifying -quiet on the mpirun command line.

--
Jeff Squyres
jsquy...@cisco.com
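For reference, the three numbered reasons in that help message correspond to the MPI lifecycle rules sketched below; solve() is a hypothetical stand-in for the finite element computation:

/* Minimal sketch of the MPI lifecycle rules cited in the error message;
   solve() is a hypothetical placeholder for the real application work. */
#include <mpi.h>

static int solve(void) { return 0; /* placeholder for the real computation */ }

int main(int argc, char **argv)
{
    int err;

    MPI_Init(&argc, &argv);      /* reason 1: every rank must call init */

    err = solve();
    if (err != 0) {
        /* reason 3: report unrecoverable errors via MPI_Abort so the
           runtime sees a deliberate abnormal termination */
        MPI_Abort(MPI_COMM_WORLD, err);
    }

    MPI_Finalize();              /* reason 2: every rank must call finalize */
    return 0;
}

If MUMPS hits an allocation failure on a large problem and the process exits (or is killed) without reaching MPI_Finalize, mpirun reports exactly this kind of improper exit.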
Re: [OMPI users] Problem mpi
Sounds like you have a problem with the physical layer of your InfiniBand. You should run layer 0 diagnostics and/or contact your IB vendor for assistance.

On Jun 24, 2014, at 4:48 AM, Diego Saúl Carrió Carrió wrote:

> Dear all,
>
> I have had problems related to mpirun for a long time. When I execute mpirun (with my program), I get the following error after a while:
>
> ...
>
> mlx4: local QP operation err (QPN c00054, WQE index a, vendor syndrome 6f, opcode = 5e)
> [[64826,1],0][btl_openib_component.c:3497:handle_wc] from foner109 to: foner111 error polling LP CQ with status LOCAL QP OPERATION ERROR status number 2 for wr_id af58a8 opcode 128 vendor error 111 qp_idx 3
>
> mpirun has exited due to process rank 0 with PID 51754 on node foner109 exiting improperly. There are two reasons this could occur:
>
> 1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefinitely while it waits for all processes to call "init". By rule, if one process calls "init", then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize". By rule, all processes that call "init" MUST call "finalize" prior to exiting or it will be considered an "abnormal termination".
>
> This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
>
> I am using a cluster (42 nodes, each with 20 processors and 64GB of RAM). I want to use, for example, only 20 nodes, so I run:
>
> salloc -N20 --tasks-per-node=1 --cpus-per-task=20 -p thin (the partition name)
> mpirun -pernode [my_program]
>
> Could you help me solve this problem?
>
> Best Regards,
> Diego

--
Jeff Squyres
jsquy...@cisco.com
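As a concrete starting point for those layer 0 diagnostics -- assuming the standard OFED/infiniband-diags tools are installed on the nodes (an assumption; package names vary by distro) -- something like the following, run on foner109 and foner111, will usually surface a physical-layer fault:

$ ibstat      # local HCA/port state, rate, and link status
$ perfquery   # per-port error counters (symbol errors, link downs, etc.)
$ ibdiagnet   # fabric-wide sweep that flags bad links and ports

Growing symbol-error or link-downed counters on the path between the two nodes point at a cable, port, or firmware problem rather than at Open MPI.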
Re: [OMPI users] poor performance using the openib btl
Just curious -- if you run standard ping-pong kinds of MPI benchmarks with the same kind of mpirun command line that you run your application, do you see the expected level of performance? (i.e., verification that you're using the low-latency transport, etc.)

On Jun 25, 2014, at 9:52 AM, Fischer, Greg A. wrote:

> I looked through my configure log, and that option is not enabled. Thanks for the suggestion.
>
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime Boissonneault
> Sent: Wednesday, June 25, 2014 10:51 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] poor performance using the openib btl
>
> Hi,
> I recovered the name of the option that caused problems for us. It is --enable-mpi-thread-multiple
>
> This option enables threading within OPAL, which was buggy (at least in the 1.6.x series). I don't know if it has been fixed in the 1.8 series.
>
> I do not see your configure line in the attached file, so I cannot check whether it was enabled or not.
>
> Maxime
>
> On 2014-06-25 10:46, Fischer, Greg A. wrote:
> Attached are the results of "grep thread" on my configure output. There appears to be some amount of threading, but is there anything I should look for in particular?
>
> I see Mike Dubman's questions on the mailing list website, but his message didn't appear to make it to my inbox. The answers to his questions are:
>
> [binford:fischega] $ rpm -qa | grep ofed
> ofed-doc-1.5.4.1-0.11.5
> ofed-kmp-default-1.5.4.1_3.0.76_0.11-0.11.5
> ofed-1.5.4.1-0.11.5
>
> Distro: SLES11 SP3
>
> HCA:
> [binf102:fischega] $ /usr/sbin/ibstat
> CA 'mlx4_0'
>     CA type: MT26428
>
> Command line (PATH and LD_LIBRARY_PATH are set correctly):
> mpirun -x LD_LIBRARY_PATH -mca btl openib,sm,self -mca btl_openib_verbose 1 -np 31 $CTF_EXEC
>
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime Boissonneault
> Sent: Tuesday, June 24, 2014 6:41 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] poor performance using the openib btl
>
> What are your threading options for OpenMPI (when it was built)?
>
> I have seen the OpenIB BTL completely lock up before when some level of threading is enabled.
>
> Maxime Boissonneault
>
> On 2014-06-24 18:18, Fischer, Greg A. wrote:
> Hello openmpi-users,
>
> A few weeks ago, I posted to the list about difficulties I was having getting openib to work with Torque (see "openib segfaults with Torque", June 6, 2014). The issues were related to Torque imposing restrictive limits on locked memory, and have since been resolved.
>
> However, now that I've had some time to test the applications, I'm seeing abysmal performance over the openib layer. Applications run with the tcp btl execute about 10x faster than with the openib btl. Clearly something still isn't quite right.
>
> I tried running with "-mca btl_openib_verbose 1", but didn't see anything resembling a smoking gun. How should I go about determining the source of the problem? (This uses the same OpenMPI 1.8.1 / SLES11 SP3 / GCC 4.8.3 setup discussed previously.)
>
> Thanks,
> Greg
>
> --
> Maxime Boissonneault
> Analyste de calcul - Calcul Québec, Université Laval
> Ph. D. en physique

--
Jeff Squyres
jsquy...@cisco.com
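If a standard suite such as the OSU micro-benchmarks or IMB isn't already installed, a minimal ping-pong in the spirit Jeff describes is easy to write; this is a rough sketch, not a tuned benchmark:

/* Minimal MPI ping-pong latency sketch: run with exactly 2 ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int iters = 10000;
    char byte = 0;
    int rank, i;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++) {
        if (rank == 0) {
            /* rank 0 sends, then waits for the echo */
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            /* rank 1 echoes everything straight back */
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("avg one-way latency: %.2f us\n",
               (t1 - t0) * 1e6 / (2.0 * iters));

    MPI_Finalize();
    return 0;
}

Run it with "-np 2" across two nodes, once with "-mca btl openib,sm,self" and once with "-mca btl tcp,self"; on a QDR HCA like the MT26428 above, the openib latency should be roughly an order of magnitude lower than tcp. If it isn't, the problem is below the application.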
Re: [OMPI users] poor performance using the openib btl
You might try restarting the device drivers.

$ pdsh -g yourcluster service openibd restart

Josh

Sent from my iPhone

> On Jun 26, 2014, at 6:55 AM, "Jeff Squyres (jsquyres)" wrote:
>
> Just curious -- if you run standard ping-pong kinds of MPI benchmarks with the same kind of mpirun command line that you run your application, do you see the expected level of performance? (i.e., verification that you're using the low-latency transport, etc.)
>
> [...]
[OMPI users] Compiling OpenMPI for Intel Xeon Phi/MIC
I'm currently working towards setting up a single-node system with a Xeon Phi card. We have the Intel compilers (v13.1.3) installed, and I was able to get a standard Open MPI (v1.6.5) to install.

Right now, I am just hoping to run codes natively on the Xeon Phi. Trying to compile a hello-world code via "mpicc -mmic hello.c" results in the error:

x86_64-k1om-linux-ld: skipping incompatible /ssd/apps/openmpi-intel/lib/libmpi.so when searching for -lmpi
x86_64-k1om-linux-ld: cannot find -lmpi

I'm guessing this is because Open MPI itself was not compiled with the "-mmic" option. However, attempting to configure Open MPI with -mmic fails instantly, because configure tries to run basic test codes built with "-mmic" on the host processor.

In a couple of threads it was mentioned that people have been able to get this to work, but there was not much detail on how they built Open MPI to achieve it. Any help is appreciated.

-Adam
Re: [OMPI users] Compiling OpenMPI for Intel Xeon Phi/MIC
I'm on the road today, but will be back tomorrow afternoon (US Pacific time) and can forward my notes on this again. In the interim, just go to our user mailing list archives and search for "phi" and you'll see the conversations.

Basically, you have to cross-compile OMPI to run on the Phi. I've been intending to post the detailed steps on our FAQ, but just haven't gotten around to it - my bad.

Ralph

On Thu, Jun 26, 2014 at 3:31 PM, Adam Jundt wrote:

> I'm currently working towards setting up a single-node system with a Xeon Phi card. We have the Intel compilers (v13.1.3) installed and I was able to get a standard Open MPI (v1.6.5) to install. [...]
Re: [OMPI users] Compiling OpenMPI for Intel Xeon Phi/MIC
Here's what I used to build 1.8.1 with Intel 13.5 recently:

module load compiler/13.5.192
export PATH=/usr/linux-k1om-4.7/bin/:$PATH
../configure --prefix=/path/to/your/ompi/install \
    CC="icc -mmic" CXX="icpc -mmic" \
    --build=x86_64-unknown-linux-gnu --host=x86_64-k1om-linux \
    AR=x86_64-k1om-linux-ar RANLIB=x86_64-k1om-linux-ranlib \
    LD=x86_64-k1om-linux-ld \
    --enable-mpirun-prefix-by-default --disable-io-romio \
    --disable-vt --disable-mpi-fortran \
    --enable-mca-no-build=btl-usnic,btl-openib,common-verbs
make
make install

The problem with this is that you get an mpicc that must run on the MIC, while you want a host mpicc that generates MIC code:

$ ssh mic0
$ /path/to/your/ompi/install/bin/mpicc --show
icc -mmic -I/home/goglin/mic/openmpi-1.7.4/build-mic/install/include \
    -pthread -Wl,-rpath -Wl,/home/goglin/mic/openmpi-1.7.4/build-mic/install/lib \
    -Wl,--enable-new-dtags -L/home/goglin/mic/openmpi-1.7.4/build-mic/install/lib \
    -lmpi

Now use the above line in place of mpicc on the host to build for the MIC. But I had to append this:

-Wl,-rpath -Wl,/opt/cluster/intel/composer_xe_2013.5.192/compiler/lib/mic/ \
-L/opt/cluster/intel/composer_xe_2013.5.192/compiler/lib/mic/

I had hoped the WRAPPER_* configure variables would help solve all this, but I couldn't make them work.

Brice

On 26/06/2014 22:31, Adam Jundt wrote:

> I'm currently working towards setting up a single-node system with a Xeon Phi card. We have the Intel compilers (v13.1.3) installed and I was able to get a standard Open MPI (v1.6.5) to install. [...]
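Putting the pieces of Brice's recipe together, building the hello-world from the original post on the host would then look roughly like this (a sketch: the include/lib paths use the placeholder install prefix from his configure line, and the compiler library path is his site's, so adjust both to your installation):

icc -mmic hello.c -o hello.mic \
    -I/path/to/your/ompi/install/include \
    -L/path/to/your/ompi/install/lib -lmpi \
    -pthread -Wl,-rpath -Wl,/path/to/your/ompi/install/lib \
    -Wl,-rpath -Wl,/opt/cluster/intel/composer_xe_2013.5.192/compiler/lib/mic/ \
    -L/opt/cluster/intel/composer_xe_2013.5.192/compiler/lib/mic/

The resulting binary runs natively on the card (e.g., copied over and launched from mic0).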