[OMPI users] Problems with mpicc-wrapper-data.txt
I get the following error (it is more like a warning; mpicc still produces output):

[olews@login-0-1 $ /site/VERSIONS/openmpi-1.4.3.intel.test/bin/mpicc
[login-0-1.local:14689] keyval parser: error 1 reading file /site/VERSIONS/openmpi-1.4.3.intel.test/share/openmpi/mpicc-wrapper-data.txt at line 1:
# There can be multiple blocks of configuration data, chosen by
gcc: no input files

The file /site/VERSIONS/openmpi-1.4.3.intel.test/share/openmpi/mpicc-wrapper-data.txt is being read; I verified this by changing it and noticing its effect. mpicc works fine, but many users are quite unhappy with this error. I have used strace to see that all the characters get read (322 from strace and 322 from wc). It looks like there is something internal in the executable.

Is there a fix for this apparent bug? I searched the mailing list, but most of the information I found was of the type configure / make clean / make / make install, and that is something I have tried before.

Background:

We have several installations of OpenMPI installed. They reside at (showing the mpicc location):

/site/VERSIONS/openmpi-1.2.8.gnu/bin/mpicc
/site/VERSIONS/openmpi-1.2.8.intel/bin/mpicc
/site/VERSIONS/openmpi-1.3.3.gnu/bin/mpicc
/site/VERSIONS/openmpi-1.3.3.intel/bin/mpicc
/site/VERSIONS/openmpi-1.3.3.intel.ipath/bin/mpicc
/site/VERSIONS/openmpi-1.3.3.pgi/bin/mpicc
/site/VERSIONS/openmpi-1.4.1.gnu/bin/mpicc
/site/VERSIONS/openmpi-1.4.1.intel/bin/mpicc
/site/VERSIONS/openmpi-1.4.2.intel/bin/mpicc
/site/VERSIONS/openmpi-1.4.3.gnu/bin/mpicc
/site/VERSIONS/openmpi-1.4.3.gnu32/bin/mpicc
/site/VERSIONS/openmpi-1.4.3.intel/bin/mpicc
/site/VERSIONS/openmpi-1.4.3.intel.test/bin/mpicc
/site/VERSIONS/openmpi-1.4.3.open64/bin/mpicc
/site/VERSIONS/openmpi-1.4.3.pgi/bin/mpicc
/site/VERSIONS/openmpi-1.4.intel/bin/mpicc
/site/VERSIONS/openmpi-1.4.intel.icc/bin/mpicc

With corresponding modules to set up the correct path and library path:

set modulefile [lrange [split [module-info name] {/}] 0 0]
set apphome /site/VERSIONS/openmpi-1.4.3.intel.test
set appname OpenMPI
set appurl  http://www.open-mpi.org

module-whatis "A High Performance Message Passing Library"

setenv MPI_TYPE openmpi

prepend-path PATH            $apphome/bin
prepend-path LD_LIBRARY_PATH $apphome/lib
prepend-path LD_LIBRARY_PATH $apphome/lib/openmpi
prepend-path MANPATH         $apphome/share/man

--
Ole W. Saastad, dr. scient.
Scientific Computing Group, USIT, University of Oslo
http://hpc.uio.no
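For reference, the mpicc-wrapper-data.txt installed by a stock 1.4.x build is a short key=value text file; the sketch below shows only its general shape (the exact compiler, flags, and libs lines depend on the configure options used, so treat every value here as a placeholder rather than the contents of the installation above):

    # There can be multiple blocks of configuration data, chosen by
    # compiler flags (using the compiler_args key to choose which block
    # should be activated).
    project=Open MPI
    project_short=OMPI
    version=1.4.3
    language=C
    compiler_env=CC
    compiler_flags_env=CFLAGS
    compiler=gcc
    extra_includes=
    preprocessor_flags=
    compiler_flags=-pthread
    linker_flags=
    libs=-lmpi -lopen-rte -lopen-pal -ldl -lnsl -lutil -lm
    required_file=
    includedir=${includedir}
    libdir=${libdir}

The "keyval parser" error quoted above is pointing at the very first comment line of a file of exactly this shape.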
[OMPI users] Fatal error while running the code
Hello everyone,

I am a newbie here. I am running a code for large eddy simulation of turbulent flow. I am compiling the code with the wrapper command and running it on the Hydra cluster. When I submit the script file, it shows the following error:

running mpdallexit on hydra127
LAUNCHED mpd on hydra127 via
RUNNING: mpd on hydra127
LAUNCHED mpd on hydra118 via hydra127
RUNNING: mpd on hydra118
Fatal error in MPI_Send: Invalid rank, error stack:
MPI_Send(176): MPI_Send(buf=0x7fffa7a1e4a8, count=1, MPI_DOUBLE_PRECISION, dest=1, tag=1, MPI_COMM_WORLD) failed
MPI_Send(98).: Invalid rank has value 1 but must be nonnegative and less than 1
Total Nb of PE:1

PE# 0 / 1 OK
PE# 00 0 0
PE# 00 33 0 165 0 33
PE# 0 -1 1 -1 -1 -1 8
PE_Table, PE# 0 complete
PE# 0 -0.03 0.98 -1.00 1.00 -0.03 0.98
PE# 0 doesn t intersect any bloc
PE# 0 will communicate with0
single value
PE# 0 has 2 com. boundaries
Data_Read, PE# 0 complete

PE# 0 checking boundary type for
0 1 1 1 0 165 0 33 nor sur sur sur gra 1 0 0
0 2 33 33 0 165 0 33EXC -> 1
0 3 0 33 1 1 0 33 sur nor sur sur gra 0 1 0
0 4 0 33 164 164 0 33 sur nor sur sur gra 0 -1 0
0 5 0 33 0 165 1 1 cyc cyc cyc sur cyc 0 0 1
0 6 0 33 0 165 33 33EXC -> 8
PE# 0 Set new
PE# 0 FFT Table
PE# 0 Coeff
rank 0 in job 1 hydra127_34565 caused collective abort of all ranks
exit status of rank 0: return code 1

I am struggling to find the error in my code. Can anybody suggest where I messed up?

Thanks and Regards,
Ash
Re: [OMPI users] Beginner's question: why multiple sends or receives don't work?
I'm using openmpi 1.4.3. The cluster consists of two desktops with Intel Core 2 Duo processors running Ubuntu 10.04.

A weird thing that I found is that when I issued the command "env | grep LD_LIBRARY_PATH" on the slave node, it showed the MPI lib path. But when I issued the command "ssh slave-node env | grep LD_LIBRARY_PATH" on the master side to check the LD_LIBRARY_PATH of the slave node, it showed nothing. Also, issuing the command "ssh master-node env | grep LD_LIBRARY_PATH" on the slave side would return the correct MPI lib path.

I tried to modify the .bashrc and the /etc/ld.so.conf.d/*.conf file to configure the LD_LIBRARY_PATH on the slave node, but it seems to work only locally. How can I set the LD_LIBRARY_PATH on the slave node side, so that I get the correct path when I use "ssh slave-node env | grep LD_LIBRARY_PATH" on the master side?

Kong

On Wed, Feb 23, 2011 at 5:22 PM, Bill Rankin wrote:
> Jeff:
>
>> FWIW: I have rarely seen this to be the issue.
>
> Been bitten by similar situations before. But it may not have been OpenMPI.
> In any case it was a while back.
>
>> In short, programs are erroneous that do not guarantee that all their
>> outstanding requests have completed before calling finalize.
>
> Agreed 100%. The barrier won't prevent the case of unmatched sends/receives
> or outstanding request handles, but if the logic is correct it does make sure
> that everyone completes before anyone leaves.
>
> In any case, I also tried code #2 and it completed w/o issue on our cluster.
> I guess the next thing to ask Kong is regarding what version he is running
> and what is the platform.
>
> -b
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Xianglong Kong
Department of Mechanical Engineering
University of Rochester
Phone: (585)520-4412
MSN: dinosaur8...@hotmail.com
Re: [OMPI users] Beginner's question: why multiple sends or receives don't work?
Ensure to check that a) your .bashrc is actually executing when you "ssh othernode env", and b) if .bashrc is executing, make sure that it isn't prematurely exiting for non-interactive jobs. On Feb 25, 2011, at 9:58 AM, Xianglong Kong wrote: > I'm using openmpi 1.4.3. The cluster consist of two desktop with Intel > core 2 duo running on Ubuntu 10.04. > > A weird thing that i found is that when I issued the command "env | > grep LD_LIBRARY_PATH" on the slave node, it showed the mpi lib path. > But when > I issude the command "ssh slave-node env | grep LD_LIBRARY_PATH" on > the master side to check the LD_LIBRARY_PATH of the slave node, it > showed nothing. Also, issuing the command "ssh master-node env | grep > LD_LIBRARY_PATH" on the slave side would return the correct mpi lib > path. > > I tried to modify the .bashrc and the /etc/ld.so.conf.d/*.conf file to > configure the LD_LIBRARY_PATH on the slave node, but it seems to work > only locally. How can I set the LD_LIBRARY_PATH on the slave node > side, so that I can get the correct path when I use "ssh slave-node > env | grep LD_LIBRARY_PATH" on the master side? > > Kong > > On Wed, Feb 23, 2011 at 5:22 PM, Bill Rankin wrote: >> Jeff: >> >>> FWIW: I have rarely seen this to be the issue. >> >> Been bitten by similar situations before. But it may not have been OpenMPI. >> In any case it was a while back. >> >>> In short, programs are erroneous that do not guarantee that all their >>> outstanding requests have completed before calling finalize. >> >> Agreed 100%. The barrier won't prevent the case of unmatched sends/receives >> or outstanding request handles, but if the logic is correct it does make >> sure that everyone completes before anyone leaves. >> >> In any case, I also tried code #2 and it completed w/o issue on our cluster. >> I guess the next thing to ask Kong is regarding what version he is running >> and what is the platform. >> >> -b >> >> >> >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> > > > > -- > Xianglong Kong > Department of Mechanical Engineering > University of Rochester > Phone: (585)520-4412 > MSN: dinosaur8...@hotmail.com > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
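One quick way to test point (a) is sketched below, assuming the remote shell is bash; the marker line and its placement are purely illustrative:

    # Put this as the FIRST line of ~/.bashrc on the slave node (remove it afterwards):
    echo ".bashrc was sourced" 1>&2

    # Then, from the master node:
    #   ssh slave-node env | grep LD_LIBRARY_PATH
    # If the message never appears, .bashrc is not being read for non-interactive
    # logins at all; if it appears but LD_LIBRARY_PATH is still empty, .bashrc is
    # being read but returns early (point b) before reaching the export lines.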
Re: [OMPI users] Fatal error while running the code
Two things:

1. It looks like you are using the MPICH implementation of MPI. You should probably ping them on their email list -- this list is for the Open MPI implementation of MPI (a wholly different code base than MPICH; sorry!).

2. The error message seems quite descriptive, actually:

> MPI_Send(176): MPI_Send(buf=0x7fffa7a1e4a8, count=1, MPI_DOUBLE_PRECISION, dest=1, tag=1, MPI_COMM_WORLD) failed
> MPI_Send(98).: Invalid rank has value 1 but must be nonnegative and less than 1

You sent dest=1, but apparently the communicator must be of size 1, meaning that the only possible destination is 0 (i.e., yourself).

On Feb 25, 2011, at 9:23 AM, Ashwinkumar Dobariya wrote:

> Hello everyone,
>
> I am a newbie here. I am running a code for large eddy simulation of turbulent flow. I am compiling the code with the wrapper command and running it on the Hydra cluster. When I submit the script file, it shows the following error:
>
> running mpdallexit on hydra127
> LAUNCHED mpd on hydra127 via
> RUNNING: mpd on hydra127
> LAUNCHED mpd on hydra118 via hydra127
> RUNNING: mpd on hydra118
> Fatal error in MPI_Send: Invalid rank, error stack:
> MPI_Send(176): MPI_Send(buf=0x7fffa7a1e4a8, count=1, MPI_DOUBLE_PRECISION, dest=1, tag=1, MPI_COMM_WORLD) failed
> MPI_Send(98).: Invalid rank has value 1 but must be nonnegative and less than 1
> Total Nb of PE:1
>
> PE# 0 / 1 OK
> PE# 00 0 0
> PE# 00 33 0 165 0 33
> PE# 0 -1 1 -1 -1 -1 8
> PE_Table, PE# 0 complete
> PE# 0 -0.03 0.98 -1.00 1.00 -0.03 0.98
> PE# 0 doesn t intersect any bloc
> PE# 0 will communicate with0
> single value
> PE# 0 has 2 com. boundaries
> Data_Read, PE# 0 complete
>
> PE# 0 checking boundary type for
> 0 1 1 1 0 165 0 33 nor sur sur sur gra 1 0 0
> 0 2 33 33 0 165 0 33EXC -> 1
> 0 3 0 33 1 1 0 33 sur nor sur sur gra 0 1 0
> 0 4 0 33 164 164 0 33 sur nor sur sur gra 0 -1 0
> 0 5 0 33 0 165 1 1 cyc cyc cyc sur cyc 0 0 1
> 0 6 0 33 0 165 33 33EXC -> 8
> PE# 0 Set new
> PE# 0 FFT Table
> PE# 0 Coeff
> rank 0 in job 1 hydra127_34565 caused collective abort of all ranks
> exit status of rank 0: return code 1
>
> I am struggling to find the error in my code. Can anybody suggest where I messed up?
>
> Thanks and Regards,
> Ash
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
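For comparison, a minimal C sketch of a guard that avoids this class of error is below; it is not Ash's code (which appears to be Fortran, given MPI_DOUBLE_PRECISION), just an illustration that the destination rank passed to MPI_Send must be smaller than the size reported by MPI_Comm_size:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        double val = 42.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Valid destinations are 0 .. size-1.  Launched with a single process
         * (as in the "Total Nb of PE:1" output above), size is 1, so sending
         * to dest=1 produces exactly the "Invalid rank" error quoted here. */
        if (rank == 0 && size > 1) {
            MPI_Send(&val, 1, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&val, 1, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }

In other words, the likely fixes are either launching with more than one process or guarding the send against the single-process case.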
Re: [OMPI users] Problems with mpicc-wrapper-data.txt
Can you send the entire contents of /site/VERSIONS/openmpi-1.4.3.intel.test/share/openmpi/mpicc-wrapper-data.txt? On Feb 25, 2011, at 9:21 AM, Ole Widar Saastad wrote: > I get the follwing error (it is more like a waring, the mpicc produce > output): > [olews@login-0-1 $ /site/VERSIONS/openmpi-1.4.3.intel.test/bin/mpicc > [login-0-1.local:14689] keyval parser: error 1 reading file > /site/VERSIONS/openmpi-1.4.3.intel.test/share/openmpi/mpicc-wrapper-data.txt > at line 1: > # There can be multiple blocks of configuration data, chosen by > gcc: no input files > > > The > /site/VERSIONS/openmpi-1.4.3.intel.test/share/openmpi/mpicc-wrapper-data.txt > is read, verified by chaning it and noticing it's effect. It works fint, but > many users are quite unhappy wit this error. I have used strace to see that > all the characters get read (322 from strace and 322 from wc). > It looks like there is something internal in the executable > > Is there a fix for apparently bug ? I searched the mailing list, but > most information I got was of the type configure/make clean/make/make > install and this is something I have tried before. > > > > Background : > > We have several installations of OpenMPI installed. > > They reside at (showing mpicc location) : > > /site/VERSIONS/openmpi-1.2.8.gnu/bin/mpicc > /site/VERSIONS/openmpi-1.2.8.intel/bin/mpicc > /site/VERSIONS/openmpi-1.3.3.gnu/bin/mpicc > /site/VERSIONS/openmpi-1.3.3.intel/bin/mpicc > /site/VERSIONS/openmpi-1.3.3.intel.ipath/bin/mpicc > /site/VERSIONS/openmpi-1.3.3.pgi/bin/mpicc > /site/VERSIONS/openmpi-1.4.1.gnu/bin/mpicc > /site/VERSIONS/openmpi-1.4.1.intel/bin/mpicc > /site/VERSIONS/openmpi-1.4.2.intel/bin/mpicc > /site/VERSIONS/openmpi-1.4.3.gnu/bin/mpicc > /site/VERSIONS/openmpi-1.4.3.gnu32/bin/mpicc > /site/VERSIONS/openmpi-1.4.3.intel/bin/mpicc > /site/VERSIONS/openmpi-1.4.3.intel.test/bin/mpicc > /site/VERSIONS/openmpi-1.4.3.open64/bin/mpicc > /site/VERSIONS/openmpi-1.4.3.pgi/bin/mpicc > /site/VERSIONS/openmpi-1.4.intel/bin/mpicc > /site/VERSIONS/openmpi-1.4.intel.icc/bin/mpicc > > With corresponding modules to set up the correct path and library path. > set modulefile [lrange [split [module-info name] {/}] 0 0] > set apphome/site/VERSIONS/openmpi-1.4.3.intel.test > set appnameOpenMPI > set appurl http://www.open-mpi.org > > module-whatis "A High Performance Message Passing Library" > > setenv MPI_TYPE openmpi > > prepend-pathPATH$apphome/bin > prepend-pathLD_LIBRARY_PATH $apphome/lib > prepend-pathLD_LIBRARY_PATH $apphome/lib/openmpi > prepend-pathMANPATH $apphome/share/man > > > > -- > Ole W. Saastad, dr. scient. > Scientific Computing Group, USIT, University of Oslo > http://hpc.uio.no > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] Beginner's question: why multiple sends or receives don't work?
.bashrc is not executed when I ssh the node. How can I let it be executed? Kong On Fri, Feb 25, 2011 at 10:04 AM, Jeff Squyres wrote: > Ensure to check that a) your .bashrc is actually executing when you "ssh > othernode env", and b) if .bashrc is executing, make sure that it isn't > prematurely exiting for non-interactive jobs. > > > On Feb 25, 2011, at 9:58 AM, Xianglong Kong wrote: > >> I'm using openmpi 1.4.3. The cluster consist of two desktop with Intel >> core 2 duo running on Ubuntu 10.04. >> >> A weird thing that i found is that when I issued the command "env | >> grep LD_LIBRARY_PATH" on the slave node, it showed the mpi lib path. >> But when >> I issude the command "ssh slave-node env | grep LD_LIBRARY_PATH" on >> the master side to check the LD_LIBRARY_PATH of the slave node, it >> showed nothing. Also, issuing the command "ssh master-node env | grep >> LD_LIBRARY_PATH" on the slave side would return the correct mpi lib >> path. >> >> I tried to modify the .bashrc and the /etc/ld.so.conf.d/*.conf file to >> configure the LD_LIBRARY_PATH on the slave node, but it seems to work >> only locally. How can I set the LD_LIBRARY_PATH on the slave node >> side, so that I can get the correct path when I use "ssh slave-node >> env | grep LD_LIBRARY_PATH" on the master side? >> >> Kong >> >> On Wed, Feb 23, 2011 at 5:22 PM, Bill Rankin wrote: >>> Jeff: >>> FWIW: I have rarely seen this to be the issue. >>> >>> Been bitten by similar situations before. But it may not have been >>> OpenMPI. In any case it was a while back. >>> In short, programs are erroneous that do not guarantee that all their outstanding requests have completed before calling finalize. >>> >>> Agreed 100%. The barrier won't prevent the case of unmatched >>> sends/receives or outstanding request handles, but if the logic is correct >>> it does make sure that everyone completes before anyone leaves. >>> >>> In any case, I also tried code #2 and it completed w/o issue on our >>> cluster. I guess the next thing to ask Kong is regarding what version he >>> is running and what is the platform. >>> >>> -b >>> >>> >>> >>> >>> ___ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>> >> >> >> >> -- >> Xianglong Kong >> Department of Mechanical Engineering >> University of Rochester >> Phone: (585)520-4412 >> MSN: dinosaur8...@hotmail.com >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > -- Xianglong Kong Department of Mechanical Engineering University of Rochester Phone: (585)520-4412 MSN: dinosaur8...@hotmail.com
Re: [OMPI users] Beginner's question: why multiple sends or receives don't work?
Have a look at the bash man page, and these two FAQ items: http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path http://www.open-mpi.org/faq/?category=running#mpirun-prefix On Feb 25, 2011, at 10:31 AM, Xianglong Kong wrote: > .bashrc is not executed when I ssh the node. How can I let it be executed? > > Kong > > On Fri, Feb 25, 2011 at 10:04 AM, Jeff Squyres wrote: >> Ensure to check that a) your .bashrc is actually executing when you "ssh >> othernode env", and b) if .bashrc is executing, make sure that it isn't >> prematurely exiting for non-interactive jobs. >> >> >> On Feb 25, 2011, at 9:58 AM, Xianglong Kong wrote: >> >>> I'm using openmpi 1.4.3. The cluster consist of two desktop with Intel >>> core 2 duo running on Ubuntu 10.04. >>> >>> A weird thing that i found is that when I issued the command "env | >>> grep LD_LIBRARY_PATH" on the slave node, it showed the mpi lib path. >>> But when >>> I issude the command "ssh slave-node env | grep LD_LIBRARY_PATH" on >>> the master side to check the LD_LIBRARY_PATH of the slave node, it >>> showed nothing. Also, issuing the command "ssh master-node env | grep >>> LD_LIBRARY_PATH" on the slave side would return the correct mpi lib >>> path. >>> >>> I tried to modify the .bashrc and the /etc/ld.so.conf.d/*.conf file to >>> configure the LD_LIBRARY_PATH on the slave node, but it seems to work >>> only locally. How can I set the LD_LIBRARY_PATH on the slave node >>> side, so that I can get the correct path when I use "ssh slave-node >>> env | grep LD_LIBRARY_PATH" on the master side? >>> >>> Kong >>> >>> On Wed, Feb 23, 2011 at 5:22 PM, Bill Rankin wrote: Jeff: > FWIW: I have rarely seen this to be the issue. Been bitten by similar situations before. But it may not have been OpenMPI. In any case it was a while back. > In short, programs are erroneous that do not guarantee that all their > outstanding requests have completed before calling finalize. Agreed 100%. The barrier won't prevent the case of unmatched sends/receives or outstanding request handles, but if the logic is correct it does make sure that everyone completes before anyone leaves. In any case, I also tried code #2 and it completed w/o issue on our cluster. I guess the next thing to ask Kong is regarding what version he is running and what is the platform. -b ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users >>> >>> >>> >>> -- >>> Xianglong Kong >>> Department of Mechanical Engineering >>> University of Rochester >>> Phone: (585)520-4412 >>> MSN: dinosaur8...@hotmail.com >>> >>> ___ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >> >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> For corporate legal information go to: >> http://www.cisco.com/web/about/doing_business/legal/cri/ >> >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> > > > > -- > Xianglong Kong > Department of Mechanical Engineering > University of Rochester > Phone: (585)520-4412 > MSN: dinosaur8...@hotmail.com > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
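On Ubuntu, the usual catch with point (a) from earlier in this thread is that the stock ~/.bashrc starts with an "if not running interactively, return" guard, so exports placed below it never run for "ssh node env". A minimal sketch of the first FAQ item's fix, assuming a hypothetical install prefix of $HOME/openmpi-1.4.3 (substitute the real prefix on your nodes), is to put the exports above that guard in ~/.bashrc on every node:

    # ~/.bashrc (slave and master nodes) -- these lines must come BEFORE the
    # '[ -z "$PS1" ] && return' (or 'case $- in *i*) ...') interactive-shell guard.
    export PATH=$HOME/openmpi-1.4.3/bin:$PATH
    export LD_LIBRARY_PATH=$HOME/openmpi-1.4.3/lib:$LD_LIBRARY_PATH

The second FAQ item sidesteps the remote shell's environment instead, e.g. running "mpirun --prefix /path/to/openmpi ..." or configuring Open MPI with --enable-mpirun-prefix-by-default.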
Re: [OMPI users] Unknown overhead in "mpirun -am ft-enable-cr"
Dear Josh,

Did you find out the problem? I still cannot make any progress. Hope to hear some good news from you.

Regards,
Nguyen Toan

On Sun, Feb 13, 2011 at 3:04 PM, Nguyen Toan wrote:
> Hi Josh,
>
> I tried the MCA parameter you mentioned but it did not help; the unknown
> overhead still exists.
> Here I attach the output of 'ompi_info', for both versions 1.5 and 1.5.1.
> Hope you can find out the problem.
> Thank you.
>
> Regards,
> Nguyen Toan
>
> On Wed, Feb 9, 2011 at 11:08 PM, Joshua Hursey wrote:
>
>> It looks like the logic in the configure script is turning on the FT
>> thread for you when you specify both '--with-ft=cr' and '--enable-mpi-threads'.
>>
>> Can you send me the output of 'ompi_info'? Can you also try the MCA
>> parameter that I mentioned earlier to see if that changes the performance?
>>
>> If there are many non-blocking sends and receives, there might be a
>> performance bug with the way the point-to-point wrapper is tracking request
>> objects. If the above MCA parameter does not help the situation, let me know
>> and I might be able to take a look at this next week.
>>
>> Thanks,
>> Josh
>>
>> On Feb 9, 2011, at 1:40 AM, Nguyen Toan wrote:
>>
>> > Hi Josh,
>> > Thanks for the reply. I did not use the '--enable-ft-thread' option.
>> > Here are my build options:
>> >
>> > CFLAGS=-g \
>> > ./configure \
>> > --with-ft=cr \
>> > --enable-mpi-threads \
>> > --with-blcr=/home/nguyen/opt/blcr \
>> > --with-blcr-libdir=/home/nguyen/opt/blcr/lib \
>> > --prefix=/home/nguyen/opt/openmpi \
>> > --with-openib \
>> > --enable-mpirun-prefix-by-default
>> >
>> > My application requires lots of communication in every loop, focusing on
>> > MPI_Isend, MPI_Irecv and MPI_Wait. Also, I want to make only one checkpoint
>> > per application execution for my purpose, but the unknown overhead exists
>> > even when no checkpoint was taken.
>> >
>> > Do you have any other idea?
>> >
>> > Regards,
>> > Nguyen Toan
>> >
>> > On Wed, Feb 9, 2011 at 12:41 AM, Joshua Hursey wrote:
>> > There are a few reasons why this might be occurring. Did you build with
>> > the '--enable-ft-thread' option?
>> >
>> > If so, it looks like I didn't move over the thread_sleep_wait adjustment
>> > from the trunk - the thread was being a bit too aggressive. Try adding the
>> > following to your command line options, and see if it changes the
>> > performance:
>> > "-mca opal_cr_thread_sleep_wait 1000"
>> >
>> > There are other places to look as well depending on how frequently your
>> > application communicates, how often you checkpoint, process layout, ... But
>> > usually the aggressive nature of the thread is the main problem.
>> >
>> > Let me know if that helps.
>> >
>> > -- Josh
>> >
>> > On Feb 8, 2011, at 2:50 AM, Nguyen Toan wrote:
>> >
>> > > Hi all,
>> > >
>> > > I am using the latest version of OpenMPI (1.5.1) and BLCR (0.8.2).
>> > > I found that when running an application which uses MPI_Isend,
>> > > MPI_Irecv and MPI_Wait, enabling C/R, i.e. using "-am ft-enable-cr",
>> > > the application runtime is much longer than the normal execution with
>> > > mpirun (no checkpoint was taken).
>> > > This overhead becomes larger when the normal execution runtime is longer.
>> > > Does anybody have any idea about this overhead, and how to eliminate it?
>> > > Thanks.
>> > >
>> > > Regards,
>> > > Nguyen
>> > > ___
>> > > users mailing list
>> > > us...@open-mpi.org
>> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >
>> > Joshua Hursey
>> > Postdoctoral Research Associate
>> > Oak Ridge National Laboratory
>> > http://users.nccs.gov/~jjhursey
>> >
>> > ___
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >
>> > ___
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> Joshua Hursey
>> Postdoctoral Research Associate
>> Oak Ridge National Laboratory
>> http://users.nccs.gov/~jjhursey
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users