[OMPI users] Problems with mpicc-wrapper-data.txt

2011-02-25 Thread Ole Widar Saastad
I get the following error (it is more like a warning; mpicc still produces
output):
[olews@login-0-1 $ /site/VERSIONS/openmpi-1.4.3.intel.test/bin/mpicc
[login-0-1.local:14689] keyval parser: error 1 reading file 
/site/VERSIONS/openmpi-1.4.3.intel.test/share/openmpi/mpicc-wrapper-data.txt at 
line 1:
  # There can be multiple blocks of configuration data, chosen by
gcc: no input files


The 
/site/VERSIONS/openmpi-1.4.3.intel.test/share/openmpi/mpicc-wrapper-data.txt is 
read; I verified this by changing it and noticing its effect. It works fine, but many 
users are quite unhappy with this error. I have used strace to see that all the 
characters get read (322 from strace and 322 from wc).
It looks like there is something internal in the executable.
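
As a further check, the wrapper's view of the data file can be inspected
without giving it an input file; a minimal sketch, assuming the
openmpi-1.4.3.intel.test module is loaded so this mpicc is first in PATH:

  mpicc --showme           # print the full underlying command line, compile nothing
  mpicc --showme:compile   # preprocessor/compile flags taken from the data file
  mpicc --showme:link      # linker flags and libraries taken from the data file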

Is there a fix for this apparent bug? I searched the mailing list, but
most information I got was of the type configure/make clean/make/make
install, and this is something I have already tried.



Background :

We have several installations of Open MPI.

They reside at (showing mpicc location) :

/site/VERSIONS/openmpi-1.2.8.gnu/bin/mpicc
/site/VERSIONS/openmpi-1.2.8.intel/bin/mpicc
/site/VERSIONS/openmpi-1.3.3.gnu/bin/mpicc
/site/VERSIONS/openmpi-1.3.3.intel/bin/mpicc
/site/VERSIONS/openmpi-1.3.3.intel.ipath/bin/mpicc
/site/VERSIONS/openmpi-1.3.3.pgi/bin/mpicc
/site/VERSIONS/openmpi-1.4.1.gnu/bin/mpicc
/site/VERSIONS/openmpi-1.4.1.intel/bin/mpicc
/site/VERSIONS/openmpi-1.4.2.intel/bin/mpicc
/site/VERSIONS/openmpi-1.4.3.gnu/bin/mpicc
/site/VERSIONS/openmpi-1.4.3.gnu32/bin/mpicc
/site/VERSIONS/openmpi-1.4.3.intel/bin/mpicc
/site/VERSIONS/openmpi-1.4.3.intel.test/bin/mpicc
/site/VERSIONS/openmpi-1.4.3.open64/bin/mpicc
/site/VERSIONS/openmpi-1.4.3.pgi/bin/mpicc
/site/VERSIONS/openmpi-1.4.intel/bin/mpicc
/site/VERSIONS/openmpi-1.4.intel.icc/bin/mpicc

With corresponding modules to set up the correct path and library path.
set modulefile [lrange [split [module-info name] {/}] 0 0]
set apphome /site/VERSIONS/openmpi-1.4.3.intel.test
set appname OpenMPI
set appurl http://www.open-mpi.org

module-whatis   "A High Performance Message Passing Library"

setenv  MPI_TYPE  openmpi

prepend-path    PATH            $apphome/bin
prepend-path    LD_LIBRARY_PATH $apphome/lib
prepend-path    LD_LIBRARY_PATH $apphome/lib/openmpi
prepend-path    MANPATH         $apphome/share/man



-- 
Ole W. Saastad, dr. scient.
Scientific Computing Group, USIT, University of Oslo
http://hpc.uio.no



[OMPI users] Fatal error while running the code

2011-02-25 Thread Ashwinkumar Dobariya
Hello everyone,

I am a newbie here. I am running a code for large eddy simulation of
turbulent flow. I am compiling the code using the wrapper command and running
the code on the Hydra cluster. When I submit the script file it shows the
following error.

running mpdallexit on hydra127
LAUNCHED mpd on hydra127  via
RUNNING: mpd on hydra127
LAUNCHED mpd on hydra118  via  hydra127
RUNNING: mpd on hydra118
Fatal error in MPI_Send: Invalid rank, error stack:
MPI_Send(176): MPI_Send(buf=0x7fffa7a1e4a8, count=1, MPI_DOUBLE_PRECISION,
dest=1, tag=1, MPI_COMM_WORLD) failed
MPI_Send(98).: Invalid rank has value 1 but must be nonnegative and less
than 1
 Total Nb of PE:1

 PE#   0 /   1  OK
PE# 00   0   0
PE# 00  33   0 165   0  33
PE# 0  -1  1 -1 -1 -1  8
 PE_Table, PE#   0  complete
PE# 0   -0.03   0.98  -1.00   1.00  -0.03   0.98
 PE#   0  doesn t intersect any bloc
 PE#   0  will communicate with0
 single value
 PE#   0  has   2  com. boundaries
 Data_Read, PE#   0  complete

 PE#   0  checking boundary type for
 0  1   1   1   0 165   0  33  nor sur sur sur gra  1  0  0
 0  2  33  33   0 165   0  33EXC ->  1
 0  3   0  33   1   1   0  33  sur nor sur sur gra  0  1  0
 0  4   0  33 164 164   0  33  sur nor sur sur gra  0 -1  0
 0  5   0  33   0 165   1   1  cyc cyc cyc sur cyc  0  0  1
 0  6   0  33   0 165  33  33EXC ->  8
 PE#   0  Set new
 PE#   0  FFT Table
 PE#   0  Coeff
rank 0 in job 1  hydra127_34565   caused collective abort of all ranks
  exit status of rank 0: return code 1

I am struggling to find the error in my code. Can anybody suggest where I
messed up?

Thanks and Regards,
Ash


Re: [OMPI users] Beginner's question: why multiple sends or receives don't work?

2011-02-25 Thread Xianglong Kong
I'm using openmpi 1.4.3. The cluster consists of two desktops with Intel
Core 2 Duo CPUs running Ubuntu 10.04.

A weird thing that I found is that when I issued the command "env |
grep LD_LIBRARY_PATH" on the slave node, it showed the MPI lib path.
But when
I issued the command "ssh slave-node env | grep LD_LIBRARY_PATH" on
the master side to check the LD_LIBRARY_PATH of the slave node, it
showed nothing. Also, issuing the command "ssh master-node env | grep
LD_LIBRARY_PATH" on the slave side would return the correct MPI lib
path.

I tried to modify the .bashrc and the /etc/ld.so.conf.d/*.conf file to
configure the LD_LIBRARY_PATH on the slave node, but it seems to work
only locally. How can I set the LD_LIBRARY_PATH on the slave node
side, so that I can get the correct path when I use "ssh slave-node
env | grep LD_LIBRARY_PATH" on the master side?

Kong

On Wed, Feb 23, 2011 at 5:22 PM, Bill Rankin  wrote:
> Jeff:
>
>> FWIW: I have rarely seen this to be the issue.
>
> Been bitten by similar situations before.  But it may not have been OpenMPI.  
> In any case it was a while back.
>
>> In short, programs are erroneous that do not guarantee that all their
>> outstanding requests have completed before calling finalize.
>
> Agreed 100%.  The barrier won't prevent the case of unmatched sends/receives 
> or outstanding request handles, but if the logic is correct it does make sure 
> that everyone completes before anyone leaves.
>
> In any case, I also tried code #2 and it completed w/o issue on our cluster.  
> I guess the next thing to ask Kong is regarding what version he is running 
> and what is the platform.
>
> -b
>
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Xianglong Kong
Department of Mechanical Engineering
University of Rochester
Phone: (585)520-4412
MSN: dinosaur8...@hotmail.com



Re: [OMPI users] Beginner's question: why multiple sends or receives don't work?

2011-02-25 Thread Jeff Squyres
Check that a) your .bashrc is actually executing when you "ssh 
othernode env", and b) if .bashrc is executing, that it isn't 
prematurely exiting for non-interactive jobs.
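
For example, a stock Ubuntu ~/.bashrc usually starts with a guard along these
lines, and anything placed below it is never seen by a non-interactive
"ssh node env" (the install prefix below is only a placeholder):

  # near the top of the default ~/.bashrc
  [ -z "$PS1" ] && return            # non-interactive shells stop here

  # exports placed after the guard never run for "ssh node env";
  # move them above it (the prefix is an example, not your real path):
  export PATH=/opt/openmpi/bin:$PATH
  export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH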


On Feb 25, 2011, at 9:58 AM, Xianglong Kong wrote:

> I'm using openmpi 1.4.3. The cluster consists of two desktops with Intel
> Core 2 Duo CPUs running Ubuntu 10.04.
> 
> A weird thing that I found is that when I issued the command "env |
> grep LD_LIBRARY_PATH" on the slave node, it showed the MPI lib path.
> But when
> I issued the command "ssh slave-node env | grep LD_LIBRARY_PATH" on
> the master side to check the LD_LIBRARY_PATH of the slave node, it
> showed nothing. Also, issuing the command "ssh master-node env | grep
> LD_LIBRARY_PATH" on the slave side would return the correct MPI lib
> path.
> 
> I tried to modify the .bashrc and the /etc/ld.so.conf.d/*.conf file to
> configure the LD_LIBRARY_PATH on the slave node, but it seems to work
> only locally. How can I set the LD_LIBRARY_PATH on the slave node
> side, so that I can get the correct path when I use "ssh slave-node
> env | grep LD_LIBRARY_PATH" on the master side?
> 
> Kong
> 
> On Wed, Feb 23, 2011 at 5:22 PM, Bill Rankin  wrote:
>> Jeff:
>> 
>>> FWIW: I have rarely seen this to be the issue.
>> 
>> Been bitten by similar situations before.  But it may not have been OpenMPI. 
>>  In any case it was a while back.
>> 
>>> In short, programs are erroneous that do not guarantee that all their
>>> outstanding requests have completed before calling finalize.
>> 
>> Agreed 100%.  The barrier won't prevent the case of unmatched sends/receives 
>> or outstanding request handles, but if the logic is correct it does make 
>> sure that everyone completes before anyone leaves.
>> 
>> In any case, I also tried code #2 and it completed w/o issue on our cluster. 
>>  I guess the next thing to ask Kong is regarding what version he is running 
>> and what is the platform.
>> 
>> -b
>> 
>> 
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
> 
> 
> 
> -- 
> Xianglong Kong
> Department of Mechanical Engineering
> University of Rochester
> Phone: (585)520-4412
> MSN: dinosaur8...@hotmail.com
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Fatal error while running the code

2011-02-25 Thread Jeff Squyres
Two things:

1. It looks like you are using the MPICH implementation of MPI.  You should 
probably ping them on their email list -- this list is for the Open MPI 
implementation of MPI (a wholly different code base than MPICH; sorry!).

2. The error code seems quite descriptive, actually:

> MPI_Send(176): MPI_Send(buf=0x7fffa7a1e4a8, count=1, MPI_DOUBLE_PRECISION, 
> dest=1, tag=1, MPI_COMM_WORLD) failed
> MPI_Send(98).: Invalid rank has value 1 but must be nonnegative and less than 
> 1

You sent dest=1, but apparently the communicator must be of size 1, meaning 
that the only possible destination is 0 (i.e., yourself).
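
If the job was meant to run on more than one process, the fix is usually on
the launch side rather than in the code; a hypothetical example with the
mpd-based MPICH tools that produced the output above ("./les_code" is a
placeholder for the actual binary):

  mpiexec -n 8 ./les_code      # launch 8 ranks, so dest=1 is a valid rank

Alternatively, the code can query the communicator size (MPI_Comm_size) and
skip sends to ranks that do not exist.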




On Feb 25, 2011, at 9:23 AM, Ashwinkumar Dobariya wrote:

> Hello everyone,
> 
> I am a newbie here. I am running a code for large eddy simulation of 
> turbulent flow. I am compiling the code using the wrapper command and running 
> the code on the Hydra cluster. When I submit the script file it shows the 
> following error.
>  
> running mpdallexit on hydra127
> LAUNCHED mpd on hydra127  via
> RUNNING: mpd on hydra127
> LAUNCHED mpd on hydra118  via  hydra127
> RUNNING: mpd on hydra118
> Fatal error in MPI_Send: Invalid rank, error stack:
> MPI_Send(176): MPI_Send(buf=0x7fffa7a1e4a8, count=1, MPI_DOUBLE_PRECISION, 
> dest=1, tag=1, MPI_COMM_WORLD) failed
> MPI_Send(98).: Invalid rank has value 1 but must be nonnegative and less than 
> 1
>  Total Nb of PE:1
> 
>  PE#   0 /   1  OK
> PE# 00   0   0
> PE# 00  33   0 165   0  33
> PE# 0  -1  1 -1 -1 -1  8
>  PE_Table, PE#   0  complete
> PE# 0   -0.03   0.98  -1.00   1.00  -0.03   0.98
>  PE#   0  doesn t intersect any bloc
>  PE#   0  will communicate with0
>  single value
>  PE#   0  has   2  com. boundaries
>  Data_Read, PE#   0  complete
> 
>  PE#   0  checking boundary type for
>  0  1   1   1   0 165   0  33  nor sur sur sur gra  1  0  0
>  0  2  33  33   0 165   0  33EXC ->  1
>  0  3   0  33   1   1   0  33  sur nor sur sur gra  0  1  0
>  0  4   0  33 164 164   0  33  sur nor sur sur gra  0 -1  0
>  0  5   0  33   0 165   1   1  cyc cyc cyc sur cyc  0  0  1
>  0  6   0  33   0 165  33  33EXC ->  8
>  PE#   0  Set new
>  PE#   0  FFT Table
>  PE#   0  Coeff
> rank 0 in job 1  hydra127_34565   caused collective abort of all ranks
>   exit status of rank 0: return code 1
> 
> I am struggling to find the error in my code. Can anybody suggest where I 
> messed up?
> 
> Thanks and Regards,
> Ash
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Problems with mpicc-wrapper-data.txt

2011-02-25 Thread Jeff Squyres
Can you send the entire contents of 
/site/VERSIONS/openmpi-1.4.3.intel.test/share/openmpi/mpicc-wrapper-data.txt?


On Feb 25, 2011, at 9:21 AM, Ole Widar Saastad wrote:

> I get the following error (it is more like a warning; mpicc still produces
> output):
> [olews@login-0-1 $ /site/VERSIONS/openmpi-1.4.3.intel.test/bin/mpicc
> [login-0-1.local:14689] keyval parser: error 1 reading file 
> /site/VERSIONS/openmpi-1.4.3.intel.test/share/openmpi/mpicc-wrapper-data.txt 
> at line 1:
>  # There can be multiple blocks of configuration data, chosen by
> gcc: no input files
> 
> 
> The 
> /site/VERSIONS/openmpi-1.4.3.intel.test/share/openmpi/mpicc-wrapper-data.txt 
> is read; I verified this by changing it and noticing its effect. It works fine, 
> but many users are quite unhappy with this error. I have used strace to see that 
> all the characters get read (322 from strace and 322 from wc).
> It looks like there is something internal in the executable.
> 
> Is there a fix for this apparent bug? I searched the mailing list, but
> most information I got was of the type configure/make clean/make/make
> install, and this is something I have already tried.
> 
> 
> 
> Background :
> 
> We have several installations of Open MPI.
> 
> They reside at (showing mpicc location) :
> 
> /site/VERSIONS/openmpi-1.2.8.gnu/bin/mpicc
> /site/VERSIONS/openmpi-1.2.8.intel/bin/mpicc
> /site/VERSIONS/openmpi-1.3.3.gnu/bin/mpicc
> /site/VERSIONS/openmpi-1.3.3.intel/bin/mpicc
> /site/VERSIONS/openmpi-1.3.3.intel.ipath/bin/mpicc
> /site/VERSIONS/openmpi-1.3.3.pgi/bin/mpicc
> /site/VERSIONS/openmpi-1.4.1.gnu/bin/mpicc
> /site/VERSIONS/openmpi-1.4.1.intel/bin/mpicc
> /site/VERSIONS/openmpi-1.4.2.intel/bin/mpicc
> /site/VERSIONS/openmpi-1.4.3.gnu/bin/mpicc
> /site/VERSIONS/openmpi-1.4.3.gnu32/bin/mpicc
> /site/VERSIONS/openmpi-1.4.3.intel/bin/mpicc
> /site/VERSIONS/openmpi-1.4.3.intel.test/bin/mpicc
> /site/VERSIONS/openmpi-1.4.3.open64/bin/mpicc
> /site/VERSIONS/openmpi-1.4.3.pgi/bin/mpicc
> /site/VERSIONS/openmpi-1.4.intel/bin/mpicc
> /site/VERSIONS/openmpi-1.4.intel.icc/bin/mpicc
> 
> With corresponding modules to set up the correct path and library path.
> set modulefile [lrange [split [module-info name] {/}] 0 0]
> set apphome /site/VERSIONS/openmpi-1.4.3.intel.test
> set appname OpenMPI
> set appurl http://www.open-mpi.org
> 
> module-whatis   "A High Performance Message Passing Library"
> 
> setenv  MPI_TYPE  openmpi
> 
> prepend-path    PATH            $apphome/bin
> prepend-path    LD_LIBRARY_PATH $apphome/lib
> prepend-path    LD_LIBRARY_PATH $apphome/lib/openmpi
> prepend-path    MANPATH         $apphome/share/man
> 
> 
> 
> -- 
> Ole W. Saastad, dr. scient.
> Scientific Computing Group, USIT, University of Oslo
> http://hpc.uio.no
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Beginner's question: why multiple sends or receives don't work?

2011-02-25 Thread Xianglong Kong
.bashrc is not executed when I ssh the node. How can I let it be executed?

Kong

On Fri, Feb 25, 2011 at 10:04 AM, Jeff Squyres  wrote:
> Check that a) your .bashrc is actually executing when you "ssh 
> othernode env", and b) if .bashrc is executing, that it isn't 
> prematurely exiting for non-interactive jobs.
>
>
> On Feb 25, 2011, at 9:58 AM, Xianglong Kong wrote:
>
>> I'm using openmpi 1.4.3. The cluster consists of two desktops with Intel
>> Core 2 Duo CPUs running Ubuntu 10.04.
>>
>> A weird thing that I found is that when I issued the command "env |
>> grep LD_LIBRARY_PATH" on the slave node, it showed the MPI lib path.
>> But when
>> I issued the command "ssh slave-node env | grep LD_LIBRARY_PATH" on
>> the master side to check the LD_LIBRARY_PATH of the slave node, it
>> showed nothing. Also, issuing the command "ssh master-node env | grep
>> LD_LIBRARY_PATH" on the slave side would return the correct MPI lib
>> path.
>>
>> I tried to modify the .bashrc and the /etc/ld.so.conf.d/*.conf file to
>> configure the LD_LIBRARY_PATH on the slave node, but it seems to work
>> only locally. How can I set the LD_LIBRARY_PATH on the slave node
>> side, so that I can get the correct path when I use "ssh slave-node
>> env | grep LD_LIBRARY_PATH" on the master side?
>>
>> Kong
>>
>> On Wed, Feb 23, 2011 at 5:22 PM, Bill Rankin  wrote:
>>> Jeff:
>>>
 FWIW: I have rarely seen this to be the issue.
>>>
>>> Been bitten by similar situations before.  But it may not have been 
>>> OpenMPI.  In any case it was a while back.
>>>
 In short, programs are erroneous that do not guarantee that all their
 outstanding requests have completed before calling finalize.
>>>
>>> Agreed 100%.  The barrier won't prevent the case of unmatched 
>>> sends/receives or outstanding request handles, but if the logic is correct 
>>> it does make sure that everyone completes before anyone leaves.
>>>
>>> In any case, I also tried code #2 and it completed w/o issue on our 
>>> cluster.  I guess the next thing to ask Kong is regarding what version he 
>>> is running and what is the platform.
>>>
>>> -b
>>>
>>>
>>>
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>>
>>
>> --
>> Xianglong Kong
>> Department of Mechanical Engineering
>> University of Rochester
>> Phone: (585)520-4412
>> MSN: dinosaur8...@hotmail.com
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Xianglong Kong
Department of Mechanical Engineering
University of Rochester
Phone: (585)520-4412
MSN: dinosaur8...@hotmail.com



Re: [OMPI users] Beginner's question: why multiple sends or receives don't work?

2011-02-25 Thread Jeff Squyres
Have a look at the bash man page, and these two FAQ items:

http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
http://www.open-mpi.org/faq/?category=running#mpirun-prefix
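
In short, the second FAQ item amounts to something like the following, which
sidesteps the remote shell startup files entirely (the prefix and hostnames
are only placeholders):

  mpirun --prefix /opt/openmpi -np 4 --host master-node,slave-node ./a.out

or configure Open MPI with --enable-mpirun-prefix-by-default so a plain
"mpirun" behaves as if --prefix were always given.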


On Feb 25, 2011, at 10:31 AM, Xianglong Kong wrote:

> .bashrc is not executed when I ssh the node. How can I let it be executed?
> 
> Kong
> 
> On Fri, Feb 25, 2011 at 10:04 AM, Jeff Squyres  wrote:
>> Check that a) your .bashrc is actually executing when you "ssh 
>> othernode env", and b) if .bashrc is executing, that it isn't 
>> prematurely exiting for non-interactive jobs.
>> 
>> 
>> On Feb 25, 2011, at 9:58 AM, Xianglong Kong wrote:
>> 
>>> I'm using openmpi 1.4.3. The cluster consists of two desktops with Intel
>>> Core 2 Duo CPUs running Ubuntu 10.04.
>>> 
>>> A weird thing that I found is that when I issued the command "env |
>>> grep LD_LIBRARY_PATH" on the slave node, it showed the MPI lib path.
>>> But when
>>> I issued the command "ssh slave-node env | grep LD_LIBRARY_PATH" on
>>> the master side to check the LD_LIBRARY_PATH of the slave node, it
>>> showed nothing. Also, issuing the command "ssh master-node env | grep
>>> LD_LIBRARY_PATH" on the slave side would return the correct MPI lib
>>> path.
>>> 
>>> I tried to modify the .bashrc and the /etc/ld.so.conf.d/*.conf file to
>>> configure the LD_LIBRARY_PATH on the slave node, but it seems to work
>>> only locally. How can I set the LD_LIBRARY_PATH on the slave node
>>> side, so that I can get the correct path when I use "ssh slave-node
>>> env | grep LD_LIBRARY_PATH" on the master side?
>>> 
>>> Kong
>>> 
>>> On Wed, Feb 23, 2011 at 5:22 PM, Bill Rankin  wrote:
 Jeff:
 
> FWIW: I have rarely seen this to be the issue.
 
 Been bitten by similar situations before.  But it may not have been 
 OpenMPI.  In any case it was a while back.
 
> In short, programs are erroneous that do not guarantee that all their
> outstanding requests have completed before calling finalize.
 
 Agreed 100%.  The barrier won't prevent the case of unmatched 
 sends/receives or outstanding request handles, but if the logic is correct 
 it does make sure that everyone completes before anyone leaves.
 
 In any case, I also tried code #2 and it completed w/o issue on our 
 cluster.  I guess the next thing to ask Kong is regarding what version he 
 is running and what is the platform.
 
 -b
 
 
 
 
 ___
 users mailing list
 us...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/users
 
>>> 
>>> 
>>> 
>>> --
>>> Xianglong Kong
>>> Department of Mechanical Engineering
>>> University of Rochester
>>> Phone: (585)520-4412
>>> MSN: dinosaur8...@hotmail.com
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> 
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
> 
> 
> 
> -- 
> Xianglong Kong
> Department of Mechanical Engineering
> University of Rochester
> Phone: (585)520-4412
> MSN: dinosaur8...@hotmail.com
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Unknown overhead in "mpirun -am ft-enable-cr"

2011-02-25 Thread Nguyen Toan
Dear Josh,

Did you find out the problem? I still cannot make any progress.
Hope to hear some good news from you.

Regards,
Nguyen Toan

On Sun, Feb 13, 2011 at 3:04 PM, Nguyen Toan wrote:

> Hi Josh,
>
> I tried the MCA parameter you mentioned but it did not help; the unknown
> overhead still exists.
> Here I attach the output of 'ompi_info', both version 1.5 and 1.5.1.
> Hope you can find out the problem.
> Thank you.
>
> Regards,
> Nguyen Toan
>
> On Wed, Feb 9, 2011 at 11:08 PM, Joshua Hursey wrote:
>
>> It looks like the logic in the configure script is turning on the FT
>> thread for you when you specify both '--with-ft=cr' and
>> '--enable-mpi-threads'.
>>
>> Can you send me the output of 'ompi_info'? Can you also try the MCA
>> parameter that I mentioned earlier to see if that changes the performance?
>>
>> If there are many non-blocking sends and receives, there might be a
>> performance bug with the way the point-to-point wrapper is tracking request
>> objects. If the above MCA parameter does not help the situation, let me know
>> and I might be able to take a look at this next week.
>>
>> Thanks,
>> Josh
>>
>> On Feb 9, 2011, at 1:40 AM, Nguyen Toan wrote:
>>
>> > Hi Josh,
>> > Thanks for the reply. I did not use the '--enable-ft-thread' option.
>> Here is my build options:
>> >
>> > CFLAGS=-g \
>> > ./configure \
>> > --with-ft=cr \
>> > --enable-mpi-threads \
>> > --with-blcr=/home/nguyen/opt/blcr \
>> > --with-blcr-libdir=/home/nguyen/opt/blcr/lib \
>> > --prefix=/home/nguyen/opt/openmpi \
>> > --with-openib \
>> > --enable-mpirun-prefix-by-default
>> >
>> > My application requires lots of communication in every loop, focusing on
>> MPI_Isend, MPI_Irecv and MPI_Wait. Also I want to make only one checkpoint
>> per application execution for my purpose, but the unknown overhead exists
>> even when no checkpoint was taken.
>> >
>> > Do you have any other idea?
>> >
>> > Regards,
>> > Nguyen Toan
>> >
>> >
>> > On Wed, Feb 9, 2011 at 12:41 AM, Joshua Hursey 
>> wrote:
>> > There are a few reasons why this might be occurring. Did you build with
>> the '--enable-ft-thread' option?
>> >
>> > If so, it looks like I didn't move over the thread_sleep_wait adjustment
>> from the trunk - the thread was being a bit too aggressive. Try adding the
>> following to your command line options, and see if it changes the
>> performance.
>> >  "-mca opal_cr_thread_sleep_wait 1000"
>> >
>> > There are other places to look as well depending on how frequently your
>> application communicates, how often you checkpoint, process layout, ... But
>> usually the aggressive nature of the thread is the main problem.
>> >
>> > Let me know if that helps.
>> >
>> > -- Josh
>> >
>> > On Feb 8, 2011, at 2:50 AM, Nguyen Toan wrote:
>> >
>> > > Hi all,
>> > >
>> > > I am using the latest version of OpenMPI (1.5.1) and BLCR (0.8.2).
>> > > I found that when running an application,which uses MPI_Isend,
>> MPI_Irecv and MPI_Wait,
>> > > enabling C/R, i.e using "-am ft-enable-cr", the application runtime is
>> much longer than the normal execution with mpirun (no checkpoint was taken).
>> > > This overhead becomes larger when the normal execution runtime is
>> longer.
>> > > Does anybody have any idea about this overhead, and how to eliminate
>> it?
>> > > Thanks.
>> > >
>> > > Regards,
>> > > Nguyen
>> > > ___
>> > > users mailing list
>> > > us...@open-mpi.org
>> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >
>> > 
>> > Joshua Hursey
>> > Postdoctoral Research Associate
>> > Oak Ridge National Laboratory
>> > http://users.nccs.gov/~jjhursey
>> >
>> >
>> > ___
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >
>> > ___
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> 
>> Joshua Hursey
>> Postdoctoral Research Associate
>> Oak Ridge National Laboratory
>> http://users.nccs.gov/~jjhursey
>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
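
(For reference, the parameter Josh suggested is given on the mpirun command
line together with the CR AMCA file; a sketch with a placeholder process count
and binary name:

  mpirun -np 4 -am ft-enable-cr -mca opal_cr_thread_sleep_wait 1000 ./app
)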