Thanks Josh. Actually I also tested with the Himeno benchmark
<http://accc.riken.jp/assets/files/himenob_loadmodule/himenoBMT_c_mpi.lzh>
and got the same problem, so I think this could be a bug. Hope this
information also helps.
Regards,
Nguyen Toan

On Fri, Mar 4, 2011 at 12:04 AM, Joshua Hursey <jjhur...@open-mpi.org> wrote:
> Thanks for the program. I created a ticket for this performance bug and
> attached the tarball to the ticket:
> https://svn.open-mpi.org/trac/ompi/ticket/2743
>
> I do not know exactly when I will be able to get back to this, but
> hopefully soon. I added you to the CC so you should receive any progress
> updates regarding the ticket as we move forward.
>
> Thanks again,
> Josh
>
> On Mar 3, 2011, at 2:12 AM, Nguyen Toan wrote:
>
> > Dear Josh,
> >
> > Attached to this email is a small program that illustrates the
> > performance problem. You can find simple instructions in the README file.
> > There are also 2 sample result files (cpu.256^3.8N.*) which show the
> > execution time difference between the two cases.
> > Hope you can take some time to find the problem.
> > Thanks for your kindness.
> >
> > Best Regards,
> > Nguyen Toan
> >
> > On Wed, Mar 2, 2011 at 3:00 AM, Joshua Hursey <jjhur...@open-mpi.org> wrote:
> > I have not had the time to look into the performance problem yet, and
> > probably won't for a little while. Can you send me a small program that
> > illustrates the performance problem, and I'll file a bug so we don't lose
> > track of it.
> >
> > Thanks,
> > Josh
> >
> > On Feb 25, 2011, at 1:31 PM, Nguyen Toan wrote:
> >
> > > Dear Josh,
> > >
> > > Did you find out the problem? I still cannot make any progress.
> > > Hope to hear some good news from you.
> > >
> > > Regards,
> > > Nguyen Toan
> > >
> > > On Sun, Feb 13, 2011 at 3:04 PM, Nguyen Toan <nguyentoan1...@gmail.com> wrote:
> > > Hi Josh,
> > >
> > > I tried the MCA parameter you mentioned but it did not help; the
> > > unknown overhead still exists.
> > > Here I attach the output of 'ompi_info', for both versions 1.5 and 1.5.1.
> > > Hope you can find out the problem.
> > > Thank you.
> > >
> > > Regards,
> > > Nguyen Toan
> > >
> > > On Wed, Feb 9, 2011 at 11:08 PM, Joshua Hursey <jjhur...@open-mpi.org> wrote:
> > > It looks like the logic in the configure script is turning on the FT
> > > thread for you when you specify both '--with-ft=cr' and
> > > '--enable-mpi-threads'.
> > >
> > > Can you send me the output of 'ompi_info'? Can you also try the MCA
> > > parameter that I mentioned earlier to see if that changes the performance?
> > >
> > > If there are many non-blocking sends and receives, there might be a
> > > performance bug with the way the point-to-point wrapper is tracking
> > > request objects. If the above MCA parameter does not help the situation,
> > > let me know and I might be able to take a look at this next week.
> > >
> > > Thanks,
> > > Josh
> > >
> > > On Feb 9, 2011, at 1:40 AM, Nguyen Toan wrote:
> > >
> > > > Hi Josh,
> > > > Thanks for the reply. I did not use the '--enable-ft-thread' option.
> > > > Here are my build options:
> > > >
> > > > CFLAGS=-g \
> > > > ./configure \
> > > > --with-ft=cr \
> > > > --enable-mpi-threads \
> > > > --with-blcr=/home/nguyen/opt/blcr \
> > > > --with-blcr-libdir=/home/nguyen/opt/blcr/lib \
> > > > --prefix=/home/nguyen/opt/openmpi \
> > > > --with-openib \
> > > > --enable-mpirun-prefix-by-default
> > > >
> > > > My application requires lots of communication in every loop, focusing
> > > > on MPI_Isend, MPI_Irecv and MPI_Wait. Also, I want to make only one
> > > > checkpoint per application execution for my purpose, but the unknown
> > > > overhead exists even when no checkpoint was taken.
> > > >
> > > > Do you have any other idea?
> > > >
> > > > Regards,
> > > > Nguyen Toan
> > > >
> > > > On Wed, Feb 9, 2011 at 12:41 AM, Joshua Hursey <jjhur...@open-mpi.org> wrote:
> > > > There are a few reasons why this might be occurring. Did you build
> > > > with the '--enable-ft-thread' option?
> > > >
> > > > If so, it looks like I didn't move over the thread_sleep_wait
> > > > adjustment from the trunk - the thread was being a bit too aggressive.
> > > > Try adding the following to your command line options, and see if it
> > > > changes the performance.
> > > > "-mca opal_cr_thread_sleep_wait 1000"
> > > >
> > > > There are other places to look as well, depending on how frequently
> > > > your application communicates, how often you checkpoint, process
> > > > layout, ... But usually the aggressive nature of the thread is the
> > > > main problem.
> > > >
> > > > Let me know if that helps.
> > > >
> > > > -- Josh
> > > >
> > > > On Feb 8, 2011, at 2:50 AM, Nguyen Toan wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I am using the latest version of Open MPI (1.5.1) and BLCR (0.8.2).
> > > > > I found that when running an application, which uses MPI_Isend,
> > > > > MPI_Irecv and MPI_Wait, with C/R enabled, i.e. using "-am ft-enable-cr",
> > > > > the application runtime is much longer than the normal execution with
> > > > > mpirun (no checkpoint was taken).
> > > > > This overhead becomes larger when the normal execution runtime is longer.
> > > > > Does anybody have any idea about this overhead, and how to eliminate it?
> > > > > Thanks.
> > > > >
> > > > > Regards,
> > > > > Nguyen
> > > > > _______________________________________________
> > > > > users mailing list
> > > > > us...@open-mpi.org
> > > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > >
> > > > ------------------------------------
> > > > Joshua Hursey
> > > > Postdoctoral Research Associate
> > > > Oak Ridge National Laboratory
> > > > http://users.nccs.gov/~jjhursey
> >
> > <test.tar>
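For reference, the workaround discussed in this thread can be combined into a single invocation. This is only a sketch: the application binary `./app`, the process count, and the `<PID_of_mpirun>` placeholder are assumptions, and it presumes Open MPI was configured with `--with-ft=cr` and BLCR as shown in the build options above.

```shell
# Run with checkpoint/restart support enabled ("-am ft-enable-cr"), and
# raise opal_cr_thread_sleep_wait so the C/R coordination thread polls
# less aggressively, as Josh suggests:
mpirun -np 8 -am ft-enable-cr \
       -mca opal_cr_thread_sleep_wait 1000 \
       ./app

# From another terminal, request a checkpoint of the running job,
# identified by the PID of its mpirun process:
ompi-checkpoint -v <PID_of_mpirun>
```

Note that, per the thread, the overhead was observed even when no checkpoint was taken, so the `ompi-checkpoint` step is optional for reproducing the slowdown.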