Re: [OMPI users] sync problem
Hi Danesh,

Make sure you have 700 GB of RAM on the sum of all the nodes you are using. Otherwise context switching and memory swapping may be the problem. MPI doesn't perform well under these conditions (and may break, particularly on large problems, I suppose).

A good way to go about it is to look at the physical "RAM per core" if those are multi-core machines, and compare it to the actual memory per core your program requires. You need to leave the system some RAM as well, so use no more than 80% or so of the memory. If you or a system administrator have access to the nodes, you can monitor memory use with "top". If you have Ganglia on this cluster, you can use its memory report metric as well.

Another possibility is a memory leak, which may be in your program or (less likely) in MPI. Note, however, that Open MPI 1.3.0 and 1.3.1 had such a problem (with InfiniBand only), which was fixed in 1.3.2:

http://www.open-mpi.org/community/lists/announce/2009/04/0030.php
https://svn.open-mpi.org/trac/ompi/ticket/1853

If you are using 1.3.0 or 1.3.1, upgrade to 1.3.2.

I hope this helps.

Gus Correa
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA

Danesh Daroui wrote:

Dear all,

I am not sure if this is the right forum to ask this question, so sorry if I am wrong. I am using ScaLAPACK, and of course MPI (OMPI), in an electromagnetic solver program running on a cluster. I see very strange behavior when I use a large number of processors to run my code on very large problems. In these cases the program apparently finishes successfully, but the job stays alive until the wall time exceeds its limit and it is terminated by the queue manager (I use qsub to submit a job). This happens, for example, when I use more than 80 processors for a problem which needs more than 700 GB of memory. For smaller problems everything is OK and all output files are generated correctly, while when this happens, the output files are empty.
I am almost sure that there is a synchronization problem and some processes fail to reach the finalization point while others are done. My code is written in C++, and in the "main" function I call a routine called "Solver". My Solver function looks like this:

    Solver()
    {
        for (std::vector::iterator ti = times.begin(); ti != times.end(); ++ti) {
            Stopwatch iwatch, dwatch, twatch;
            // some ScaLAPACK operations
            if (iamroot()) {
                // some operations only for the root process
            }
        }
        blacs::gridexit(ictxt);
        blacs::exit(1);
    }

and my "main" function, which calls "Solver", looks like this:

    int main()
    {
        // some preparatory operations
        Solver();
        if (rank == 0)
            std::cout << "Total execution time: " << time.tick()
                      << " s\n" << std::flush;
        err = MPI_Finalize();
        if (MPI_SUCCESS != err) {
            std::cerr << "MPI_Finalize failed: " << err << "\n";
            return err;
        }
        return 0;
    }

I did put a "blacs::barrier(ictxt, 'A')" at the end of the "Solver" routine, before the call to "blacs::exit(1)", to make sure that all processes arrive there before MPI_Finalize, but it didn't solve the problem. Do you have any idea where the problem is?

Thanks in advance,
[OMPI users] overlapping communicators?
Hi,

I have a Multiple Program Multiple Data (MPMD) setup with three programs running in parallel, say A, B and C. C is much slower, so in order to balance the load I want to parallelize C into C0 to Cn (SPMD). There are very frequent communications between the Ci processes, and less frequent ones, but still multiple times per second, between A, B and C0. I have running versions of the A/B/C MPMD part and of the C0..Cn SPMD part. I was thinking of creating two communicators with C0 being a member of both, but I am told this is bad practice, although I don't really know what the pitfalls are. An alternative would be to create and free the ABC communicator every time it is used, but I am worried about the cost of these operations and about making the code look messy. I would appreciate any advice on this issue.

Thanks,
Tiago
Re: [OMPI users] make vt_tracefilter.cc:133: internal compiler error: Segmentation fault - openmpi-1.3.2
This looks like your compiler seg faulted. I think you should contact your compiler vendor and find out why.

Additionally, you can disable the optional 3rd-party add-on VampirTrace package with --enable-contrib-no-build=vt. This is the part of the code where your compiler seg faulted, so perhaps if you skip that part, you'll get a successful OMPI installation.

On May 31, 2009, at 12:21 PM, Ralph Castain wrote:

I don't believe the 1.3.x series supports Bproc/Beowulf systems - I'm afraid that support ended with the 1.2.x series. There is a possibility that someone will restore support beginning with the 1.5 release, but that is only a possibility at this point (not a commitment).

On Sun, May 31, 2009 at 10:13 AM, wruslan wyusoff wrote:

    [root@bismillah-00 openmpi-1.3.2]# make all install
    vt_tracefilter.cc: In function ‘int main(int, char**)’:
    vt_tracefilter.cc:133: internal compiler error: Segmentation fault
    Please submit a full bug report, with preprocessed source if appropriate.
    See http://bugzilla.redhat.com/bugzilla for instructions.
    Preprocessed source stored into /tmp/cc353yuL.out file, please attach this to your bug report.
    make[6]: *** [vtfilter-vt_tracefilter.o] Error 1
    make[6]: Leaving directory `/home/openmpi-1.3.2/ompi/contrib/vt/vt/tools/vtfilter'
    ...
    == Installation failed for openmpi-1.3.2 on this machine.

This machine runs OSCAR 5.0 Beowulf Cluster as head node on Fedora Core 5. Currently openmpi-1.1.1 runs OK on this cluster. Please find the bug report file attached.

    [root@bismillah-00 openmpi-1.3.2]# uname -a
    Linux bismillah-00.mmu.edu.my 2.6.15-1.2054_FC5 #1 Tue Mar 14 15:48:33 EST 2006 i686 i686 i386 GNU/Linux
    [root@bismillah-00 openmpi-1.3.2]# gcc -v
    Using built-in specs.
    Target: i386-redhat-linux
    Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=i386-redhat-linux
    Thread model: posix
    gcc version 4.1.0 20060304 (Red Hat 4.1.0-3)
    [root@bismillah-00 openmpi-1.3.2]#

Thank you.
wruslan wyusoff

___ users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems
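Jeff's suggested workaround above would look something like the following rebuild; the prefix and source directory are placeholders, not taken from wruslan's setup (other than the /home/openmpi-1.3.2 path shown in the make output):

```shell
# Hypothetical rebuild that skips the VampirTrace contrib package,
# the part of the tree where the compiler seg faulted:
cd /home/openmpi-1.3.2
./configure --prefix=/opt/openmpi-1.3.2 --enable-contrib-no-build=vt
make all install
```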
[OMPI users] Problem getting OpenMPI to run
Good morning,

I think I sent this out last week, but I did some "experimentation" and kind-of/sort-of got my OpenMPI application to run. But I do have a weird problem. I can get the application (built with OpenMPI-1.3.2 with gcc; the app itself is built with Intel 10.2) to run on the IB network (not sure of the version of OFED, but it might be 1.3.x) with certain CPUs. For example, I can run the application on AMD Shanghai processors just fine. But when I try some other processors (also AMD), I get the following error message:

    error: executing task of job 3084 failed: execution daemon on host "compute-2-2.local" didn't accept task
    --
    A daemon (pid 27796) died unexpectedly with status 1 while attempting to launch so we are aborting.

    There may be more information reported by the environment (see above).

    This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
    --
    --
    mpirun noticed that the job aborted, but has no info as to the process that caused that situation.
    --
    mpirun: clean termination accomplished

I've been googling my fingers off without any luck. My next steps are to start putting printf's in OpenMPI to figure out where the problem is occurring :) Any ideas or things I can do to start? (I can provide all kinds of information, including ompi_info, if anyone cares to look through it.)

TIA!

Jeff
Re: [OMPI users] Problem getting OpenMPI to run
On Jun 1, 2009, at 2:04 PM, Jeff Layton wrote:

> error: executing task of job 3084 failed: execution daemon on host
> "compute-2-2.local" didn't accept task

This looks like an error message from the resource manager/scheduler -- not from OMPI (i.e., OMPI tried to launch a process on a node and the launch failed because something rejected it).

Which one are you using?

-- 
Jeff Squyres
Cisco Systems
Re: [OMPI users] Problem getting OpenMPI to run
Jeff Squyres wrote:
> On Jun 1, 2009, at 2:04 PM, Jeff Layton wrote:
>> error: executing task of job 3084 failed: execution daemon on host
>> "compute-2-2.local" didn't accept task
>
> This looks like an error message from the resource manager/scheduler -- not from OMPI (i.e., OMPI tried to launch a process on a node and the launch failed because something rejected it).
>
> Which one are you using?

SGE
Re: [OMPI users] Problem getting OpenMPI to run
On 06/01/09 14:58, Jeff Layton wrote:
> Jeff Squyres wrote:
>> This looks like an error message from the resource manager/scheduler -- not from OMPI (i.e., OMPI tried to launch a process on a node and the launch failed because something rejected it).
>>
>> Which one are you using?
> SGE

Take a look at the following link for some info on SGE:

http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge

I do not know exactly what your error message is telling us, but I would first double-check that you have your parallel environment set up similarly to what is shown in the FAQ.

Rolf

--
rolf.vandeva...@sun.com
781-442-3043
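For reference, the parallel environment described in the FAQ entry Rolf points to looks roughly like the listing below (the slot count and allocation rule will differ per site; the essential settings for Open MPI are control_slaves TRUE and job_is_first_task FALSE):

```
$ qconf -sp orte
pe_name            orte
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
```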
Re: [OMPI users] Problem getting OpenMPI to run
On Jun 1, 2009, at 2:58 PM, Jeff Layton wrote:

>>> error: executing task of job 3084 failed: execution daemon on host
>>> "compute-2-2.local" didn't accept task
>>
>> This looks like an error message from the resource manager/scheduler
>> -- not from OMPI (i.e., OMPI tried to launch a process on a node and
>> the launch failed because something rejected it).
>>
>> Which one are you using?
>
> SGE

I'm afraid I don't know much about SGE. :-( Can you run non-OMPI jobs through SGE on the same node(s) that are failing with Open MPI?

-- 
Jeff Squyres
Cisco Systems
Re: [OMPI users] Performance testing software?
HPL can "stress test" the MPI, but it is typically relatively insensitive to MPI performance. The usual use produces a measure of the peak floating-point performance of the system. A broader set of system performance measurements is found in the HPCC (HPC Challenge) tests, which include HPL. Many of these tests, however, still don't really focus on MPI performance.

Tests that do focus on MPI performance include the OSU tests:

http://mvapich.cse.ohio-state.edu/benchmarks/

There are also the Intel MPI Benchmarks (formerly Pallas). The NAS Parallel Benchmarks offer more "application-level" tests.

Gus Correa wrote:
> The famous one is HPL, the Top500 benchmark:
> http://www.netlib.org/benchmark/hpl/
> It takes some effort to configure and run it.
>
> mtcreekm...@broncs.utpa.edu wrote:
>> I am wondering if there is some stress-testing software for OpenMPI I can use to run on a cluster to give me an idea of the performance level of the system?
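To get started with the OSU tests mentioned above, a typical point-to-point run looks something like the following; the hostnames are placeholders, and the binary names should be checked against the benchmark suite's own README:

```shell
# Hypothetical two-process runs of the OSU latency and bandwidth
# tests between two nodes, after building them with mpicc:
mpirun -np 2 --host node01,node02 ./osu_latency
mpirun -np 2 --host node01,node02 ./osu_bw
```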
Re: [OMPI users] Problem getting OpenMPI to run
Jeff Layton wrote:
> Jeff Squyres wrote:
>> On Jun 1, 2009, at 2:04 PM, Jeff Layton wrote:
>>> error: executing task of job 3084 failed: execution daemon on host
>>> "compute-2-2.local" didn't accept task
>>
>> This looks like an error message from the resource manager/scheduler -- not from OMPI (i.e., OMPI tried to launch a process on a node and the launch failed because something rejected it).
>>
>> Which one are you using?
> SGE

When you built Open-MPI, did you use the --with-sge switch? Or if this is an OFED release, is it possible that this wasn't specified?

FWIW, this looks like a Rocks compute node ("compute-2-2.local" gives that away). The OFED Rolls in Rocks have had a few issues in the past with how they were built, so you may be running into that. If you didn't build Open-MPI yourself, I'd suggest at least giving that a try. Alternatively, OFED-1.4 is pretty good and has a later version of Open-MPI than 1.3.x.

Joe

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
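Joe's suggested rebuild with SGE support would look something like this; the prefix is a placeholder, and the ompi_info check at the end is a quick way to confirm the gridengine components actually made it into the build:

```shell
# Hypothetical source build of Open MPI with SGE support enabled:
./configure --prefix=/opt/openmpi-1.3.2 --with-sge
make all install

# Verify that the gridengine support components are present:
ompi_info | grep gridengine
```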
Re: [OMPI users] mpi trace visualization
Roman Martonak wrote:
> I would like to profile the MPI code using the VampirTrace integrated in openmpi-1.3.2. In order to visualize the trace files, apart from the commercial Vampir, is there some free viewer for the OTF files?

I'm rusty on this stuff. If you go to http://www.paratools.com/otf.php there is an "OTF Tutorial". On slide 5, there is a diagram showing tools, formats, converters, etc. The diagram is colorful, but it's a few years old and represents a particular community of tool developers/users. The implication seems to be that the answer to your question is "TAU". Best to check, since I have never used TAU myself. That same URL has a link to TAU.

Depending on what you want to do, otfdump could also help. At least it's free!

One last option: Sun Studio tools are available for free on SPARC and x64, and on Solaris and Linux. You can use OMPI or Sun ClusterTools (Sun MPI, based on OMPI). You can collect MPI tracing data (which uses the VampirTrace instrumentation inside OMPI) and then view the data (MPI timelines and all sorts of statistical analyses of the data).
Re: [OMPI users] How to use Multiple links with OpenMPI?
Note that striping doesn't really help you much until data sizes get large. For example, networks tend to have an elbow in the graph where the size of the message starts to matter (clearly evident on your graphs). Additionally, you have your network marked as having "hubs", not "switches" -- if you really do have hubs and not switches, you may run into serious contention issues if you start loading up the network.

With both of these factors, even though you have 4 links, you likely aren't going to see much of a performance benefit until you send large messages (which will be limited by your bus speeds -- can you feed all 4 of your links from a single machine at line rate, or will you be limited by PCI bus speeds and contention?), and you may run into secondary performance issues due to contention on your hubs.

On May 28, 2009, at 11:06 PM, shan axida wrote:

Thank you, Mr. Jeff Squyres!

I have conducted a simple MPI_Bcast experiment on our cluster. The results are shown in the file attached to this e-mail. The hostfile is:

    hostname1 slots=4
    hostname2 slots=4
    hostname3 slots=4
    ...
    hostname16 slots=4

As we can see in the figure, it is only a little faster than a single link when we use 2, 3 or 4 links between nodes. My question is: what would be the reason for getting almost the same performance whether we use 2, 3 or 4 links?

Thank you!
Axida

From: Jeff Squyres
To: Open MPI Users
Sent: Wednesday, May 27, 2009 11:28:42 PM
Subject: Re: [OMPI users] How to use Multiple links with OpenMPI?

Open MPI considers hosts differently than network links. So you should only list the actual hostname in the hostfile, with slots equal to the number of processors (4 in your case, I think?). Once the MPI processes are launched, they each look around on the host they're running on and find network paths to each of their peers. If there are multiple paths between pairs of peers, Open MPI will round-robin stripe messages across each of the links.

We don't really have an easy setting for each peer pair to use only 1 link. Indeed, since connectivity is bidirectional, the traffic patterns become less obvious if you want MPI_COMM_WORLD rank X to only use link Y -- what does that mean to the other 4 MPI processes on the other host (with whom you have assumedly assigned their own individual links as well)?

On May 26, 2009, at 12:24 AM, shan axida wrote:

> Hi everyone,
> I want to ask how to use multiple links (multiple NICs) with OpenMPI.
> For example, how can I assign a link to each process, if there are 4 links
> and 4 processors on each node in our cluster?
> Is this a correct way?
> hostfile:
> host1-eth0 slots=1
> host1-eth1 slots=1
> host1-eth2 slots=1
> host1-eth3 slots=1
> host2-eth0 slots=1
> host2-eth1 slots=1
> host2-eth2 slots=1
> host2-eth3 slots=1
> ...
> host16-eth0 slots=1
> host16-eth1 slots=1
> host16-eth2 slots=1
> host16-eth3 slots=1

-- 
Jeff Squyres
Cisco Systems
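If finer control over which interfaces Open MPI uses is wanted, the TCP BTL can be restricted to specific NICs via MCA parameters; with several interfaces listed, messages are striped across them as Jeff describes. The hostfile name and executable below are placeholders:

```shell
# Hypothetical run restricting Open MPI's TCP BTL to two NICs;
# traffic is round-robin striped across the listed interfaces:
mpirun -np 8 --hostfile myhosts \
    --mca btl_tcp_if_include eth0,eth1 ./my_mpi_app

# Or exclude an interface (e.g., a management network) instead:
mpirun -np 8 --hostfile myhosts \
    --mca btl_tcp_if_exclude lo,eth3 ./my_mpi_app
```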
Re: [OMPI users] How to use Multiple links with OpenMPI?
On May 29, 2009, at 12:31 AM, shan axida wrote:

> Is it true to use bidirectional communication with MPI in an ethernet cluster?

Are you asking if Open MPI uses bidirectional TCP sockets? Yes, it does: we open one TCP socket between the MPI sender and receiver, and if the order is reversed (receiver becomes sender), we'll use the same socket.

> I have tried once (I thought it is possible because of full-duplex switches).
> However, I could not get the bandwidth improvement I was expecting.

If you really are using hubs, then if you have processes A and B both sending to each other simultaneously across the same link, you're going to have contention and one of them will have to wait. Even if you do have switches, there is a *wide* performance variation among low-quality switches. Most low-cost 1 Gb ethernet switches perform correctly, but do not necessarily provide the same high performance that you can get with higher-cost switches (i.e., you get what you pay for).

> If your answer is YES, would you please tell me about pseudocode for
> bidirectional communication? Thank you.
> Axida

-- 
Jeff Squyres
Cisco Systems