Re: [OMPI users] sync problem

2009-06-01 Thread Gus Correa

Hi Danesh

Make sure you have 700 GB of RAM summed across all the nodes you are using.
Otherwise context switching and memory swapping may be the problem.
MPI doesn't perform well under those conditions (and may break, particularly
on large problems, I suppose).

A good way to go about it is to look at the physical
"RAM per core" if those are multi-core machines,
and compare it to the memory per core your program actually requires.
For instance, 700 GB spread over 80 processes is roughly 8.75 GB per process,
so each core should have at least that much physical RAM available.
You also need to leave the system some RAM, so plan to use no more than 80% or
so of the memory.

If you or a system administrator has access to the nodes,
you can monitor the memory use with "top".
If you have Ganglia on this cluster, you can use the memory report
metric also.

Another possibility is a memory leak, which may be in your program,
or (less likely) in MPI.
Note, however, that Open MPI 1.3.0 and 1.3.1 had such a memory leak (with
InfiniBand only), which was fixed in 1.3.2:


http://www.open-mpi.org/community/lists/announce/2009/04/0030.php
https://svn.open-mpi.org/trac/ompi/ticket/1853

If you are using 1.3.0 or 1.3.1, upgrade to 1.3.2.

I hope this helps.

Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-

Danesh Daroui wrote:

Dear all,

I am not sure if this is the right forum to ask this question, so sorry if
I am wrong. I am using ScaLAPACK in my code, and MPI of course (Open MPI), in
an electromagnetic solver program running on a cluster. I get very
strange behavior when I use a large number of processors to run my code
on very large problems. In these cases the program actually finishes
successfully, but it then hangs until the wall time exceeds the limit and the
job is terminated by the queue manager (I use qsub to submit jobs). This
happens when, for example, I use more than 80 processors for a problem
which needs more than 700 GB of memory. For smaller problems everything is
OK and all output files are generated correctly, whereas when this
happens the output files are empty. I am almost sure that there is a
synchronization problem and some processes fail to reach the
finalization point while others are done.

My code is written in C++ and in "main" function I call a routine called
"Solver". My Solver function looks like below:

Solver()
{
    for (std::vector::iterator ti = times.begin(); ti != times.end(); ++ti)
    {
        Stopwatch iwatch, dwatch, twatch;

        // some ScaLAPACK operations

        if (iamroot())
        {
            // some operation only for root process
        }
    }

    blacs::gridexit(ictxt);
    blacs::exit(1);
}

and my "main" function which calls "Solver" looks like below:


int main()
{
    // some preparing operations

    Solver();
    if (rank == 0)
        std::cout << "Total execution time: " << time.tick()
                  << " s\n" << std::flush;

    err = MPI_Finalize();

    if (MPI_SUCCESS != err)
    {
        std::cerr << "MPI_Finalize failed: " << err << "\n";
        return err;
    }

    return 0;
}

I did put a "blacs::barrier(ictxt, 'A')" at the end of the "Solver" routine,
before calling "blacs::exit(1)", to make sure that all processes arrive
there before MPI_Finalize, but that did not solve the problem. Do you have any
idea where the problem is?
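For reference, a minimal sketch of the kind of check I could add (the variable
names are only illustrative) would be an explicit barrier on MPI_COMM_WORLD just
before MPI_Finalize, with each rank reporting that it got there:

    // in main(), after Solver() returns and before MPI_Finalize:
    int myrank = -1;
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    std::cerr << "rank " << myrank << " reached the pre-finalize barrier\n";
    MPI_Barrier(MPI_COMM_WORLD);   // hangs here if some rank never arrives
    err = MPI_Finalize();

Since "blacs::exit(1)" (non-zero argument) releases the BLACS resources but
leaves MPI initialized, calling MPI_Barrier and MPI_Finalize afterwards should
still be legal.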

Thanks in advance,






[OMPI users] overlapping communicators?

2009-06-01 Thread tsilva

Hi,

I have a Multiple Program Multiple Data (MPMD) setup with three programs running
in parallel, say A, B and C. C is much slower, so in order to balance the
load I want to parallelize C into C0 to Cn (SPMD). There are very
frequent communications between the Ci processes and not-so-frequent ones, but
still multiple times per second, between A, B and C0. I have running
versions of the ABC MPMD part and the C0..Cn SPMD part.


I was thinking of creating two communicators with C0 being a member of
both, but I am told this is bad practice, although I don't really know
what the pitfalls are. An alternative would be to create and free the
ABC communicator every time it is used, but I am worried about the
cost of these operations and about making the code look messy. I would
appreciate any advice on this issue.
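To make the first option concrete, a rough sketch of what I have in mind (the
role flags and colour values are just placeholders) would be two MPI_Comm_split
calls on MPI_COMM_WORLD, with C0 passing a valid colour to both and everyone
else passing MPI_UNDEFINED for the group it does not belong to:

    // is_A, is_B, is_C0, is_Cworker are set from this process's role
    MPI_Comm comm_C    = MPI_COMM_NULL;   // C0 .. Cn: the very frequent traffic
    MPI_Comm comm_ABC0 = MPI_COMM_NULL;   // A, B and C0: the occasional traffic

    int color_C    = (is_C0 || is_Cworker)   ? 0 : MPI_UNDEFINED;
    int color_ABC0 = (is_A || is_B || is_C0) ? 0 : MPI_UNDEFINED;

    MPI_Comm_split(MPI_COMM_WORLD, color_C,    0, &comm_C);
    MPI_Comm_split(MPI_COMM_WORLD, color_ABC0, 0, &comm_ABC0);

    // C0 ends up with two valid communicators and simply picks the right one
    // per message; everyone else gets MPI_COMM_NULL for the group it is not in.

Belonging to several communicators is perfectly legal in MPI; the question is
just whether it is good practice here.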


Thanks,
Tiago




Re: [OMPI users] make vt_tracefilter.cc:133: internal compiler error: Segmentation fault - openmpi-1.3.2

2009-06-01 Thread Jeff Squyres
This looks like your compiler seg faulted.  I think you should contact  
your compiler vendor and find out why.


Additionally, you can disable the optional third-party VampirTrace
package with --enable-contrib-no-build=vt.  This is the
part of the code where your compiler seg faulted, so perhaps if you
skip that part, you'll get a successful OMPI installation.
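For example, a configure line along those lines (the install prefix is just a
placeholder) might be:

    ./configure --prefix=/opt/openmpi-1.3.2 --enable-contrib-no-build=vt
    make all install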



On May 31, 2009, at 12:21 PM, Ralph Castain wrote:

I don't believe the 1.3.x series supports Bproc/Beowulf systems -  
I'm afraid that support ended with the 1.2.x series. There is a  
possibility that someone will restore support beginning with the 1.5  
release, but that is only a possibility at this point (not a  
commitment).




On Sun, May 31, 2009 at 10:13 AM, wruslan wyusoff wrote:

[root@bismillah-00 openmpi-1.3.2]# make all install


vt_tracefilter.cc: In function ‘int main(int, char**)’:
vt_tracefilter.cc:133: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See http://bugzilla.redhat.com/bugzilla for instructions.
Preprocessed source stored into /tmp/cc353yuL.out file, please attach
this to your bugreport.
make[6]: *** [vtfilter-vt_tracefilter.o] Error 1
make[6]: Leaving directory
`/home/openmpi-1.3.2/ompi/contrib/vt/vt/tools/vtfilter'

...
==
Installation failed for openmpi-1.3.2 on this machine.
This machine runs OSCAR 5.0 Beowulf Cluster as head node on Fedora  
Core 5

Currently: openmpi-1.1.1 runs OK on this cluster.
Please find the bug report file as attached.

[root@bismillah-00 openmpi-1.3.2]# uname -a
Linux bismillah-00.mmu.edu.my 2.6.15-1.2054_FC5 #1 Tue Mar 14 15:48:33
EST 2006 i686 i686 i386 GNU/Linux

[root@bismillah-00 openmpi-1.3.2]# gcc -v
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-libgcj-multifile
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada
--enable-java-awt=gtk --disable-dssi
--with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre
--with-cpu=generic --host=i386-redhat-linux
Thread model: posix
gcc version 4.1.0 20060304 (Red Hat 4.1.0-3)
[root@bismillah-00 openmpi-1.3.2]#

Thank you.
wruslan wyusoff




--
Jeff Squyres
Cisco Systems




[OMPI users] Problem getting OpenMPI to run

2009-06-01 Thread Jeff Layton

Good morning,

I think I sent this out last week, but I did some "experimentation"
and kind-of/sort-of got my Open MPI application to run. But I do
have a weird problem.

I can get the application (built with Open MPI 1.3.2 with gcc; the
app itself is built with Intel 10.2) to run on the IB network (not sure
of the version of OFED, but it might be 1.3.x) with certain CPUs.
For example, I can run the application on AMD Shanghai processors
just fine. But when I try some other processors (also AMD), I
get the following error message:


error: executing task of job 3084 failed: execution daemon on host 
"compute-2-2.local" didn't accept task

--
A daemon (pid 27796) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--
--
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--
mpirun: clean termination accomplished



I've been googling my fingers off without any luck. My next step is
to start putting printf's in Open MPI to figure out where the problem
is occurring :)  Any ideas on where to start? (I can provide all
kinds of information, including ompi_info output, if anyone cares to look
through it.)

TIA!

Jeff



Re: [OMPI users] Problem getting OpenMPI to run

2009-06-01 Thread Jeff Squyres

On Jun 1, 2009, at 2:04 PM, Jeff Layton wrote:


error: executing task of job 3084 failed: execution daemon on host
"compute-2-2.local" didn't accept task



This looks like an error message from the resource manager/scheduler  
-- not from OMPI (i.e., OMPI tried to launch a process on a node and  
the launch failed because something rejected it).


Which one are you using?

--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Problem getting OpenMPI to run

2009-06-01 Thread Jeff Layton

Jeff Squyres wrote:

On Jun 1, 2009, at 2:04 PM, Jeff Layton wrote:


error: executing task of job 3084 failed: execution daemon on host
"compute-2-2.local" didn't accept task



This looks like an error message from the resource manager/scheduler 
-- not from OMPI (i.e., OMPI tried to launch a process on a node and 
the launch failed because something rejected it).


Which one are you using?


SGE



Re: [OMPI users] Problem getting OpenMPI to run

2009-06-01 Thread Rolf Vandevaart

On 06/01/09 14:58, Jeff Layton wrote:

Jeff Squyres wrote:

On Jun 1, 2009, at 2:04 PM, Jeff Layton wrote:


error: executing task of job 3084 failed: execution daemon on host
"compute-2-2.local" didn't accept task



This looks like an error message from the resource manager/scheduler 
-- not from OMPI (i.e., OMPI tried to launch a process on a node and 
the launch failed because something rejected it).


Which one are you using?


SGE



Take a look at the following link for some info on SGE.

http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge

I do not know exactly what your error message is telling us, but I would
first double-check that you have your parallel environment set up
similarly to what is shown in the FAQ.
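From memory, a parallel environment along the lines of the FAQ (the PE name and
slot count are just examples) looks roughly like this:

    $ qconf -sp orte
    pe_name            orte
    slots              999
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /bin/true
    stop_proc_args     /bin/true
    allocation_rule    $fill_up
    control_slaves     TRUE
    job_is_first_task  FALSE
    urgency_slots      min

with control_slaves TRUE and job_is_first_task FALSE being the settings that
matter most for Open MPI, and the job then submitted with something like
"qsub -pe orte 8 job.sh".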


Rolf



--

=
rolf.vandeva...@sun.com
781-442-3043
=


Re: [OMPI users] Problem getting OpenMPI to run

2009-06-01 Thread Jeff Squyres

On Jun 1, 2009, at 2:58 PM, Jeff Layton wrote:


>> error: executing task of job 3084 failed: execution daemon on host
>> "compute-2-2.local" didn't accept task
>
> This looks like an error message from the resource manager/scheduler
> -- not from OMPI (i.e., OMPI tried to launch a process on a node and
> the launch failed because something rejected it).
>
> Which one are you using?

SGE




I'm afraid I don't know much about SGE.  :-(  Can you run non-OMPI  
jobs through SGE on the same node(s) that are failing with Open MPI?


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Performance testing software?

2009-06-01 Thread Eugene Loh
HPL can "stress test" the MPI library, but it is typically relatively
insensitive to MPI performance.  It is usually run to measure the
peak floating-point performance of the system.


A broader set of system performance measurements is found in the HPCC
(HPC Challenge) suite, which includes HPL.  Many of these tests, however,
still don't really focus on MPI performance.


Tests that focus on MPI performance include the OSU tests.  
http://mvapich.cse.ohio-state.edu/benchmarks/  There are also Intel MPI 
Benchmarks (formerly Pallas).
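For instance, a minimal run of the OSU latency test (assuming the benchmark has
been built against the same Open MPI, and the host names are placeholders) would
look something like:

    mpirun -np 2 --host node01,node02 ./osu_latency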


The NAS Parallel Benchmarks offer more "application-level" tests.

Gus Correa wrote:


The famous one is HPL, the Top500 benchmark:
http://www.netlib.org/benchmark/hpl/
It takes some effort to configure and run it. 


mtcreekm...@broncs.utpa.edu wrote:

I am wondering if there is some stress-testing software for Open MPI that I
can run on a cluster to give me an idea of the performance
level of the system?




Re: [OMPI users] Problem getting OpenMPI to run

2009-06-01 Thread Joe Landman

Jeff Layton wrote:

Jeff Squyres wrote:

On Jun 1, 2009, at 2:04 PM, Jeff Layton wrote:


error: executing task of job 3084 failed: execution daemon on host
"compute-2-2.local" didn't accept task



This looks like an error message from the resource manager/scheduler 
-- not from OMPI (i.e., OMPI tried to launch a process on a node and 
the launch failed because something rejected it).


Which one are you using?


When you built Open MPI, did you use the

--with-sge

switch?  Or, if this is an OFED-supplied build, is it possible that this wasn't
specified?
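For example (the prefix is just a placeholder), and a quick way to confirm that
the gridengine support actually got built:

    ./configure --prefix=/opt/openmpi-1.3.2 --with-sge
    make all install
    ompi_info | grep gridengine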


FWIW, this looks like a Rocks compute node ("compute-2-2.local" gives
that away).  The OFED Rolls in Rocks have had a few issues in the past
with how they were built, so you may be running into that.  If you
didn't build Open MPI yourself, I'd suggest at least giving a self-built
installation a try.


Alternatively, OFED-1.4 is pretty good.  It ships a later version of Open MPI
than the OFED 1.3.x releases do.


Joe



SGE




--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics,
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
   http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


Re: [OMPI users] mpi trace visualization

2009-06-01 Thread Eugene Loh

Roman Martonak wrote:


I would like to profile my MPI code using the VampirTrace support integrated
in openmpi-1.3.2. In order to visualize
the trace files, apart from the commercial Vampir, is there some free
viewer for the OTF files?


I'm rusty on this stuff.

If you go to http://www.paratools.com/otf.php there is an "OTF 
Tutorial".  On slide 5, there is a diagram showing tools, formats, 
converters, etc.  The diagram is colorful, but it's a few years old and 
represents a particular community of tool developers/users.  The 
implication seems to be that the answer to your question is "TAU".  Best 
to check, since I have never used TAU myself.  That same URL has a link 
to TAU.


Depending on what you want to do, otfdump could also help.  At least 
it's free!
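As a rough sketch of the whole path (assuming the VampirTrace compiler wrappers
that ship with Open MPI 1.3.2 and the otfdump tool are in your PATH; the file
names are illustrative and the default trace-file name may differ):

    mpicc-vt -O2 -o myapp myapp.c       # build with VampirTrace instrumentation
    mpirun -np 4 ./myapp                # run normally; OTF trace files are written
    otfdump myapp.otf | less            # plain-text dump of the trace records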


One last option:  Sun Studio tools are available for free on SPARC and 
x64 and on Solaris and Linux.  You can use OMPI or Sun ClusterTools (Sun 
MPI, based on OMPI).  You can collect MPI tracing data (which uses the 
VampirTrace instrumentation inside OMPI) and then view the data (MPI 
timelines and all sorts of statistical analyses of the data).


Re: [OMPI users] How to use Multiple links with OpenMPI?

2009-06-01 Thread Jeff Squyres
Note that striping doesn't really help you much until data sizes get  
large.  For example, networks tend to have an elbow in the graph where  
the size of the message starts to matter (clearly evident on your  
graphs).


Additionally, you have your network marked as using "hubs", not  
"switches" -- if you really do have hubs and not switches, you may run  
into serious contention issues if you start loading up the network.


With both of these factors, even though you have 4 links, you likely  
aren't going to see much of a performance benefit until you send large  
messages (which will be limited by your bus speeds -- can you feed all  
4 of your links from a single machine at line rate, or will you be  
limited by PCI bus speeds and contention?), and you may run into  
secondary performance issues due to contention on your hubs.
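As a concrete illustration (the interface names are just examples), the TCP BTL
can be told which links to stripe across with an MCA parameter, which makes it
easy to compare one link against all four:

    mpirun --mca btl_tcp_if_include eth0 -np 64 --hostfile hosts ./app
    mpirun --mca btl_tcp_if_include eth0,eth1,eth2,eth3 -np 64 --hostfile hosts ./app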



On May 28, 2009, at 11:06 PM, shan axida wrote:


Thank you, Mr. Jeff Squyres!
I have conducted a simple MPI_Bcast experiment on our cluster.
The results are shown in the file attached to this e-mail.
The hostfile is :
-
hostname1 slots=4
hostname2 slots=4
hostname3 slots=4


hostname16 slots=4
-
As we can see in the figure, it is only a little faster than a single link
when we use 2, 3, or 4 links between the nodes.
My question is: what would be the reason for getting almost the same
performance with 2, 3, or 4 links?

Thank you!

Axida




From: Jeff Squyres 
To: Open MPI Users 
Sent: Wednesday, May 27, 2009 11:28:42 PM
Subject: Re: [OMPI users] How to use Multiple links with  
OpenMPI??


Open MPI considers hosts differently than network links.

So you should only list the actual hostname in the hostfile, with  
slots equal to the number of processors (4 in your case, I think?).


Once the MPI processes are launched, they each look around on the  
host that they're running on and find network paths to each of their  
peers.  If there are multiple paths between pairs of peers, Open MPI  
will round-robin stripe messages across each of the links.  We don't  
really have an easy setting for each peer pair only using 1 link.   
Indeed, since connectivity is bidirectional, the traffic patterns  
become less obvious if you want MPI_COMM_WORLD rank X to only use  
link Y -- what does that mean to the other 4 MPI processes on the  
other host (to whom you have presumably assigned their own  
individual links as well)?



On May 26, 2009, at 12:24 AM, shan axida wrote:

> Hi everyone,
> I want to ask how to use multiple links (multiple NICs) with  
OpenMPI.
> For example, how can I assign a link to each process, if there are  
4 links

> and 4 processors on each node in our cluster?
> Is this a correct way?
> hostfile:
> --
> host1-eth0 slots=1
> host1-eth1 slots=1
> host1-eth2 slots=1
> host1-eth3 slots=1
> host2-eth0 slots=1
> host2-eth1 slots=1
> host2-eth2 slots=1
> host2-eth3 slots=1
> ......
> ...  ...
> host16-eth0 slots=1
> host16-eth1 slots=1
> host16-eth2 slots=1
> host16-eth3 slots=1
> 


--Jeff Squyres
Cisco Systems




--
Jeff Squyres
Cisco Systems



Re: [OMPI users] How to use Multiple links with OpenMPI?

2009-06-01 Thread Jeff Squyres

On May 29, 2009, at 12:31 AM, shan axida wrote:

Is it possible to use bidirectional communication with MPI in an Ethernet  
cluster?


Are you asking if Open MPI uses bidirectional TCP sockets?  Yes, it  
does: we open one TCP socket between the MPI sender and receiver, and  
if the direction is reversed (the receiver becomes the sender), we'll use  
the same socket.


I have tried it once (I thought it would be possible because of the  
full-duplex switches).

However, I could not get the bandwidth improvement I was expecting.


If you really are using hubs, then if you have processes A and B both  
sending to each other simultaneously across the same link, you're  
going to have contention and one of them will have to wait.


Even if you do have switches, there is a *wide* performance variation  
among low-quality switches.  Most low-cost 1 Gb Ethernet switches perform  
correctly, but do not necessarily provide the same high performance  
that you can get with higher-cost switches (i.e., you get what you pay  
for).



If your answer is YES, would you please show me pseudocode for
bidirectional communication?

Thank you.
Axida
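As a minimal sketch of such an exchange (the buffer size and tag are arbitrary,
and only two ranks are shown), each rank of a pair can send and receive at the
same time with MPI_Sendrecv, which exercises both directions of a full-duplex
link concurrently:

    // each of two ranks sends to and receives from its peer simultaneously
    const int N = 1 << 20;                      // 1 MiB payload, arbitrary
    std::vector<char> sendbuf(N, 'x'), recvbuf(N);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    const int peer = (rank == 0) ? 1 : 0;

    MPI_Sendrecv(&sendbuf[0], N, MPI_CHAR, peer, 0,
                 &recvbuf[0], N, MPI_CHAR, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);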






--
Jeff Squyres
Cisco Systems