On 11/06/07, Adrian Knoth wrote:
What's the exact problem? compute-node -> frontend? I don't think you
have two processes on the frontend node, and even if you do, they should
use shared memory.
I made sure there was no more than a single process on the frontend node
- this had no effect on the
Hi Adrian,
On 11/06/07, Adrian Knoth wrote:
Which OMPI version?
1.2.2
> $ perl -e 'die$!=110'
> Connection timed out at -e line 1.
Looks pretty much like a routing issue. Can you sniff on eth1 on the
frontend node?
I don't have root access, so I'm afraid not.
> This error message occu
On Mon, Jun 11, 2007 at 10:55:17PM +0100, Jonathan Underwood wrote:
> Hi,
Hi!
> I am seeing problems with a small linux cluster when running OpenMPI
> jobs. The error message I get is:
Which OMPI version?
> $ perl -e 'die$!=110'
> Connection timed out at -e line 1.
Looks pretty much like a ro
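Without root access for a packet capture, one unprivileged check that might
still help (assuming the standard net-tools are installed) is to look, while
the job is hanging, for TCP connections stuck in the SYN_SENT state:

$ netstat -tn | grep SYN_SENT

A connection to the frontend's eth1 address sitting in SYN_SENT until the
errno 110 message appears would point to the same routing problem.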
Hi,
I am seeing problems with a small linux cluster when running OpenMPI
jobs. The error message I get is:
[frontend][0,1,0][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=110
Following the FAQ, I looked to see what this error code corresponds to:
$ p
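For reference, the same errno-to-text translation the FAQ does with perl can
be done from C with strerror(); a minimal sketch (the value 110 is the errno
from the message above):

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* translate errno 110 (reported by the TCP BTL) into text */
    printf("errno 110: %s\n", strerror(110));
    return 0;
}

On Linux this prints "Connection timed out", i.e. ETIMEDOUT.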
Ralph,
Thanks for the quick response, clarifications below.
Sean
From: users-boun...@open-mpi.org on behalf of Ralph H Castain
Sent: Mon 6/11/2007 3:49 PM
To: Open MPI Users
Subject: Re: [OMPI users] mpirun hanging when processes started on head node
I think the problem is that we use MPI_STATUS_IGNORE in the C++
bindings but don't check for it properly in mtl_mx_iprobe.
Can you try applying this diff to ompi and having the user try again?
We will also push this into the 1.2 branch.
- Galen
Index: ompi/mca/mtl/mx/mtl_mx_probe.c
===================================================================
!     if( (status = mx_get_info( mx_btl->mx_endpoint, MX_LINE_SPEED,
!                                &nic_id, sizeof(nic_id),
                                 &value, sizeof(int))) != MX_SUCCESS )
      {
yes, a NIC ID is required for this call because a host may have multiple
NICs with
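For anyone wanting to exercise the code path the patch touches, here is a
minimal sketch of an iprobe loop that passes MPI_STATUS_IGNORE (the program
structure is illustrative, not the original reporter's code):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, flag = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int msg = 42;
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int msg;
        /* probe with MPI_STATUS_IGNORE -- the case mtl_mx_iprobe must
           handle without dereferencing the status argument */
        while (!flag)
            MPI_Iprobe(0, 0, MPI_COMM_WORLD, &flag, MPI_STATUS_IGNORE);
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received %d\n", msg);
    }

    MPI_Finalize();
    return 0;
}

Run with at least two processes, e.g. mpirun -np 2 ./iprobe_test.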
Hi Sean
Could you please clarify something? I'm a little confused by your comments
about where things are running. I'm assuming that you mean everything works
fine if you type the mpirun command on the head node and just let it launch
on your compute nodes, and that the problems only occur when you s
I forgot to add that we are using 'bproc'. Launching processes on the compute
nodes using bproc works well; I'm not sure whether bproc is involved when
processes are launched on the local node.
Sean
From: users-boun...@open-mpi.org on behalf of Kelley, Sean
Sent: M
Hi,
We are seeing the following issue with Iprobe on our clusters running
openmpi-1.2.2. Here is the code and related information:
===
Modules currently loaded:
(sn31)/projects>module list
Currently Loaded Modulefiles:
  1) /opt/modules/oscar-modulefiles/default-manpath/1.0.1
Hi,
We are running the OFED 1.2rc4 distribution containing openmpi-1.2.2 on a
RedHat EL4 U4 system with Scyld Clusterware 4.1. The hardware configuration
consists of a DELL 2950 as the head node and 3 DELL 1950 blades as compute nodes
using Cisco TopSpin InfiniBand HCAs and switches for the int
Greetings all,
I downloaded and configured v1.2.2 this morning on an Opteron cluster
using the following configure directives...
./configure --prefix=/share/apps CC=gcc CXX=g++ F77=g77 FC=gfortran
CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64
Compilation seemed to go OK and there IS an
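One quick sanity check that the -m64 flags took effect (assuming the library
landed in the default lib directory under the --prefix given above) is:

$ file /share/apps/lib/libmpi.so

which should report an ELF 64-bit object.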
It's about using multiple network interfaces to exchange messages
between a pair of hosts. The networks can be identical or not.
george.
On Jun 9, 2007, at 8:19 PM, Alex Tumanov wrote:
forgive a trivial question, but what's a multi-rail?
On 6/8/07, George Bosilca wrote:
A fix for this p
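As a concrete (hypothetical) example of what George describes, in the TCP
case the interfaces used for multi-rail can be selected with the
btl_tcp_if_include MCA parameter:

$ mpirun --mca btl_tcp_if_include eth0,eth1 -np 4 ./a.out

where eth0 and eth1 are two networks connecting the same pair of hosts.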
With openmpi-1.2.0
I ran: ompi_info --param btl tcp
and I see references to:
MCA btl: parameter "btl_tcp_min_rdma_size" (current value: "131072")
MCA btl: parameter "btl_tcp_max_rdma_size" (current value: "2147483647")
Can TCP support RDMA? I thought you needed fancy hardware to get
such
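For what it's worth, these are ordinary MCA parameters, so their values can
be overridden with the same --mca syntax used elsewhere in this thread; the
value below is purely illustrative:

$ mpirun --mca btl_tcp_min_rdma_size 65536 -np 2 ./a.out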
George Bosilca wrote:
A fix for this problem is now available on the trunk. Please use any
revision after 14963 and your problem will vanish [I hope!]. There are
now some additional parameters which allow you to select which Myrinet
network you want to use in the case there are several availab
Hi Don,
First I ran the program I am working on. It
is perfectly scalable, and on 20 processors it ran in
27 seconds (on two processors, in 300 seconds).
Then I was curious to run it on a Pentium D. It
ran in 30 seconds on a single core. On two cores it
ran in 37 seconds (I think som
Yes, we find it's best to let users benchmark their own code (if they
have it already) or a code that uses similar algorithms, and then
have the user run on some machines we set aside.
While we are on the benchmark topic, users might be interested: we
just installed a new set of Opteron 2220
Victor,
Obviously there are many variables involved in getting the best
performance out of a machine, and understanding the two environments you
are comparing, as well as the job itself, would be necessary. I would not
be able to get my hands on another E10K for validation or projecting possible
gains
Hi Don,
But as far as I can see, you must pay for these debuggers.
Victor
--- Don Kerr wrote:
> Victor,
>
> You are right Prism will not work with Open MPI
> which Sun's ClusterTools
> 7 is based on. But Prism was not available for CT 6
> either. Totalview
> and Allinea's dd
Victor,
You are right, Prism will not work with Open MPI, which Sun's ClusterTools
7 is based on. But Prism was not available for CT 6 either. TotalView
and Allinea's DDT have, I believe, both been tested to work with Open MPI.
-DON
victor marian wrote:
I can't turn it off right now to look
Hi Don,
Seeing your mail, I suppose you are working at Sun. We
have a Sun 1 Server at our university, and my
program runs almost as fast on 16 UltraSPARC II
processors as on a Pentium D. The program is perfectly
scalable. I am a little bit disappointed. Our SPARC
IIs are at 400MHz, and the Pent
On Jun 11, 2007, at 8:55 AM, Cupp, Matthew R wrote:
Ah ha! I didn't know that option was available as I didn't see it in
the documentation or in ./configure --help.
FWIW, the GNU Autoconf application creates configure scripts that
automatically accept "without" and "disable" versions of all
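So, assuming Open MPI's --with-tm option for Torque/PBS support (and a purely
illustrative prefix), the auto-generated negative form should be accepted as
well, for example:

$ ./configure --prefix=/opt/openmpi --without-tm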
Additionally, Solaris comes with the IB drivers, and since the libs are
there OMPI thinks that IB is available. You can suppress this message with
--mca btl_base_warn_component_unused 0
or specifically call out the BTLs you wish to use, for example
--mca btl self,sm,tcp
Brock Palen wrote:
It
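Spelled out as full command lines (program name illustrative), the two
alternatives above look like:

$ mpirun --mca btl_base_warn_component_unused 0 -np 4 ./a.out
$ mpirun --mca btl self,sm,tcp -np 4 ./a.out

The first only silences the warning; the second restricts Open MPI to the
listed BTLs, so the unused interconnect components are never selected.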
Glad to contribute, Victor!
I am running on a home workstation that uses an AMD 3800 CPU attached to
2 GB of RAM.
My timings for FT were 175 seconds with one core and 110 on two cores, with
-O3 and -mtune=amd64 as tuning options.
Brock, Terry and Jeff are all exactly correct in their comments
r
Thank you everybody for the advice.
I ran the NAS benchmark class B and it runs in 181
seconds on one core and in 90 seconds on two cores, so
it scales almost perfectly.
What were your timings, Jeff, and exactly what processor
do you have?
Mine is a Pentium D at 2.8GHz.
Ah ha! I didn't know that option was available as I didn't see it in
the documentation or in ./configure --help.
I just ended up rebuilding and installing torque to my /opt/torque
share. Thank you for your help with this.
Matt
__
Matt Cupp
Battelle Memorial Institut
Victor,
Build the FT benchmark, and build it as a class B problem. This will run
in the 1-2 minute range instead of the 2-4 seconds the CG class A benchmark
takes.
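A sketch of the corresponding build and run steps, assuming the NPB3.2-MPI
tree mentioned elsewhere in this thread with its config/make.def already set
up, and a 2-process run:

$ cd NPB3.2-MPI
$ make ft NPROCS=2 CLASS=B
$ mpirun -np 2 bin/ft.B.2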
Jeff F. Pummill
Senior Linux Cluster Administrator
University of Arkansas
Terry Frankcombe wrote:
Hi Victor
I'd suggest 3 seconds
I agree. I like benchmarks to run 15 minutes to 24 hours.
Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On Jun 11, 2007, at 4:17 AM, Terry Frankcombe wrote:
Hi Victor
I'd suggest 3 seconds of CPU time is far, far too small a problem to do
scaling tests with. Even
Measuring communications is a very tricky process; there are a lot of
factors involved. Check out this FAQ item:
http://www.open-mpi.org/faq/?category=tuning#running-perf-numbers
You might want to use a well-known benchmark program (e.g., NetPIPE,
link checker, etc.) to run pair-wise comm
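For example, the MPI flavour of NetPIPE is typically built as the NPmpi
binary and run as a single pair of processes on the two hosts being measured
(host names illustrative):

$ mpirun -np 2 --host node1,node2 NPmpi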
Hi Victor
I'd suggest 3 seconds of CPU time is far, far too small a problem to do
scaling tests with. Even with only 2 CPUs, I wouldn't go below 100
times that.
On Mon, 2007-06-11 at 01:10 -0700, victor marian wrote:
> Hi Jeff
>
> I ran the NAS Parallel Benchmark and it gives for me
> -bash%/ex
Hi Jeff
I ran the NAS Parallel Benchmark and it gives for me
-bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$
mpirun -np 1 cg.A.1
--
[0,1,0]: uDAPL on host SERVSOLARIS was unable to find
any NICs.
Another tr