Re: [OMPI users] TCP connection errors

2007-06-11 Thread Jonathan Underwood
On 11/06/07, Adrian Knoth wrote: What's the exact problem? compute-node -> frontend? I don't think you have two processes on the frontend node, and even if you do, they should use shared memory. I stopped more than a single process from running on the frontend node; this had no effect on the

Re: [OMPI users] TCP connection errors

2007-06-11 Thread Jonathan Underwood
Hi Adrian, On 11/06/07, Adrian Knoth wrote: Which OMPI version? 1.2.2 > $ perl -e 'die$!=110' > Connection timed out at -e line 1. Looks pretty much like a routing issue. Can you sniff on eth1 on the frontend node? I don't have root access, so I'm afraid not. > This error message occu

Re: [OMPI users] TCP connection errors

2007-06-11 Thread Adrian Knoth
On Mon, Jun 11, 2007 at 10:55:17PM +0100, Jonathan Underwood wrote: > Hi, Hi! > I am seeing problems with a small linux cluster when running OpenMPI > jobs. The error message I get is: Which OMPI version? > $ perl -e 'die$!=110' > Connection timed out at -e line 1. Looks pretty much like a ro

[OMPI users] TCP connection errors

2007-06-11 Thread Jonathan Underwood
Hi, I am seeing problems with a small linux cluster when running OpenMPI jobs. The error message I get is: [frontend][0,1,0][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=110 Following the FAQ, I looked to see what this error code corresponds to: $ p
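To do the same errno lookup without Perl, Python's standard `errno` and `os` modules map a number to its symbolic name and message. This is a small sketch that assumes a Linux host: errno numbers are platform specific, and on Linux 110 is `ETIMEDOUT` ("Connection timed out"), which matches the Perl output quoted in this thread but may not hold on other systems.

```python
import errno
import os
from typing import Tuple


def describe_errno(code: int) -> Tuple[str, str]:
    """Return (symbolic name, message) for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # errno values differ between platforms; 110 is ETIMEDOUT on Linux.
    print(describe_errno(110))
```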

Re: [OMPI users] mpirun hanging when processes started on head node

2007-06-11 Thread Kelley, Sean
Ralph, Thanks for the quick response, clarifications below. Sean From: users-boun...@open-mpi.org on behalf of Ralph H Castain Sent: Mon 6/11/2007 3:49 PM To: Open MPI Users Subject: Re: [OMPI users] mpirun hanging when processes started on head node

Re: [OMPI users] Open MPI issue with Iprobe

2007-06-11 Thread Galen Shipman
I think the problem is that we use MPI_STATUS_IGNORE in the C++ bindings but don't check for it properly in mtl_mx_iprobe. Can you try applying this diff to ompi and having the user try again? We will also push this into the 1.2 branch. - Galen Index: ompi/mca/mtl/mx/mtl_mx_probe.c ==
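The bug described here is a missing sentinel check: MPI_STATUS_IGNORE means "the caller does not want a status", so a probe routine must test for it before writing into the status argument. The Python model below is a hypothetical illustration of that pattern only; `STATUS_IGNORE`, `Status`, and `iprobe` are made-up names, not the Open MPI or MX code, and the real fix is in the (truncated) diff.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Tuple


@dataclass
class Status:
    source: int = -1
    tag: int = -1
    count: int = 0


# Hypothetical stand-in for MPI_STATUS_IGNORE.
STATUS_IGNORE: Optional[Status] = None


def iprobe(pending: Deque[Tuple[int, int, bytes]],
           status: Optional[Status] = STATUS_IGNORE) -> bool:
    """Return True if a message is pending; fill `status` only if one was supplied."""
    if not pending:
        return False
    source, tag, payload = pending[0]
    # The essential check: never write through the status argument
    # when the caller passed the "ignore" sentinel.
    if status is not STATUS_IGNORE:
        status.source = source
        status.tag = tag
        status.count = len(payload)
    return True
```

Without the `is not STATUS_IGNORE` test, calling `iprobe(queue)` would try to set attributes on `None` and fail, which is the Python analogue of dereferencing MPI_STATUS_IGNORE in C.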

Re: [OMPI users] mixing MX and TCP

2007-06-11 Thread Reese Faucette
! if( (status = mx_get_info( mx_btl->mx_endpoint, MX_LINE_SPEED,
!         &nic_id, sizeof(nic_id), &value, sizeof(int))) != MX_SUCCESS ) {
Yes, a NIC ID is required for this call because a host may have multiple NICs with

Re: [OMPI users] mpirun hanging when processes started on head node

2007-06-11 Thread Ralph H Castain
Hi Sean, Could you please clarify something? I'm a little confused by your comments about where things are running. I'm assuming that you mean everything works fine if you type the mpirun command on the head node and just let it launch on your compute nodes, that the problems only occur when you s

Re: [OMPI users] mpirun hanging when processes started on head node

2007-06-11 Thread Kelley, Sean
I forgot to add that we are using 'bproc'. Launching processes on the compute nodes using bproc works well, I'm not sure if bproc is involved when processes are launched on the local node. Sean From: users-boun...@open-mpi.org on behalf of Kelley, Sean Sent: M

[OMPI users] Open MPI issue with Iprobe

2007-06-11 Thread Corwell, Sophia
Hi, We are seeing the following issue with Iprobe on our clusters running openmpi-1.2.2. Here is the code and related information: === Modules currently loaded: (sn31)/projects>module list > > Currently Loaded Modulefiles: > > 1) /opt/modules/oscar-modulefiles/default-manpath/1.0.1 > >

[OMPI users] mpirun hanging when processes started on head node

2007-06-11 Thread Kelley, Sean
Hi, We are running the OFED 1.2rc4 distribution containing openmpi-1.2.2 on a RedhatEL4U4 system with Scyld Clusterware 4.1. The hardware configuration consists of a DELL 2950 as the headnode and 3 DELL 1950 blades as compute nodes using Cisco TopSpin InfiniBand HCAs and switches for the int

[OMPI users] f90 support not built with gfortran?

2007-06-11 Thread Jeff Pummill
Greetings all, I downloaded and configured v1.2.2 this morning on an Opteron cluster using the following configure directives... ./configure --prefix=/share/apps CC=gcc CXX=g++ F77=g77 FC=gfortran CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64 Compilation seemed to go OK and there IS an

Re: [OMPI users] mixing MX and TCP

2007-06-11 Thread George Bosilca
It's about using multiple network interfaces to exchange messages between a pair of hosts. The networks can be identical or not. george. On Jun 9, 2007, at 8:19 PM, Alex Tumanov wrote: forgive a trivial question, but what's a multi-rail? On 6/8/07, George Bosilca wrote: A fix for this p
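To make "multi-rail" concrete: one simple way to use several interfaces between the same pair of hosts is to split a large message into pieces sized in proportion to each rail's bandwidth, so that all rails finish at roughly the same time. The sketch below shows only that idea; the rail names and bandwidth figures are invented, and it is not a description of how Open MPI actually schedules fragments across BTLs.

```python
from typing import Dict


def stripe(message_len: int, bandwidths: Dict[str, float]) -> Dict[str, int]:
    """Split message_len bytes across rails in proportion to their bandwidth."""
    total_bw = sum(bandwidths.values())
    shares = {rail: int(message_len * bw / total_bw) for rail, bw in bandwidths.items()}
    # Integer truncation can leave a few bytes unassigned; give them to the fastest rail.
    leftover = message_len - sum(shares.values())
    fastest = max(bandwidths, key=lambda rail: bandwidths[rail])
    shares[fastest] += leftover
    return shares


if __name__ == "__main__":
    # Hypothetical rails: a Myrinet NIC and a gigabit Ethernet NIC (MB/s values are made up).
    print(stripe(1_000_000, {"mx0": 250.0, "eth0": 125.0}))
```

With these made-up figures the faster rail carries about two thirds of the bytes, and every byte is assigned to exactly one rail.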

[OMPI users] rdma over tcp?

2007-06-11 Thread Brock Palen
With openmpi-1.2.0 I ran: ompi_info --param btl tcp and I see references to: MCA btl: parameter "btl_tcp_min_rdma_size" (current value: "131072") MCA btl: parameter "btl_tcp_max_rdma_size" (current value: "2147483647") Can TCP support RDMA? I thought you needed fancy hardware to get such
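If you want to compare these settings across installations, the parameter lines are regular enough to parse. The sketch below assumes exactly the line format quoted in this message (`MCA btl: parameter "NAME" (current value: "VALUE")`); other Open MPI versions may print `ompi_info` output differently. It only extracts the values and says nothing about whether TCP really does RDMA.

```python
import re
from typing import Dict

# Matches lines of the form quoted above, e.g.
# MCA btl: parameter "btl_tcp_min_rdma_size" (current value: "131072")
PARAM_RE = re.compile(r'MCA (\w+): parameter "([^"]+)" \(current value: "([^"]*)"\)')


def parse_ompi_info(text: str) -> Dict[str, str]:
    """Map each MCA parameter name to its current value (kept as a string)."""
    return {name: value for _framework, name, value in PARAM_RE.findall(text)}


if __name__ == "__main__":
    sample = (
        'MCA btl: parameter "btl_tcp_min_rdma_size" (current value: "131072")\n'
        'MCA btl: parameter "btl_tcp_max_rdma_size" (current value: "2147483647")\n'
    )
    params = parse_ompi_info(sample)
    # The max value is 2**31 - 1, the largest signed 32-bit integer.
    print(int(params["btl_tcp_max_rdma_size"]) == 2**31 - 1)
```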

Re: [OMPI users] mixing MX and TCP

2007-06-11 Thread Kees Verstoep
George Bosilca wrote: A fix for this problem is now available on the trunk. Please use any revision after 14963 and your problem will vanish [I hope!]. There are now some additional parameters which allow you to select which Myrinet network you want to use in the case there are several availab

Re: [OMPI users] Problem running MPI on a dual-core pentium D

2007-06-11 Thread victor marian
Hi Don, The first time I ran the program I am working on. It is perfectly scalable: on 20 processors it ran in 27 seconds (on two processors, in 300 seconds). Then I had the curiosity to run it on a Pentium D. It ran in 30 seconds on a single core. On two cores it ran in 37 seconds (I think som

Re: [OMPI users] Library Definitions

2007-06-11 Thread Brock Palen
Yes, we find it's best to let users benchmark their code (if they have it already) or a code that uses similar algorithms, and then have the user run on some machines we set aside. While we are on the benchmark topic, users might be interested: we just installed a new set of Opteron 2220

Re: [OMPI users] Problem running MPI on a dual-core pentium D

2007-06-11 Thread Don Kerr
Victor, Obviously there are many variables involved in getting the best performance out of a machine, and it would be necessary to understand the two environments you are comparing as well as the job. I would not be able to get my hands on another E10K for validation or projecting possible gains

Re: [OMPI users] Problem running MPI on a dual-core pentium D

2007-06-11 Thread victor marian
Hi Don, But as far as I can see, you must pay for these debuggers. Victor --- Don Kerr wrote: > Victor, > > You are right Prism will not work with Open MPI > which Sun's ClusterTools > 7 is based on. But Prism was not available for CT 6 > either. Totalview > and Allinea's dd

Re: [OMPI users] Problem running MPI on a dual-core pentium D

2007-06-11 Thread Don Kerr
Victor, You are right, Prism will not work with Open MPI, which Sun's ClusterTools 7 is based on. But Prism was not available for CT 6 either. Totalview and Allinea's ddt, I believe, have both been tested to work with Open MPI. -DON victor marian wrote: I can't turn it off right now to look

Re: [OMPI users] Problem running MPI on a dual-core pentium D

2007-06-11 Thread victor marian
Hi Don, Seeing your mail, I suppose you are working at Sun. We have a Sun 1 Server at our university, and my program runs almost as fast on 16 UltraSparc2 processors as on a Pentium D. The program is perfectly scalable. I am a little bit disappointed. Our Sparc II are at 400MHz, and the Pent

Re: [OMPI users] v1.2.2 mca base unable to open pls/ras tm

2007-06-11 Thread Jeff Squyres
On Jun 11, 2007, at 8:55 AM, Cupp, Matthew R wrote: Ah ha! I didn't know that option was available as I didn't see it in the documentation or in ./configure --help. FWIW, the GNU Autoconf application creates configure scripts that automatically accept "without" and "disable" versions of all
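The Autoconf convention Jeff describes is mechanical: `--without-PACKAGE` means the same as `--with-PACKAGE=no`, `--disable-FEATURE` the same as `--enable-FEATURE=no`, a bare `--with-PACKAGE` or `--enable-FEATURE` means `yes`, and dashes in the name become underscores in the shell variable. The Python sketch below models only that mapping; it is not Autoconf itself, does no validation, and the option names in the test (including `tm`, borrowed from this thread's subject) are just examples.

```python
from typing import Dict, List


def normalize_configure_args(args: List[str]) -> Dict[str, str]:
    """Model how Autoconf-generated configure scripts record --with/--without
    and --enable/--disable options as with_*/enable_* shell variables."""
    settings: Dict[str, str] = {}
    for arg in args:
        opt, _, value = arg.partition("=")
        for positive, negative in (("--with-", "--without-"), ("--enable-", "--disable-")):
            kind = positive.strip("-")  # "with" or "enable"
            if opt.startswith(negative):
                # Negative form: always recorded as "no".
                name = opt[len(negative):].replace("-", "_")
                settings[f"{kind}_{name}"] = "no"
            elif opt.startswith(positive):
                # Positive form: explicit value, or "yes" if none given.
                name = opt[len(positive):].replace("-", "_")
                settings[f"{kind}_{name}"] = value or "yes"
    return settings
```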

Re: [OMPI users] Problem running MPI on a dual-core pentium D

2007-06-11 Thread Don Kerr
Additionally, Solaris comes with the IB drivers, and since the libs are there, OMPI thinks that it is available. You can suppress this message with --mca btl_base_warn_component_unused 0 or specifically call out the btls you wish to use, for example --mca btl self,sm,tcp Brock Palen wrote: It
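If you launch jobs from scripts, the two options suggested here can be assembled into an `mpirun` argument list. This is a minimal Python sketch: the `mpirun_command` helper and its parameters are hypothetical, and only the `--mca btl ...` and `--mca btl_base_warn_component_unused 0` settings come from the message above.

```python
import shlex
from typing import List, Optional, Sequence


def mpirun_command(program: str, np: int, btls: Optional[Sequence[str]] = None,
                   quiet_unused: bool = False) -> List[str]:
    """Build an mpirun argument list using the --mca options suggested above."""
    cmd = ["mpirun", "-np", str(np)]
    if btls:
        # Restrict Open MPI to the listed BTLs, e.g. self,sm,tcp.
        cmd += ["--mca", "btl", ",".join(btls)]
    if quiet_unused:
        # Suppress the warning about BTL components that are present but unused.
        cmd += ["--mca", "btl_base_warn_component_unused", "0"]
    cmd.append(program)
    return cmd


if __name__ == "__main__":
    print(shlex.join(mpirun_command("cg.A.1", 1, btls=["self", "sm", "tcp"])))
```

The list form can be passed straight to `subprocess.run`, which avoids shell-quoting issues.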

Re: [OMPI users] Library Definitions

2007-06-11 Thread Jeff Pummill
Glad to contribute, Victor! I am running on a home workstation that uses an AMD 3800 CPU with 2 GB of RAM. My timings for FT were 175 seconds with one core and 110 seconds on two cores, with -O3 and -mtune=amd64 as tuning options. Brock, Terry and Jeff are all exactly correct in their comments r

Re: [OMPI users] Library Definitions

2007-06-11 Thread victor marian
Thank you, everybody, for the advice. I ran the NAS benchmark class B and it runs in 181 seconds on one core and in 90 seconds on two cores, so it scales almost perfectly. What were your timings, Jeff, and exactly which processor do you have? Mine is a Pentium D at 2.8GHz.
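The usual way to quantify "scales almost perfectly" is speedup S = T1/Tp and parallel efficiency E = S/p. This short Python sketch applies those standard formulas to the one-core and two-core timings quoted in this thread; nothing beyond those two formulas is assumed.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T1 / Tp."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, procs: int) -> float:
    """Parallel efficiency E = S / p; 1.0 means perfect scaling."""
    return speedup(t_serial, t_parallel) / procs


if __name__ == "__main__":
    # Timings quoted in this thread, in seconds (one core vs. two cores).
    for label, t1, t2 in [("NAS class B, Pentium D (Victor)", 181.0, 90.0),
                          ("FT, AMD 3800 (Jeff)", 175.0, 110.0)]:
        print(f"{label}: S = {speedup(t1, t2):.2f}, E = {efficiency(t1, t2, 2):.2f}")
```

Victor's numbers give a speedup of about 2.01 on two cores, while Jeff's give about 1.59.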

Re: [OMPI users] v1.2.2 mca base unable to open pls/ras tm

2007-06-11 Thread Cupp, Matthew R
Ah ha! I didn't know that option was available as I didn't see it in the documentation or in ./configure --help. I just ended up rebuilding and installing torque to my /opt/torque share. Thank you for your help with this. Matt __ Matt Cupp Battelle Memorial Institut

Re: [OMPI users] Library Definitions

2007-06-11 Thread Jeff Pummill
Victor, Build the FT benchmark as a class B problem. This will run in the 1-2 minute range instead of the 2-4 seconds the CG class A benchmark takes. Jeff F. Pummill Senior Linux Cluster Administrator University of Arkansas Terry Frankcombe wrote: Hi Victor I'd suggest 3 seconds

Re: [OMPI users] Library Definitions

2007-06-11 Thread Brock Palen
I agree. I like benchmarks to run 15 minutes to 24 hours. Brock Palen Center for Advanced Computing bro...@umich.edu (734)936-1985 On Jun 11, 2007, at 4:17 AM, Terry Frankcombe wrote: Hi Victor I'd suggest 3 seconds of CPU time is far, far too small a problem to do scaling tests with. Even

Re: [OMPI users] Timing communication

2007-06-11 Thread Jeff Squyres
Measuring communications is a very tricky process; there are a lot of factors involved. Check out this FAQ item: http://www.open-mpi.org/faq/?category=tuning#running-perf-numbers You might want to use a well-known benchmark program (e.g., NetPIPE, link checker, etc.) to run pair-wise comm
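Pair-wise benchmarks such as NetPIPE are built around a ping-pong: one side sends, the other echoes, a few warm-up round trips are discarded, and half the averaged round-trip time approximates one-way latency. The Python sketch below illustrates only that methodology, with a local socket pair and a thread standing in for two MPI processes; it measures local socket overhead, not your interconnect, and the iteration counts are arbitrary choices.

```python
import socket
import threading
import time


def _recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes (recv may return partial chunks)."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def _echo(sock: socket.socket, iterations: int, size: int) -> None:
    """Peer side: receive each message in full and send it straight back."""
    for _ in range(iterations):
        sock.sendall(_recv_exact(sock, size))


def pingpong_latency(size: int = 8, iterations: int = 1000, warmup: int = 100) -> float:
    """Return the estimated one-way latency, in seconds, for `size`-byte messages."""
    a, b = socket.socketpair()
    peer = threading.Thread(target=_echo, args=(b, warmup + iterations, size))
    peer.start()
    msg = b"x" * size
    try:
        for _ in range(warmup):  # warm-up round trips are not timed
            a.sendall(msg)
            _recv_exact(a, size)
        start = time.perf_counter()
        for _ in range(iterations):
            a.sendall(msg)
            _recv_exact(a, size)
        elapsed = time.perf_counter() - start
    finally:
        a.close()  # unblocks the peer if we exited early
        peer.join()
        b.close()
    # Each timed iteration is one round trip; halve it for a one-way estimate.
    return elapsed / iterations / 2
```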

Re: [OMPI users] Library Definitions

2007-06-11 Thread Terry Frankcombe
Hi Victor, I'd suggest 3 seconds of CPU time is far, far too small a problem to do scaling tests with. Even with only 2 CPUs, I wouldn't go below 100 times that. On Mon, 2007-06-11 at 01:10 -0700, victor marian wrote: > Hi Jeff > > I ran the NAS Parallel Benchmark and it gives for me > -bash%/ex

Re: [OMPI users] Library Definitions

2007-06-11 Thread victor marian
Hi Jeff, I ran the NAS Parallel Benchmark and it gives me: -bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$ mpirun -np 1 cg.A.1 -- [0,1,0]: uDAPL on host SERVSOLARIS was unable to find any NICs. Another tr