[OMPI users] Slurm and Openmpi

2007-01-19 Thread Robert Bicknell
I'm trying to get slurm and openmpi to work together on a two-node Debian cluster. Slurm and openmpi seem to work fine separately, but when I try to run an MPI program in a slurm allocation, all the processes run on the master node and are not distributed to the second node. What am I doing w

Re: [OMPI users] IB bandwidth vs. kernels

2007-01-19 Thread Robin Humble
On Thu, Jan 18, 2007 at 03:10:15PM +0200, Gleb Natapov wrote: >On Thu, Jan 18, 2007 at 07:17:13AM -0500, Robin Humble wrote: >> On Thu, Jan 18, 2007 at 11:08:04AM +0200, Gleb Natapov wrote: >> >On Thu, Jan 18, 2007 at 03:52:19AM -0500, Robin Humble wrote: >> >> On Wed, Jan 17, 2007 at 08:55:31AM -0

Re: [OMPI users] Problems with ompi1.2b2, SGE and DLPOLY[Scanned]

2007-01-19 Thread Barry Evans
It's gigabit attached, pathscale is there simply to indicate that ompi was compiled with ekopath - Barry -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Galen Shipman Sent: 19 January 2007 01:56 To: Open MPI Users Cc: pak@sun.com Su

Re: [OMPI users] Slurm and Openmpi

2007-01-19 Thread Ralph Castain
Open MPI and SLURM should work together just fine right out-of-the-box. The typical command progression is: srun -n x -A mpirun -n y . If you are doing those commands and still see everything running on the head node, then two things could be happening: (a) you really aren't getting an allo

Re: [OMPI users] Problems with ompi1.2b2, SGE and DLPOLY[Scanned]

2007-01-19 Thread Galen Shipman
ah, disregard.. On Jan 19, 2007, at 1:35 AM, Barry Evans wrote: It's gigabit attached, pathscale is there simply to indicate that ompi was compiled with ekopath - Barry -Original Message- From: users-boun...@open-mpi.org [mailto:users-bounces@open-mpi.org] On Behalf Of Galen Shipm

Re: [OMPI users] Problems with ompi1.2b2, SGE and DLPOLY[Scanned]

2007-01-19 Thread Pak Lui
It seems from what you said that the DLPOLY program would fail whether or not SGE is being used. Since I am not familiar with DLPOLY, I am a little clueless as to what else you can try. Perhaps you can try looking deeper into DLPOLY by having a debuggable build and running a parallel debugger

Re: [OMPI users] OpenMPI/OpenIB/IMB hangs[Scanned]

2007-01-19 Thread Jeff Squyres
Beware: this is a lengthy, detailed message. On Jan 18, 2007, at 3:53 PM, Arif Ali wrote: 1. We have HW * 2xBladecenter H * 2xCisco Infiniband Switch Modules * 1xCisco Infiniband Switch * 16x PPC64 JS21 blades each are 4 cores, with Cisco HCA Can you provide the details of your Cisco HCA? S

Re: [OMPI users] OpenMPI/OpenIB/IMB hangs[Scanned]

2007-01-19 Thread Arif Ali
see below for answers, regards, Arif Ali Software Engineer OCF plc Mobile: +44 (0)7970 148 122 Office: +44 (0)114 257 2200 Fax:+44 (0)114 257 0022 Email: a...@ocf.co.uk Web:http://www.ocf.co.uk Skype: arif_ali80 MSN:a...@ocf.co.uk Jeff Squyres wrote: Beware: this is a lengthy,

Re: [OMPI users] OpenMPI/OpenIB/IMB hangs[Scanned]

2007-01-19 Thread Gleb Natapov
On Fri, Jan 19, 2007 at 05:51:49PM +, Arif Ali wrote: > >>I tried the nightly snapshot of OpenMPI-1.2b4r13137, which failed > >>miserably. > >> > > > >Can you describe what happened there? Is it failing in a different way? > > > Here's the output > > #-

Re: [OMPI users] Slurm and Openmpi

2007-01-19 Thread Robert Bicknell
Thanks for your response. The program that I have been using for testing purposes is a simple hello: #include #include #include #include #include #include main(int argc, char **argv) { char name[BUFSIZ]; int length; int rank; struct rlimit rlim; FILE *output; MPI_Init(&argc, &argv

Re: [OMPI users] Slurm and Openmpi

2007-01-19 Thread Jeff Squyres
I think the SLURM code in Open MPI is making an assumption that is failing in your case: we assume that your nodes will have a specific naming convention: mycluster.example.com --> head node mycluster01.example.com --> cluster node 1 mycluster02.example.com --> cluster node 2 ...etc. OMPI is

[OMPI users] openmpi equivalent to mpich serv_p4 daemon

2007-01-19 Thread Evan Smyth
I had been using MPICH and its serv_p4 daemon to speed startup times. I've decided to try OpenMPI (primarily for the fault-tolerance features) and would like to know what the equivalent of the serv_p4 daemon is. It appears as though the orted daemon may be what I am after but I don't quite und

Re: [OMPI users] OpenMPI/OpenIB/IMB hangs[Scanned]

2007-01-19 Thread Arif Ali
-Original Message- From: Gleb Natapov [mailto:gl...@voltaire.com] Sent: Fri 19/01/2007 18:33 To: Arif Ali Cc: Open MPI Users; Galen Shipman; Brad Benton; Pavel Shamis; Russell Slack; Barry Evans Subject: Re: [OMPI users] OpenMPI/OpenIB/IMB hangs[Scanned] On Fri, Jan 19, 2007 at 05:51:

Re: [OMPI users] Slurm and Openmpi

2007-01-19 Thread Robert Bicknell
Thanks for the help. I renamed the nodes, and now slurm and openmpi seem to be playing nicely with each other. Bob On 1/19/07, Jeff Squyres wrote: I think the SLURM code in Open MPI is making an assumption that is failing in your case: we assume that your nodes will have a specific naming

[OMPI users] MPI_ERR_COMM: invalid communicator using POP 1.2

2007-01-19 Thread Axel Schweiger
I am having a problem running pop 1.2 (Parallel Ocean Model) with OpenMPI version 1.1.2 compiled with PGI 6.2-4 on RH EL-4 Update 4 (configure result attached) The error is as follows: mpirun -v -np 4 -machinefile node18.dat pop [node18:11220] *** An error occurred in MPI_Cart_shift [node18:11

Re: [OMPI users] OpenMPI/OpenIB/IMB hangs[Scanned]

2007-01-19 Thread Jeff Squyres
On Jan 19, 2007, at 6:19 PM, Arif Ali wrote: > [0,1,59][btl_openib_component.c: 1153:btl_openib_component_progress] from > node16 to: node02 error polling HP CQ with status REMOTE ACCESS ERROR > status number 10 for wr_id 268919352 opcode 256614836 > mpirun noticed that job rank 0 with PID 0