I'm trying to get SLURM and Open MPI to work together on a Debian two-node
cluster. SLURM and Open MPI seem to work fine separately, but when I try to
run an MPI program in a SLURM allocation, all the processes get run on the
master node and are not distributed to the second node. What am I doing
wrong?
On Thu, Jan 18, 2007 at 03:10:15PM +0200, Gleb Natapov wrote:
>On Thu, Jan 18, 2007 at 07:17:13AM -0500, Robin Humble wrote:
>> On Thu, Jan 18, 2007 at 11:08:04AM +0200, Gleb Natapov wrote:
>> >On Thu, Jan 18, 2007 at 03:52:19AM -0500, Robin Humble wrote:
>> >> On Wed, Jan 17, 2007 at 08:55:31AM -0
It's gigabit-attached; pathscale is there simply to indicate that ompi
was compiled with EKOPath
- Barry
-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Galen Shipman
Sent: 19 January 2007 01:56
To: Open MPI Users
Cc: pak@sun.com
Su
Open MPI and SLURM should work together just fine right out-of-the-box. The
typical command progression is:
srun -n x -A
mpirun -n y .
If you are doing those commands and still see everything running on the head
node, then two things could be happening:
(a) you really aren't getting an allocation
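One quick way to check the allocation side (process counts here are placeholders and the hostname test is just an illustration):
srun -n 4 -A          # request a 4-task allocation; this drops you into a shell inside it
env | grep SLURM      # SLURM_JOBID and SLURM_NODELIST should mention both nodes
mpirun -n 4 hostname  # should print hostnames from both nodes, not only the head node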
ah, disregard..
On Jan 19, 2007, at 1:35 AM, Barry Evans wrote:
It's gigabit-attached; pathscale is there simply to indicate that ompi
was compiled with EKOPath
- Barry
-Original Message-
From: users-boun...@open-mpi.org [mailto:users-bounces@open-mpi.org] On
Behalf Of Galen Shipman
It seems from what you said that the DLPOLY program would fail whether or
not SGE is being used. Since I am not familiar with DLPOLY, I am a little
clueless as to what else you can try. Perhaps you can try looking deeper
into DLPOLY by making a debuggable build and running it under a parallel
debugger
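For example (a generic recipe rather than anything DLPOLY-specific; the binary name and source list are placeholders):
mpif90 -g -O0 ... -o dlpoly.x          # rebuild with debug symbols and no optimization
mpirun -np 4 xterm -e gdb ./dlpoly.x   # one xterm/gdb session per MPI rank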
Beware: this is a lengthy, detailed message.
On Jan 18, 2007, at 3:53 PM, Arif Ali wrote:
1. We have
HW
* 2x BladeCenter H
* 2x Cisco InfiniBand Switch Modules
* 1x Cisco InfiniBand Switch
* 16x PPC64 JS21 blades, each with 4 cores and a Cisco HCA
Can you provide the details of your Cisco HCA?
S
see below for answers,
regards,
Arif Ali
Software Engineer
OCF plc
Mobile: +44 (0)7970 148 122
Office: +44 (0)114 257 2200
Fax: +44 (0)114 257 0022
Email: a...@ocf.co.uk
Web: http://www.ocf.co.uk
Skype: arif_ali80
MSN: a...@ocf.co.uk
Jeff Squyres wrote:
Beware: this is a lengthy,
On Fri, Jan 19, 2007 at 05:51:49PM +, Arif Ali wrote:
> >>I tried the nightly snapshot of OpenMPI-1.2b4r13137, which failed
> >>miserably.
> >>
> >
> >Can you describe what happened there? Is it failing in a different way?
> >
> Here's the output
>
> #-
Thanks for your response. The program that I have been using for testing
purposes is a simple hello:
#include <stdio.h>
#include <sys/resource.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
char name[BUFSIZ];
int length;
int rank;
struct rlimit rlim;
FILE *output;
MPI_Init(&argc, &argv);
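A minimal, self-contained version of that kind of hello test (this is a reconstruction, not the poster's actual code) would look roughly like:

#include <stdio.h>
#include <sys/resource.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    char name[BUFSIZ];
    int length, rank;
    struct rlimit rlim;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(name, &length);  /* which host did this rank land on? */
    getrlimit(RLIMIT_STACK, &rlim);         /* the original also declares an rlimit */
    printf("Hello from rank %d on %s (stack limit %ld)\n",
           rank, name, (long) rlim.rlim_cur);
    MPI_Finalize();
    return 0;
}

Running it inside the allocation should show ranks spread across both nodes.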
I think the SLURM code in Open MPI is making an assumption that is
failing in your case: we assume that your nodes will have a specific
naming convention:
mycluster.example.com --> head node
mycluster01.example.com --> cluster node 1
mycluster02.example.com --> cluster node 2
...etc.
OMPI is
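As a rough illustration of that assumption (a sketch of the idea only, not Open MPI's actual SLURM code), deciding whether a hostname belongs to the cluster amounts to checking for the head-node name plus a numeric suffix:

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Does "node" look like "base" followed by one or more digits,
 * e.g. base = "mycluster", node = "mycluster02"?  Illustration only. */
static int is_numbered_node(const char *base, const char *node)
{
    size_t blen = strlen(base);
    if (strncmp(node, base, blen) != 0 || node[blen] == '\0')
        return 0;
    for (node += blen; *node != '\0'; ++node)
        if (!isdigit((unsigned char) *node))
            return 0;
    return 1;
}

int main(void)
{
    printf("%d\n", is_numbered_node("mycluster", "mycluster02")); /* 1 */
    printf("%d\n", is_numbered_node("mycluster", "node02"));      /* 0 */
    return 0;
}

Hostnames with a completely different base (e.g. "node02") would not match that pattern.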
I had been using MPICH and its serv_p4 daemon to speed startup times.
I've decided to try OpenMPI (primarily for the fault-tolerance features)
and would like to know what the equivalent of the serv_p4 daemon is.
It appears as though the orted daemon may be what I am after but I don't
quite und
-Original Message-
From: Gleb Natapov [mailto:gl...@voltaire.com]
Sent: Fri 19/01/2007 18:33
To: Arif Ali
Cc: Open MPI Users; Galen Shipman; Brad Benton; Pavel Shamis; Russell Slack;
Barry Evans
Subject: Re: [OMPI users] OpenMPI/OpenIB/IMB hangs[Scanned]
On Fri, Jan 19, 2007 at 05:51:
Thanks for the help. I renamed the nodes, and now SLURM and Open MPI seem
to be playing nicely with each other.
Bob
On 1/19/07, Jeff Squyres wrote:
I think the SLURM code in Open MPI is making an assumption that is
failing in your case: we assume that your nodes will have a specific
naming
I am having a problem running POP 1.2 (Parallel Ocean Model) with
Open MPI version 1.1.2 compiled with PGI 6.2-4 on RH EL-4 Update 4
(configure result attached).
The error is as follows:
mpirun -v -np 4 -machinefile node18.dat pop
[node18:11220] *** An error occurred in MPI_Cart_shift
[node18:11
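For reference, MPI_Cart_shift is only defined on a communicator that carries a Cartesian topology, i.e. one created with MPI_Cart_create; a minimal sketch of correct usage (not POP's actual code) is:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int dims[1] = {0}, periods[1] = {1};
    int nprocs, rank, left, right;
    MPI_Comm cart;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Dims_create(nprocs, 1, dims);                  /* choose a 1-D decomposition */
    MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 0, &cart);
    MPI_Comm_rank(cart, &rank);
    MPI_Cart_shift(cart, 0, 1, &left, &right);         /* neighbours along dimension 0 */
    printf("rank %d: left=%d right=%d\n", rank, left, right);
    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}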
On Jan 19, 2007, at 6:19 PM, Arif Ali wrote:
> [0,1,59][btl_openib_component.c:1153:btl_openib_component_progress] from
> node16 to: node02 error polling HP CQ with status REMOTE ACCESS ERROR
> status number 10 for wr_id 268919352 opcode 256614836
> mpirun noticed that job rank 0 with PID 0