If you are using the native Torque capabilities to launch Open MPI
jobs, note that limits.conf is not necessarily obeyed. I'm not a
Torque expert, but you should probably check out:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages-more
And check the Torque docs about how it propagates and enforces such
limits.
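One way to sidestep the question of whether limits.conf is honored is to set the limit explicitly just before the application starts. The following is only a sketch under assumptions: the function name with_cpu_limit and the script name cpu_limit.sh are made up for illustration, bash is assumed to be available on every node, and it has not been tested under Torque.

```shell
#!/bin/bash
# cpu_limit.sh (hypothetical name): run a command under an explicit CPU-time limit.
# Usage: cpu_limit.sh SECONDS command [args...]

with_cpu_limit() {
    local secs=$1
    shift
    # The subshell keeps the new limit from leaking into the caller;
    # exec replaces the subshell with the command itself.
    ( ulimit -t "$secs" && exec "$@" )
}

# Only act when called with a limit and a command.
if [ "$#" -ge 2 ]; then
    with_cpu_limit "$@"
fi
```

It could be placed between mpirun and the application, for example `mpirun -hostfile ${PBS_NODEFILE} ./cpu_limit.sh 60 /home/javier/mpi_hola/a.out`. Note that this can lower a limit but cannot raise it above the hard limit already in force on the node.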
On May 17, 2008, at 10:58 AM, Javier Lazaro wrote:
I have installed torque-2.3.0 and openmpi-1.2.3.
I ran some tests and found that jobs launched with the '-hostfile'
or '-machinefile' parameter are stopped when they exceed the
limits in the file /etc/security/limits.conf
More details:
file hola.c
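#include <stdio.h>
#include <stdlib.h>   /* system() */
#include <unistd.h>   /* sleep() */
#include "mpi.h"

int main(int argc, char *argv[]) {
    int rank;
    int size;
    int i;
    int namelen;
    char pn[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(pn, &namelen);

    /* Stagger the ranks so each node's limits are printed separately. */
    sleep(rank);
    system("bash -c 'ulimit -a'");

    /* Endless busy loop that burns CPU time until the process is killed.
       i eventually overflows, hence the negative values in the output. */
    for (i = 0; ; i++) {
        if (i % 100000000 == 0) {
            printf("--> %i --> Hola desde %d, de un total de: %d. estoy en %s\n",
                   i, rank, size, pn);
        }
    }

    MPI_Finalize();   /* never reached */
    return 0;
}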
##
> mpicc hola.c
file mpi3.sh
#!/bin/sh
#PBS -l nodes=3:ppn=1
#PBS -N pruebaMPI3
#PBS -o 3outpruebaMPIout3
#PBS -e 3errpruebaMPIerr3
cat ${PBS_NODEFILE}
mpirun -hostfile ${PBS_NODEFILE} /home/javier/mpi_hola/a.out
##
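To compare only the two limits of interest across nodes, a short script could be launched in place of a.out. This is a sketch, assuming bash on each node; the function name report_limits and the script name show_limits.sh are hypothetical.

```shell
#!/bin/bash
# show_limits.sh (hypothetical name): print one line with this host's
# soft CPU-time and locked-memory limits.

report_limits() {
    # uname -n gives the node name; ulimit -t / -l print the soft limits.
    printf '%s cpu=%s memlock=%s\n' "$(uname -n)" "$(ulimit -t)" "$(ulimit -l)"
}

report_limits
```

Launched as `mpirun -hostfile ${PBS_NODEFILE} ./show_limits.sh`, it should print one line per process, which makes a node with a different limit easy to spot.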
Launch the job with Torque:
> qsub mpi3.sh
##
After the job terminated:
file 3outpruebaMPIout3
maquina3b
maquina2b
maquina1b
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
pending signals (-i) 8185
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited #limit maquina3b
max user processes (-u) 8185
virtual memory (kbytes, -v) 2511840
file locks (-x) unlimited
--> 0 --> Hola desde 0, de un total de: 3. estoy en maquina3b
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
pending signals (-i) 8185
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) 880005
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
stack size (kbytes, -s) 8192
cpu time (seconds, -t) 60 #limit maquina2b
max user processes (-u) 8185
virtual memory (kbytes, -v) 2511840
file locks (-x) unlimited
--> 0 --> Hola desde 1, de un total de: 3. estoy en maquina2b
--> 100000000 --> Hola desde 0, de un total de: 3. estoy en maquina3b
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
pending signals (-i) 8185
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) 880005
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
stack size (kbytes, -s) 8192
cpu time (seconds, -t) 60 #limit maquina1b
max user processes (-u) 8185
virtual memory (kbytes, -v) 2511840
file locks (-x) unlimited
--> 0 --> Hola desde 2, de un total de: 3. estoy en maquina1b
--> 100000000 --> Hola desde 1, de un total de: 3. estoy en maquina2b
--> 200000000 --> Hola desde 0, de un total de: 3. estoy en maquina3b
--> 100000000 --> Hola desde 2, de un total de: 3. estoy en maquina1b
--> 200000000 --> Hola desde 1, de un total de: 3. estoy en maquina2b
........
--> -500000000 --> Hola desde 1, de un total de: 3. estoy en maquina2b
1 additional process aborted (not shown)
1 process killed (possibly by Open MPI)
##
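Since the three `ulimit -a` dumps are interleaved with program output, it can help to pull out only the CPU-time values. A minimal sketch, assuming the output format shown above; the function name cpu_limits and the script name cpu_limits.sh are hypothetical.

```shell
#!/bin/bash
# cpu_limits.sh (hypothetical name): print the value of every "cpu time"
# line found in `ulimit -a` output.
# Usage: cpu_limits.sh FILE...   (reads standard input when used as a function without arguments)

cpu_limits() {
    # Fields: cpu / time / (seconds, / -t) / VALUE, so the value is field 5.
    awk '/^cpu time/ { print $5 }' "$@"
}

# Only act when at least one file name is given.
if [ "$#" -ge 1 ]; then
    cpu_limits "$@"
fi
```

Run against the file 3outpruebaMPIout3, this would list one value per process, in the order the processes printed their limits.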
file 3errpruebaMPIerr3
mpirun noticed that job rank 0 with PID 10839 on node maquina3b
exited on signal 15 (Terminated).
---------------------------
I have limited CPU time to 60 seconds on all nodes. Torque modifies
this limit only for maquina3b.
I think that Torque should modify the CPU limit on the rest of the
nodes as well. Where is the error?
--
Jeff Squyres
Cisco Systems