Hello,

Hopefully the information below is helpful.

SLURM Version: 1.3.15

node64-test ~>salloc -n3
salloc: Granted job allocation 826

node64-test ~>srun hostname
node64-24.xxxx.xxxx.xxxx.xxxx
node64-25.xxxx.xxxx.xxxx.xxxx
node64-24.xxxx.xxxx.xxxx.xxxx

node64-test ~>printenv | grep SLURM
SLURM_NODELIST=node64-[24-25]
SLURM_NNODES=2
SLURM_JOBID=826
SLURM_TASKS_PER_NODE=2,1
SLURM_JOB_ID=826
SLURM_NPROCS=3
SLURM_JOB_NODELIST=node64-[24-25]
SLURM_JOB_CPUS_PER_NODE=2(x2)
SLURM_JOB_NUM_NODES=2

node64-test ~>mpirun --display-allocation hostname

======================   ALLOCATED NODES   ======================

 Data for node: Name: node64-test.xxxx.xxxx.xxxx.xxxx   Num slots: 0   Max slots: 0
 Data for node: Name: node64-24 Num slots: 2    Max slots: 0
 Data for node: Name: node64-25 Num slots: 2    Max slots: 0

=================================================================
node64-24.xxxx.xxxx.xxxx.xxxx
node64-24.xxxx.xxxx.xxxx.xxxx
node64-25.xxxx.xxxx.xxxx.xxxx
node64-25.xxxx.xxxx.xxxx.xxxx
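
For reference, the allocation above shows 2 slots on each of node64-24 and node64-25 (4 in total), which matches SLURM_JOB_CPUS_PER_NODE=2(x2) rather than SLURM_TASKS_PER_NODE=2,1 (3 tasks). Below is a small sketch I put together to show how those two compressed SLURM strings expand; it is only my own illustration of the format, not Open MPI code, and the expand_total helper is just a name I made up:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Expand a compressed SLURM count string such as "2(x2)" or "2,1".
 * Entries are comma-separated; "N(xM)" means N repeated for M nodes. */
static int expand_total(const char *spec)
{
    char *copy = strdup(spec);
    char *tok;
    int total = 0;

    for (tok = strtok(copy, ","); tok != NULL; tok = strtok(NULL, ",")) {
        int count = 0, repeat = 1;
        if (sscanf(tok, "%d(x%d)", &count, &repeat) >= 1)
            total += count * repeat;
    }
    free(copy);
    return total;
}

int main(void)
{
    /* Values taken from the printenv output for job 826 above. */
    printf("SLURM_JOB_CPUS_PER_NODE=2(x2) -> %d cpus\n", expand_total("2(x2)"));
    printf("SLURM_TASKS_PER_NODE=2,1      -> %d tasks\n", expand_total("2,1"));
    return 0;
}

Compiling and running this prints 4 for the CPU string and 3 for the task string, which lines up with srun launching 3 tasks while mpirun starts 4.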


Thanks,
Matt

> Haven't seen that before on any of our machines.
>
> Could you do "printenv | grep SLURM" after the salloc and send the
> results?
>
> What version of SLURM is this?
>
> Please run "mpirun --display-allocation hostname" and send the results.
>
> Thanks
> Ralph
>
> On Mon, Aug 24, 2009 at 11:30 AM, <matthew.pi...@ndsu.edu> wrote:
>
>> Hello,
>>
>> I seem to have run into an interesting problem with openMPI. After
>> allocating 3 processors and confirming that the 3 processors are
>> allocated, mpirun on a simple mpitest program runs on 4 processors. We
>> have 2 processors per node. I can repeat this with any odd number of
>> processors; openMPI seems to take any remaining processors on the box.
>> We are running openMPI v1.3.3. Here is an example of what happens:
>>
>> node64-test ~>salloc -n3
>> salloc: Granted job allocation 825
>>
>> node64-test ~>srun hostname
>> node64-28.xxxx.xxxx.xxxx.xxxx
>> node64-28.xxxx.xxxx.xxxx.xxxx
>> node64-29.xxxx.xxxx.xxxx.xxxx
>>
>> node64-test ~>MX_RCACHE=0 LD_LIBRARY_PATH="/hurd/mpi/openmpi/lib:/usr/local/mx/lib" mpirun mpi_pgms/mpitest
>> MPI domain size: 4
>> I am rank 000 - node64-28.xxxx.xxxx.xxxx.xxxx
>> I am rank 003 - node64-29.xxxx.xxxx.xxxx.xxxx
>> I am rank 001 - node64-28.xxxx.xxxx.xxxx.xxxx
>> I am rank 002 - node64-29.xxxx.xxxx.xxxx.xxxx
>>
>>
>>
>> For those who may be curious here is the program:
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <mpi.h>
>>
>> int main(int argc, char *argv[])
>> {
>>         int rank,
>>             size,
>>             namelen;
>>
>>         static char processor_name[MPI_MAX_PROCESSOR_NAME];
>>
>>         MPI_Init(&argc, &argv);
>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>         MPI_Comm_size(MPI_COMM_WORLD, &size);
>>
>>         /* Every rank reports its host name. */
>>         MPI_Get_processor_name(processor_name, &namelen);
>>         fprintf(stdout, "My name is: %s\n", processor_name);
>>
>>         /* Only rank 0 reports the communicator size. */
>>         if (rank == 0)
>>                 fprintf(stdout, "Cluster size is: %d\n", size);
>>
>>         MPI_Finalize();
>>         return 0;
>> }
>>
>>
>> I'm curious whether this is a bug in the way openMPI interprets SLURM
>> environment variables. If you have any ideas or need any more
>> information, let me know.
>>
>>
>> Thanks.
>> Matt
>>

