[slurm-dev] Re: PMI2 related error

2015-08-21 Thread Artem Polyakov
Hi, Yes, I submitted a fix for exactly that at the end of 2014: https://github.com/SchedMD/slurm/commit/7fff5eed6b8fe97347a832149966ed11f5805f99 You need to check whether it was included in your version. 2015-08-21 17:31 GMT-04:00 Aaron Knister : > Hi Artem, > > Do you know if a fix for this

[slurm-dev] Re: PMI2 related error

2015-08-21 Thread Aaron Knister
Hi Artem, Do you know if a fix for this was ever committed? We ran into this with a code base that builds non-MPI apps with mpicc and then attempts to run them multiple times from within a single SLURM task. -Aaron On Wed, May 21, 2014 at 9:12 AM, Artem Polyakov wrote: > 2014-05-21 19:28 GMT+0

[slurm-dev] Re: PMI2 related error

2014-05-21 Thread Artem Polyakov
2014-05-21 19:28 GMT+07:00 Hongjia Cao : > > Your debugging and analysis are correct. > > PMI2_init() initializes PMI in two steps. First a PMI 1.1 init command is > sent to the server and the version is negotiated with the server. After > that a PMI 2.0 fullinit command is sent. Everything goes well

[slurm-dev] Re: PMI2 related error

2014-05-21 Thread Artem Polyakov
2014-05-21 10:50 GMT+07:00 Artem Polyakov : > Here are exact examples: > > 1. "appnum = -1" problem: > Program pmi_appnum.c (attached) is run using the batch script > pmi_appnum.job (attached) and produces the following results: > > PMI2_Init(0, 16, 0, -1) > PMI2_Init(0, 16, 1, -1) > PMI2_Init(0,

[slurm-dev] Re: PMI2 related error

2014-05-21 Thread Hongjia Cao
Your debugging and analysis are correct. PMI2_init() initializes PMI in two steps. First a PMI 1.1 init command is sent to the server and the version is negotiated with the server. After that a PMI 2.0 fullinit command is sent. Everything goes well so far. But since the version number is decided, th

[slurm-dev] Re: PMI2 related error

2014-05-20 Thread Artem Polyakov
2014-05-21 12:18 GMT+07:00 Artem Polyakov : > Hello, Hongjia. > > 2014-05-21 12:11 GMT+07:00 Hongjia Cao : > > >> On Tue, 2014-05-20 at 17:46 -0700, Artem Polyakov wrote: >> > >> > >> > On Wednesday, 21 May 2014, David Bigagli wrote: >> > >> > The srun --mpi=pmi2 option has to be specified

[slurm-dev] Re: PMI2 related error

2014-05-20 Thread Artem Polyakov
2014-05-21 12:18 GMT+07:00 Hongjia Cao : > > I'd like to mention that the mpi/pmi2 plugin of SLURM also supports > PMI1.1. If no --mpi=pmi2 option is given, the PMI implementation in SLURM > will be used, which supports PMI1.1 only. > That is correct. That is exactly how we think of it. > > On 2014-0

[slurm-dev] Re: PMI2 related error

2014-05-20 Thread Artem Polyakov
Hello, Hongjia. 2014-05-21 12:11 GMT+07:00 Hongjia Cao : > > On Tue, 2014-05-20 at 17:46 -0700, Artem Polyakov wrote: > > > > > > On Wednesday, 21 May 2014, David Bigagli wrote: > > > > The srun --mpi=pmi2 option has to be specified if openmpi was > > built with the --with-pmi opti

[slurm-dev] Re: PMI2 related error

2014-05-20 Thread Hongjia Cao
I'd like to mention that the mpi/pmi2 plugin of SLURM also supports PMI1.1. If no --mpi=pmi2 option is given, the PMI implementation in SLURM will be used, which supports PMI1.1 only. On Tue, 2014-05-20 at 22:04 -0700, Artem Polyakov wrote: > Thank you, Chris! > > Currently we have a prototype that selects PMI

[slurm-dev] Re: PMI2 related error

2014-05-20 Thread Hongjia Cao
I will check the double init hang problem. On Tue, 2014-05-20 at 20:52 -0700, Artem Polyakov wrote: > Here are exact examples: > > > 1. "appnum = -1" problem: > Program pmi_appnum.c (attached) is run using the batch script > pmi_appnum.job (attached) and produces the following results: > > > PMI2_Init(0,

[slurm-dev] Re: PMI2 related error

2014-05-20 Thread Hongjia Cao
On Tue, 2014-05-20 at 17:46 -0700, Artem Polyakov wrote: > > > On Wednesday, 21 May 2014, David Bigagli wrote: > > The srun --mpi=pmi2 option has to be specified if Open MPI was > built with the --with-pmi options, otherwise Slurm will not > load the pmi2 plugins and

[slurm-dev] Re: PMI2 related error

2014-05-20 Thread Artem Polyakov
Thank you, Chris! Currently we have a prototype that selects the PMI version based on: (a) user preference: in this case, if PMI2 returns an error, we give up and MPI_Init fails. (b) automatic: here, if we see that PMI2 wasn't enabled (no --mpi=pmi2 option), we fall back to PMI1. This case also includes
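
A minimal sketch of the two selection modes described in the snippet above, not the Open MPI prototype itself. The function `select_pmi`, its parameters, and the mode names "pmi2" and "auto" are hypothetical; the two `init_*` callables stand in for real PMI initialization and return True on success. What the automatic mode does when PMI2 is enabled but fails is not stated in the snippet, so the sketch simply raises in that case (an assumption).

```python
from typing import Callable


def select_pmi(mode: str,
               pmi2_enabled: bool,
               init_pmi1: Callable[[], bool],
               init_pmi2: Callable[[], bool]) -> str:
    """Return the PMI version chosen ("pmi1" or "pmi2"), or raise on failure.

    All names here are illustrative assumptions, not a real Open MPI or Slurm API.
    """
    if mode == "pmi2":
        # (a) user preference: a PMI2 error is fatal, no fallback.
        if not init_pmi2():
            raise RuntimeError("PMI2 init failed; giving up (MPI_Init would fail)")
        return "pmi2"

    if mode == "auto":
        # (b) automatic: PMI2 not enabled (no --mpi=pmi2), so fall back to PMI1.
        if not pmi2_enabled:
            if not init_pmi1():
                raise RuntimeError("PMI1 init failed")
            return "pmi1"
        # Assumption: PMI2 is enabled, so use it; failure handling is not
        # described in the source, so this sketch just raises.
        if not init_pmi2():
            raise RuntimeError("PMI2 init failed")
        return "pmi2"

    raise ValueError(f"unknown mode: {mode!r}")
```

For instance, `select_pmi("auto", False, lambda: True, lambda: True)` picks PMI1 because PMI2 was not enabled, while the same call with `mode="pmi2"` and a failing `init_pmi2` raises instead of falling back.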

[slurm-dev] Re: PMI2 related error

2014-05-20 Thread Christopher Samuel
On 21/05/14 10:47, Artem Polyakov wrote: > I need to add that I am working on PMI support in Open MPI, so I > go slightly deeper than a regular user. It's also worth noting that because of problem reports about PMI2 with Slurm to the OMPI developers t

[slurm-dev] Re: PMI2 related error

2014-05-20 Thread Artem Polyakov
Here are exact examples: 1. "appnum = -1" problem: Program pmi_appnum.c (attached) is run using the batch script pmi_appnum.job (attached) and produces the following results: PMI2_Init(0, 16, 0, -1) PMI2_Init(0, 16, 1, -1) PMI2_Init(0, 16, 2, -1) PMI2_Init(0, 16, 3, -1) PMI2_Init(0, 16, 5, -1) PM

[slurm-dev] Re: PMI2 related error

2014-05-20 Thread Artem Polyakov
On Wednesday, 21 May 2014, Artem Polyakov wrote: > > > On Wednesday, 21 May 2014, David Bigagli wrote: > >> >> The srun --mpi=pmi2 option has to be specified if Open MPI was built with >> the --with-pmi options, otherwise Slurm will not load the pmi2 plugins and >> the MPI job wi

[slurm-dev] Re: PMI2 related error

2014-05-20 Thread Artem Polyakov
On Wednesday, 21 May 2014, David Bigagli wrote: > > The srun --mpi=pmi2 option has to be specified if Open MPI was built with > the --with-pmi options, otherwise Slurm will not load the pmi2 plugins and > the MPI job will fail in MPI_Init(). Thank you. I need to add that I am working on

[slurm-dev] Re: PMI2 related error

2014-05-20 Thread David Bigagli
The srun --mpi=pmi2 option has to be specified if Open MPI was built with the --with-pmi options, otherwise Slurm will not load the pmi2 plugins and the MPI job will fail in MPI_Init(). On 05/17/2014 07:50 PM, Artem Polyakov wrote: Hello, Here are some related notes that I found during further