Re: [slurm-users] Question about PMIX ERROR messages being emitted by some child of srun process
> So I’m testing the use of Open MPI 5.0.0 pre-release with the Slurm/PMIx setup
> currently on NERSC Perlmutter system.
>
> The SLURM version on Perlmutter is currently 2023.02.2
>
> The PMIx version that the admins used to build slurm against is pmix-4.2.3.
> I’ve attached the output of pmix_info.
>
> My test application “works” but if I use srun, I get these types of messages:
>
> srun -n 2 -N 2 --mpi=pmix ./ring_c
>
> [cn316:2770176] PMIX ERROR: OUT-OF-RESOURCE in file base/bfrop_base_unpack.c at line 750

Hi,

23.02.2 contains a PMIx permission regression; it may be worth checking whether that is the case here:
https://bugs.schedmd.com/show_bug.cgi?id=16687

commit 1f9386909230cd73506d88f02f75126924d3f41e
Author: Danny Auble
Date:   Mon May 15 18:35:25 2023 +0200

    mpi/pmix - fix PMIx shmem backed files permissions regression.

    Introduced in 23.02.2 commit d23cad68df.

    Bug 16687

BR,
Tommi

Re: [slurm-users] Question about PMIX ERROR messages being emitted by some child of srun process
Hi Tommi, Howard,

On 5/22/23 12:16 am, Tommi Tervo wrote:

> 23.02.2 contains PMIx permission regression, it may be worth to check if it's case?

I confirmed I could replicate the UNPACK-INADEQUATE-SPACE messages Howard is seeing on a test system, so I tried that patch on that same system without any change. :-(

Looking at the PMIx code base the messages appear to come from that code (the triggers are in src/mca/bfrops/) and I saw I could set PMIX_DEBUG=verbose to get more info on the problem, but when I set that these messages go away entirely. :-/

Very odd.

-- 
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA