On 26/01/2022 02:10, Jeff Squyres (jsquyres) via users wrote:
I'm afraid I don't know anything about Gadget, so I can't comment there. How
exactly does the application fail?
Neither did I :(
It fails saying a 'timestep' is 0, and that's usually caused by an error
in the parameters file.
You need a way for your process to exchange information so MPI_Init()
can work.
One option is to have your custom launcher implement a PMIx server
https://pmix.github.io
If you choose this path, you will likely want to use the Open PMIx
reference implementation
https://openpmix.github.io
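Before writing a PMIx server, it may help to check what environment the custom launcher already hands to each process. The following is only a minimal sketch: it assumes that a PMIx-enabled launcher exports PMIX_RANK and PMIX_NAMESPACE to the processes it starts, and the helper names pmix_environment and looks_pmix_launched are hypothetical, not part of PMIx or Open MPI.

```python
import os

# Assumption: a PMIx server exports these variables to each client process.
# Absence suggests the launcher provides no PMIx wire-up for MPI_Init().
PMIX_HINT_VARS = ("PMIX_RANK", "PMIX_NAMESPACE")


def pmix_environment(env=None):
    """Return the PMIx hint variables present in env (default: os.environ)."""
    env = os.environ if env is None else env
    return {name: env[name] for name in PMIX_HINT_VARS if name in env}


def looks_pmix_launched(env=None):
    """True only if every hint variable is present in the environment."""
    return len(pmix_environment(env)) == len(PMIX_HINT_VARS)


if __name__ == "__main__":
    found = pmix_environment()
    if looks_pmix_launched():
        print("PMIx environment detected:", found)
    else:
        print("No complete PMIx environment found; saw:", found)
```

Running this script under the custom launcher, in place of the MPI program, gives a rough indication of whether a PMIx server is already in the picture.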
Any pointers?
On Tue, Jan 25, 2022 at 12:55 PM Ralph Castain via users <
users@lists.open-mpi.org> wrote:
> Short answer is yes, but it is a bit complicated to do.
>
> On Jan 25, 2022, at 12:28 PM, Saliya Ekanayake via users <
> users@lists.open-mpi.org> wrote:
>
> Hi,
>
> I am trying to run an M
I'm afraid I don't know anything about Gadget, so I can't comment there. How
exactly does the application fail?
Can you try upgrading to Open MPI v4.1.2?
What networking are you using?
--
Jeff Squyres
jsquy...@cisco.com
From: users on behalf of Diego
Thanks a lot for feedback to you and Gilles. I'm completely new to this,
at least I know now what _should_ work. I'll look into the lmod part,
maybe I screwed something there, I'm a newbie there too...
Matthias
On 25.01.22 at 18:17, Ralph Castain via users wrote:
Never seen anything like tha
Short answer is yes, but it is a bit complicated to do.
On Jan 25, 2022, at 12:28 PM, Saliya Ekanayake via users
<users@lists.open-mpi.org> wrote:
Hi,
I am trying to run an MPI program on a platform that launches the processes
using a custom launcher (not mpiexec). This will end up spa
Hi,
I am trying to run an MPI program on a platform that launches the processes
using a custom launcher (not mpiexec). This will end up spawning N
processes of the program, but I am not sure if MPI_Init() would work or not
in this case?
Is it possible to have a group of processes launched by some
Never seen anything like that before - am I reading those errors correctly that
it cannot find the "write" function symbol in libc?? Frankly, if that's true
then it sounds like something is borked in the system.
> On Jan 25, 2022, at 8:26 AM, Matthias Leopold via users
> wrote:
>
> just in c
just in case anyone wants to do more debugging: I ran "srun --mpi=pmix"
now with "LD_DEBUG=all", the lines preceding the error are
1263345: symbol=write; lookup in
file=/lib/x86_64-linux-gnu/libpthread.so.0 [0]
1263345: binding file
/msc/sw/hpc-sdk/Linux_x86_64/21.9/comm_libs/mpi/lib/
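LD_DEBUG=all output is very long, so filtering it for one symbol can make the lookup order easier to follow. This is a rough sketch only: it assumes each lookup record sits on a single line in the form "PID: symbol=NAME; lookup in file=PATH [0]" (the lines above are wrapped by the mail client), and the function name lookups_for is hypothetical.

```python
import re

# Assumed record shape: "<pid>:  symbol=<name>;  lookup in file=<path> [0]"
LOOKUP_RE = re.compile(r"^\s*(\d+):\s+symbol=([^;]+);\s+lookup in file=(\S+)")


def lookups_for(symbol, lines):
    """Return (pid, file) pairs for every lookup record of the given symbol."""
    hits = []
    for line in lines:
        match = LOOKUP_RE.match(line)
        if match and match.group(2) == symbol:
            hits.append((int(match.group(1)), match.group(3)))
    return hits


if __name__ == "__main__":
    import sys

    # Reads captured LD_DEBUG output from standard input.
    for pid, path in lookups_for("write", sys.stdin):
        print(pid, path)
```

The list of files returned for "write" shows which libraries the dynamic linker searched, in order, before the error appeared.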
Hello all.
A user of our cluster is experiencing a weird problem that I can't pinpoint.
He has a job script that worked well on every node. It's based on
Gadget2.
Lately, *sometimes*, the same executable with the same parameters file
works, sometimes it fails. On the same node and submi
PMIx library version used by SLURM is 3.2.3
On 25.01.22 at 11:04, Gilles Gouaillardet wrote:
PMIx library version used by SLURM
Matthias,
Thanks for the clarifications.
Unfortunately, I cannot connect the dots and I must be missing something.
If I recap correctly:
- SLURM has builtin PMIx support
- Open MPI has builtin PMIx support
- srun explicitly requires PMIx (srun --mpi=pmix_v3 ...)
- and yet Open MPI issues an
Hi Gilles,
I'm indeed using srun; I haven't had any luck with mpirun yet.
Are options 2 and 3 of your list really different things? As far as I
understand now, I need "Open MPI with PMI support", THEN I can use srun
with PMIx. Right now using "srun --mpi=pmix(_v3)" gives the error
mentioned below.