Keeping /etc/passwd and /etc/group synced to all the nodes should work. You will also need to set up an SSH key for MPI.
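Without LDAP, one way to keep the files in step is to merge the head node's regular user and group entries into each node's copy, for example as a step in the Ansible/Packer AMI build. The Python below is a hypothetical sketch, not part of Slurm or any existing tool. The function names, the 1000 ID threshold, and skipping ID 65534 (nobody/nogroup) are all assumptions. It covers /etc/passwd and /etc/group only, not /etc/shadow or /etc/gshadow.

```python
"""Hypothetical helper: merge regular-user entries from the head node's
/etc/passwd (or /etc/group) into a compute node's copy, keyed by name."""

MIN_ID = 1000      # assumption: regular users/groups start at UID/GID 1000
NOBODY_ID = 65534  # assumption: nobody/nogroup are left to the node image


def parse_entries(text: str) -> dict[str, list[str]]:
    """Split colon-separated passwd/group lines into {name: fields}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 3:  # skip malformed lines that lack an ID field
            entries[fields[0]] = fields
    return entries


def merge_entries(head_text: str, node_text: str, min_id: int = MIN_ID) -> str:
    """Return node_text with the head node's regular entries appended.

    Field 2 is the UID in /etc/passwd and the GID in /etc/group, so the
    same logic covers both files. Raises ValueError on an ID clash so a
    name never maps to different IDs on different nodes.
    """
    node = parse_entries(node_text)
    ids_on_node = {fields[2]: name for name, fields in node.items()}
    merged = node_text.rstrip("\n").splitlines()
    for name, fields in parse_entries(head_text).items():
        entry_id = int(fields[2])
        if entry_id < min_id or entry_id == NOBODY_ID:
            continue  # system accounts stay as the node image defines them
        if name in node:
            if node[name][2] != fields[2]:
                raise ValueError(
                    f"{name}: ID {fields[2]} on head, {node[name][2]} on node")
            continue  # already present with the same ID
        owner = ids_on_node.get(fields[2])
        if owner is not None:
            raise ValueError(f"ID {fields[2]} already used by {owner} on node")
        merged.append(":".join(fields))
        ids_on_node[fields[2]] = name
    return "\n".join(merged) + "\n"


if __name__ == "__main__":
    head = ("root:x:0:0:root:/root:/bin/bash\n"
            "alice:x:1001:1001::/home/alice:/bin/bash\n")
    node = "root:x:0:0:root:/root:/bin/bash\n"
    print(merge_entries(head, node), end="")
```

The sketch refuses to add an entry whose ID is already taken rather than renumbering it. With home directories on NFS, file ownership is typically checked by numeric UID/GID, so the IDs need to match the head node exactly, not just the user names.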
Best,
Feng

On Mon, Feb 10, 2025 at 10:29 PM mark.w.moorcroft--- via slurm-users <slurm-users@lists.schedmd.com> wrote:

> If you set up slurm elastic cloud in EC2 without LDAP, what is the
> recommended method for sync of the passwd/group files? Is this necessary to
> get openmpi jobs to run? I would swear I had this working last week without
> synced passwd on two nodes. But thinking about it now I'm not sure how this
> could have worked. My home directories are in an NFS mount, but the user
> accounts don't exist on the node AMI. I'm using ansible/packer to manage
> the AMIs. When I ran OpenHPC / Slurm on bare metal there was a sync
> process. This is my first AWS Slurm cluster rodeo. I can't use the Amazon
> Parallel Computing tools because we are forced to be in GovCloud. I started
> with "ClusterInTheCloud", but it's all 4 years old, and semi-broken out of
> the box. My manager had me ditch a lot of it (including LDAP). So I'm
> building out a fork that is getting heavily modded for our situation.
>
> An ORTE daemon has unexpectedly failed after launch and before
> communicating back to mpirun. This could be caused by a number
> of factors, including an inability to create a connection back
> to mpirun due to a lack of common network interfaces and/or no
> route found between them. Please check network connectivity
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com