> What's the best way to troubleshoot this when orted fails but doesn't give
> any sort of error to indicate what the root cause of the failure might be?
> And I also can't predictably induce the failure, just have to wait until it
> randomly chokes.

You can try increasing the Open MPI verbosity, both the general settings and the module-specific ones. That's often how I am able to notice what's wrong under the hood with Open MPI. Use the `ompi_info` command to list all "verbose" parameters:

    $ ompi_info --level 9 --all | grep _verbose
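
Since the failure is intermittent, it may be easiest to leave the extra verbosity switched on in the job script and wait for the next occurrence. Here is a rough sketch of that, assuming an ORTE-based Open MPI (4.x or older, which the mention of orted suggests). The two parameters and the level 5 are only my guess at a useful starting point for launcher and daemon problems, so please confirm the names against the `ompi_info` output on your installation, and note that `./my_app` is a placeholder:

```shell
# Any MCA parameter can be set through the environment as OMPI_MCA_<name>.
# plm  = process launch framework (how mpirun starts the daemons)
# odls = local daemon launch subsystem (how each daemon starts the ranks)
# Assumption: these two are the relevant ones here; verify with ompi_info.
export OMPI_MCA_plm_base_verbose=5
export OMPI_MCA_odls_base_verbose=5

# Then launch as usual from the same shell or batch script, for example:
#   mpirun ./my_app        (placeholder application name)
# The same settings can also be given per run on the command line:
#   mpirun --mca plm_base_verbose 5 --mca odls_base_verbose 5 ./my_app
```

The environment variant has the advantage that the launch line itself stays untouched, so it is easy to remove again once the culprit is found. Expect the job output to get considerably larger.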
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com