On Fri, 2020-06-05 at 19:52 -0400, Stephen Siegel via users wrote:
Sure, I’ll ask the machine admins to update and let you know how it goes.
In the meantime, I was just wondering if someone has run this little program
with an up-to-date OpenMPI and if it worked. If so, then I will know the
problem is with our setup.
Thanks
-Steve
> On Jun 5, 2020, at 7:45 PM
You cited Open MPI v2.1.1. That's a pretty ancient version of Open MPI.
Any chance you can upgrade to Open MPI 4.0.x?
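
For anyone who wants to script this check, here is a minimal Python sketch,
assuming the version line has the same form as the `mpiexec.openmpi --version`
output quoted in this thread. The helper names `parse_openmpi_version` and
`is_older_than` are hypothetical and not part of Open MPI; the 4.0 threshold
simply mirrors the upgrade suggestion above.

```python
import re


def parse_openmpi_version(version_line):
    """Return (major, minor, patch) from a line such as
    'mpiexec.openmpi (OpenRTE) 2.1.1', or None if no version is found."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_older_than(version, minimum=(4, 0, 0)):
    """Compare version tuples element by element (hypothetical helper)."""
    return version < minimum


if __name__ == "__main__":
    # Sample line copied from the output quoted in this thread.
    line = "mpiexec.openmpi (OpenRTE) 2.1.1"
    version = parse_openmpi_version(line)
    print(version, "older than 4.0:", is_older_than(version))
```

In practice the line would be the first line printed by `mpiexec --version` on
the machine in question rather than a hard-coded string.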
> On Jun 5, 2020, at 7:24 PM, Stephen Siegel wrote:
> On Jun 5, 2020, at 6:55 PM, Jeff Squyres (jsquyres)
> wrote:
>
> On Jun 5, 2020, at 6:35 PM, Stephen Siegel via users
> wrote:
>>
>> [ilyich:12946] 3 more processes have sent help message help-mpi-btl-base.txt
>> / btl:no-nics
>> [ilyich:12946] Set MCA parameter "orte_base_help_aggregat
OK, but then on this other machine it hangs. This one is using SLURM, so I’m
not exactly sure but I think this tells me the OpenMPI version:
siegel@cisc372:~$ mpiexec.openmpi --version
mpiexec.openmpi (OpenRTE) 2.1.1
Report bugs to http://www.open-mpi.org/community/help/
siegel@cisc372:~/372
On Jun 5, 2020, at 6:35 PM, Stephen Siegel via users
wrote:
>
> [ilyich:12946] 3 more processes have sent help message help-mpi-btl-base.txt
> / btl:no-nics
> [ilyich:12946] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
> help / error messages
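
The quoted hint about "orte_base_help_aggregate" can be acted on by passing the
parameter on the command line. Below is a minimal Python sketch that assembles
such a command using Open MPI's `--mca <name> <value>` option. The helper name
`build_mpiexec_command`, the program name `./io_test`, and the process count of
4 are assumptions made for illustration; they do not come from this thread.

```python
def build_mpiexec_command(program, num_procs, mca_params):
    """Build an mpiexec argument list with the given MCA parameters.

    mca_params maps MCA parameter names to values; each pair is passed
    as '--mca <name> <value>'.
    """
    command = ["mpiexec"]
    for name, value in mca_params.items():
        command += ["--mca", name, str(value)]
    command += ["-n", str(num_procs), program]
    return command


if __name__ == "__main__":
    # './io_test' and 4 processes are placeholders, not values from the thread.
    cmd = build_mpiexec_command("./io_test", 4, {"orte_base_help_aggregate": 0})
    print(" ".join(cmd))
```

With aggregation turned off, each process should report its own help message
instead of the "3 more processes have sent help message" summary.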
It looks like your output somehow
Your code looks correct, and based on your output I suspect that the I/O part
actually finished correctly. The error message that you see is not an I/O
error; it comes from the btl (which is communication related).
What version of Open MPI are you using, and on what file system?
Thanks
Edgar