Hi,
I have installed openmpi-v3.x-201705250239-d5200ea on my "SUSE Linux
Enterprise Server 12.2 (x86_64)" with Sun C 5.14 and gcc-7.1.0.
Unfortunately, my rankfiles don't work any longer.
loki rankfiles 136 cat rf_loki_nfs1
rank 0=loki slot=0:0-3;1:0-1
rank 1=loki slot=1:2-5
rank 2=nfs1 slot=0:4
rank 3=nfs1 slot=1:5
loki rankfiles 137 mpiexec -report-bindings -np 4 -rf rf_loki_nfs1 hostname
[nfs1:11461] [[41737,0],1] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-v3.x-201705250239-d5200ea/orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 408
[nfs1:11461] [[41737,0],1] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-v3.x-201705250239-d5200ea/orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 162
[nfs1:11461] [[41737,0],1] ORTE_ERROR_LOG: Not found in file ../../../../openmpi-v3.x-201705250239-d5200ea/orte/mca/rmaps/base/rmaps_base_map_job.c at line 370
[nfs1:11461] [[41737,0],1] ORTE_ERROR_LOG: Not found in file ../../../../openmpi-v3.x-201705250239-d5200ea/orte/mca/odls/base/odls_base_default_fns.c at line 425
--------------------------------------------------------------------------
ORTE has lost communication with a remote daemon.
HNP daemon : [[41737,0],0] on node loki
Remote daemon: [[41737,0],1] on node nfs1
This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.
--------------------------------------------------------------------------
loki rankfiles 138
I would be grateful if somebody could fix the problem. Do you need anything
else? Thank you very much in advance for any help.
Kind regards
Siegmar