You can also change the location of the temporary files with the following MCA option:
-mca orte_tmpdir_base /some/place
ompi_info --param all all -l 9 | grep tmp
MCA orte: parameter "orte_tmpdir_base" (current value: "", data source: default, level: 9 dev/all, type: string)
MCA orte: parameter "orte_local_tmpdir_base" (current value: "", data source: default, level: 9 dev/all, type: string)
MCA orte: parameter "orte_remote_tmpdir_base" (current value: "", data source: default, level: 9 dev/all, type: string)
--
Aurélien Bouteiller ~~ https://icl.cs.utk.edu/~bouteill/
> On May 23, 2015, at 03:55, Gilles Gouaillardet <[email protected]> wrote:
>
> Bill,
>
> the root cause is likely that there is not enough free space in /tmp.
>
> the simplest, but slowest, option is to run mpirun --mca btl tcp,self ...
> if you cannot make enough space under /tmp (maybe you run diskless),
> there are some options to create these kinds of files under /dev/shm
>
> Cheers,
>
> Gilles
>
>
> On Saturday, May 23, 2015, Lane, William <[email protected]> wrote:
> I've compiled the linpack benchmark using the Open MPI 1.8.5 libraries
> and include files on CentOS 6.4.
>
> I've tested the binary on the one Intel node (some sort of 4-core Xeon)
> and it runs, but when I try to run it on any of the old Sunfire Opteron
> compute nodes it appears to hang (although top indicates CPU and memory
> usage) and eventually terminates by itself. I'm also getting the
> following Open MPI error messages/warnings:
>
> mpirun -np 16 --report-bindings --hostfile hostfile --prefix
> /hpc/apps/mpi/openmpi/1.8.5-dev --mca btl_tcp_if_include eth0 xhpl
>
> [cscld1-0-6:24370] create_and_attach: unable to create shared memory BTL
> coordinating structure :: size 134217728
> [cscld1-0-3:24734] create_and_attach: unable to create shared memory BTL
> coordinating structure :: size 134217728
> [cscld1-0-7:25152] create_and_attach: unable to create shared memory BTL
> coordinating structure :: size 134217728
> [cscld1-0-4:18079] create_and_attach: unable to create shared memory BTL
> coordinating structure :: size 134217728
> [cscld1-0-8:21443] create_and_attach: unable to create shared memory BTL
> coordinating structure :: size 134217728
> [cscld1-0-2:19704] create_and_attach: unable to create shared memory BTL
> coordinating structure :: size 134217728
> [cscld1-0-5:13481] create_and_attach: unable to create shared memory BTL
> coordinating structure :: size 134217728
> [cscld1-0-0:21884] create_and_attach: unable to create shared memory BTL
> coordinating structure :: size 134217728
> [cscld1:24240] 7 more processes have sent help message
> help-opal-shmem-mmap.txt / target full
>
> Note that these errors also occur when I run the linpack benchmark on a
> single node.
>
> Does anyone know what's going on here? Google came up w/nothing and I have no
> idea what a BTL coordinating structure is.
>
> -Bill L.
>
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/05/26907.php
