I'm in an airport right now and can't easily check, but instead of using mmap-backed shared memory (which backs the shared memory with a file), you could tell Open MPI to use SysV shared memory. IIRC that isn't treated like a file.
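Something along these lines might do it (component and parameter names are from memory, so please double-check them against ompi_info on your install; the selection may live in the opal shmem framework rather than in the BTL itself):

  # list the available shared-memory components and their parameters
  ompi_info --level 9 --param shmem all

  # ask Open MPI to use SysV shared memory instead of the mmap-backed kind
  orterun --mca shmem sysv -np 3 hello_cxx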
Look for a selection mechanism via an MCA param in the sm or vader BTLs. Run something like ompi_info --level 9 --param btl tcp, and also with vader, and look for a param that selects which type of shared memory to use.

Sent from my phone. No type good.

On Nov 20, 2015, at 9:04 AM, Saurabh T <saur...@hotmail.com> wrote:

> For what it's worth, that's Open MPI creating a chunk of shared memory
> for use with on-server communication. It shows up as a "file", but it's
> really shared memory. You can disable sm and/or vader, but your
> on-server message passing performance will be significantly lower.
>
> Is there a reason you have a file size limit?

The file size limit is there so our testing does not write runaway large files. I'm not satisfied that the solution would be just to throw a better error; this looks to me like a bug in Open MPI. If it is actually shared memory, it shouldn't be constrained by a file size limit.

saurabh

________________________________
From: saur...@hotmail.com
To: us...@open-mpi.org
Subject: RE: Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072
Date: Thu, 19 Nov 2015 17:32:36 -0500

I apologize, I quoted the wrong lines from strace for the initial file there (of course). The file with fd = 11 that causes the problem is called shared_mem_pool.[host], and ftruncate(11, 134217736) is called on it. (That is 8 bytes over 1024 times the ulimit of 131072, which makes sense since the ulimit is in 1K blocks: with 131072 blocks the request exceeds the limit, while with 131073 blocks it just fits.)

________________________________
From: saur...@hotmail.com
To: us...@open-mpi.org
Subject: RE: Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072
Date: Thu, 19 Nov 2015 17:08:22 -0500

> Could you please provide a little more info regarding the environment you
> are running under (which resource mgr or not, etc), how many nodes you had
> in the allocation, etc? There is no reason why something should behave
> that way, so it would help if we could understand the setup.
>
> Ralph

To answer Ralph's question above from the other thread: all processes run on the same machine that orterun was run on. It's a Red Hat 7 64-bit gcc 4.8 install of Open MPI 1.10.1. The only atypical thing is that btl_tcp_if_exclude = virbr0 has been added to openmpi-mca-params.conf because of some failures I was seeing before. (And now of course I've also added btl = ^sm to work around this issue; see my other response.) Relevant output from strace (without the btl = ^sm) is below; text in square brackets marks my minor edits and snips.

open("/tmp/openmpi-sessions-[user]@[host]_0/40072/1/1/vader_segment.[host].1", O_RDWR|O_CREAT, 0600) = 12
ftruncate(12, 4194312) = 0
mmap(NULL, 4194312, PROT_READ|PROT_WRITE, MAP_SHARED, 12, 0) = 0x7fe506c8a000
close(12) = 0
write(9, "\1\0\0\0\0\0\0\0", 8) = 8
[...]
poll([{fd=5, events=POLLIN}, {fd=11, events=POLLIN}], 2, 0) = -1 EFBIG (File too large)
--- SIGXFSZ {si_signo=SIGXFSZ, si_code=SI_USER, si_pid=12329, si_uid=1005} ---
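For reference, here is a minimal standalone program (illustrative only, not Open MPI's code; the file name is made up) showing the same failure mode: an ftruncate() that extends a regular file past RLIMIT_FSIZE either delivers SIGXFSZ (signal 25, which is what orterun reports below) or, if that signal is ignored, fails with EFBIG.

/* rlimit_demo.c -- illustrative only, not Open MPI source */
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <unistd.h>

int main(void)
{
    /* Cap created file sizes at 128 MB, i.e. ulimit -f 131072 (1K blocks). */
    struct rlimit rl = { .rlim_cur = 134217728, .rlim_max = 134217728 };
    setrlimit(RLIMIT_FSIZE, &rl);

    /* The default SIGXFSZ action kills the process (signal 25);
     * ignore it here so ftruncate() reports EFBIG instead. */
    signal(SIGXFSZ, SIG_IGN);

    int fd = open("/tmp/demo_shared_mem_pool", O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return 1;

    /* 8 bytes over the limit, like the 134217736-byte shared_mem_pool request. */
    if (ftruncate(fd, 134217736) != 0)
        printf("ftruncate failed: %s\n", strerror(errno));  /* EFBIG */

    close(fd);
    unlink("/tmp/demo_shared_mem_pool");
    return 0;
}

With the ulimit at 131073 blocks the cap is 134218752 bytes, so the same ftruncate() succeeds, which matches the behavior in the report below.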
________________________________
From: saur...@hotmail.com
To: us...@open-mpi.org
Subject: Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072
Date: Thu, 19 Nov 2015 15:24:08 -0500

Hi,

Sorry, my previous email was garbled; sending it again.

> cd examples
> make hello_cxx
> ulimit -f 131073
> orterun -np 3 hello_cxx
Hello, world (etc.)
> ulimit -f 131072
> orterun -np 3 hello_cxx
--------------------------------------------------------------------------
orterun noticed that process rank 0 with PID 4473 on node sim16 exited
on signal 25 (File size limit exceeded).
--------------------------------------------------------------------------

Any thoughts?

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2015/11/28086.php