What that parameter does is turn "off" all of the transports except tcp, so the problem you're seeing goes away because we no longer try to create the shared memory file. This will somewhat hurt your performance, but it will work.

Alternatively, you could use "--mca btl ^sm", which would allow you to use whatever high-speed interconnects are on your system while still turning "off" the shared memory file.

I'm not sure why your tmp directory is getting its permissions wrong. It sounds like there is something in your environment that is doing something unexpected. You might just write and execute a script that creates a file and lists its permissions; it would be interesting to see whether the owner or access permissions are different from what you would normally expect.
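Something along these lines, for example (tut01 being your test program from this thread; the file name in the last command is just an arbitrary placeholder):

# run with only the tcp transport selected
# (many setups also list the self transport explicitly, e.g. --mca btl self,tcp)
$ mpirun --mca btl tcp -np 2 tut01

# or keep everything except the shared memory transport
$ mpirun --mca btl ^sm -np 2 tut01

# quick check of what this user can actually create under /tmp
$ ls -ld /tmp
$ touch /tmp/ompi_perm_test && ls -l /tmp/ompi_perm_test && rm /tmp/ompi_perm_test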
Ralph


On 1/18/07 8:30 PM, "eddie168" <eddie168+ompi_u...@gmail.com> wrote:
> Just to answer my own question, after I explicitly specify the "--mca btl tcp"
> parameter, the program works. So I will need to issue a command like this:
>
> $ mpirun --mca btl tcp -np 2 tut01
> oceanus:Hello world from 0
> oceanus:Hello world from 1
>
> Regards,
>
> Eddie.
>
>
> On 1/18/07, eddie168 <eddie168+ompi_u...@gmail.com> wrote:
>> Hi Ralph and Brian,
>>
>> Thanks for the advice. I have checked the permissions on /tmp
>>
>> drwxrwxrwt 19 root root 4096 Jan 18 11:38 tmp
>>
>> which I think means there shouldn't be any problem creating files there, so option
>> (a) still does not work for me.
>>
>> I tried option (b), setting --tmpdir on the command line and running as a normal user;
>> it works for -np 1, however it gives the same error for -np 2.
>>
>> Option (c) was also tested by setting "OMPI_MCA_tmpdir_base = /home2/mpi_tut/tmp"
>> in "~/.openmpi/mca-params.conf", however the error still occurred.
>>
>> I included the debug output of what I ran (with the IP masked). I noticed that
>> the optional tmp directory is set at the beginning of the process; however, it
>> changed back to "/tmp" after executing orted. Could the error I got be related
>> to an SSH setting?
>>
>> Many thanks,
>>
>> Eddie.
>> >> >> >> [eddie@oceanus:~/home2/mpi_tut]$ mpirun -d --tmpdir /home2/mpi_tut/tmp -np 2 >> tut01 >> [oceanus:129119] [0,0,0] setting up session dir with >> [oceanus:129119] tmpdir /home2/mpi_tut/tmp >> [oceanus:129119] universe default-universe >> [oceanus:129119] user eddie >> [oceanus:129119] host oceanus >> [oceanus:129119] jobid 0 >> [oceanus:129119] procid 0 >> [oceanus:129119] procdir: >> /home2/mpi_tut/tmp/openmpi-sessions-eddie@oceanus_0/default-universe/0/0 >> [oceanus:129119] jobdir: >> /home2/mpi_tut/tmp/openmpi-sessions-eddie@oceanus_0/default-universe/0 >> [oceanus:129119] unidir: >> /home2/mpi_tut/tmp/openmpi-sessions-eddie@oceanus_0/default-universe >> [oceanus:129119] top: openmpi-sessions-eddie@oceanus_0 >> [oceanus:129119] tmp: /home2/mpi_tut/tmp >> [oceanus:129119] [0,0,0] contact_file >> /home2/mpi_tut/tmp/openmpi-sessions-eddie@oceanus_0/default-universe/universe >> -setup.txt >> [oceanus:129119] [0,0,0] wrote setup file >> [oceanus:129119] pls:rsh: local csh: 0, local bash: 1 >> [oceanus:129119] pls:rsh: assuming same remote shell as local shell >> [oceanus:129119] pls:rsh: remote csh: 0, remote bash: 1 >> [oceanus:129119] pls:rsh: final template argv: >> [oceanus:129119] pls:rsh: /usr/bin/ssh <template> orted --debug >> --bootproxy 1 --name <template> --num_procs 2 --vpid_start 0 --nodename >> <template> --universe eddie@oceanus:default-universe --nsreplica >> "0.0.0;tcp://xxx.xxx.xxx.xxx:52428" --gprreplica " >> 0.0.0;tcp://xxx.xxx.xxx.xxx:52428" --mpi-call-yield 0 >> [oceanus:129119] pls:rsh: launching on node localhost >> [oceanus:129119] pls:rsh: oversubscribed -- setting mpi_yield_when_idle to 1 >> (1 2) >> [oceanus:129119] pls:rsh: localhost is a LOCAL node >> [oceanus:129119] pls:rsh: changing to directory /home/eddie >> [oceanus:129119] pls:rsh: executing: orted --debug --bootproxy 1 --name 0.0.1 >> --num_procs 2 --vpid_start 0 --nodename localhost --universe >> eddie@oceanus:default-universe <mailto:eddie@oceanus:default-universe> >> --nsreplica "0.0.0;tcp://xxx.xxx.xxx.xxx:52428" --gprreplica >> "0.0.0;tcp://xxx.xxx.xxx.xxx:52428" --mpi-call-yield 1 >> [oceanus:129120] [0,0,1] setting up session dir with >> [oceanus:129120] universe default-universe >> [oceanus:129120] user eddie >> [oceanus:129120] host localhost >> [oceanus:129120] jobid 0 >> [oceanus:129120] procid 1 >> [oceanus:129120] procdir: >> /tmp/openmpi-sessions-eddie@localhost_0/default-universe/0/1 >> [oceanus:129120] jobdir: >> /tmp/openmpi-sessions-eddie@localhost_0/default-universe/0 >> [oceanus:129120] unidir: >> /tmp/openmpi-sessions-eddie@localhost_0/default-universe >> [oceanus:129120] top: openmpi-sessions-eddie@localhost_0 >> [oceanus:129120] tmp: /tmp >> [oceanus:129121] [0,1,0] setting up session dir with >> [oceanus:129121] universe default-universe >> [oceanus:129121] user eddie >> [oceanus:129121] host localhost >> [oceanus:129121] jobid 1 >> [oceanus:129121] procid 0 >> [oceanus:129121] procdir: >> /tmp/openmpi-sessions-eddie@localhost_0/default-universe/1/0 >> [oceanus:129121] jobdir: >> /tmp/openmpi-sessions-eddie@localhost_0/default-universe/1 >> [oceanus:129121] unidir: >> /tmp/openmpi-sessions-eddie@localhost_0/default-universe >> [oceanus:129121] top: openmpi-sessions-eddie@localhost_0 >> [oceanus:129121] tmp: /tmp >> [oceanus:129122] [0,1,1] setting up session dir with >> [oceanus:129122] universe default-universe >> [oceanus:129122] user eddie >> [oceanus:129122] host localhost >> [oceanus:129122] jobid 1 >> [oceanus:129122] procid 1 >> [oceanus:129122] procdir: >> 
/tmp/openmpi-sessions-eddie@localhost_0/default-universe/1/1 >> [oceanus:129122] jobdir: >> /tmp/openmpi-sessions-eddie@localhost_0/default-universe/1 >> [oceanus:129122] unidir: >> /tmp/openmpi-sessions-eddie@localhost_0/default-universe >> [oceanus:129122] top: openmpi-sessions-eddie@localhost_0 >> [oceanus:129122] tmp: /tmp >> [oceanus:129119] spawn: in job_state_callback(jobid = 1, state = 0x4) >> [oceanus:129119] Info: Setting up debugger process table for applications >> MPIR_being_debugged = 0 >> MPIR_debug_gate = 0 >> MPIR_debug_state = 1 >> MPIR_acquired_pre_main = 0 >> MPIR_i_am_starter = 0 >> MPIR_proctable_size = 2 >> MPIR_proctable: >> (i, host, exe, pid) = (0, localhost, tut01, 129121) >> (i, host, exe, pid) = (1, localhost, tut01, 129122) >> [oceanus:129121] mca_common_sm_mmap_init: ftruncate failed with errno=13 >> [oceanus:129121] mca_mpool_sm_init: unable to create shared memory mapping >> (/tmp/openmpi-sessions-eddie@localhost_0/default-universe/1/shared_mem_pool.l >> ocalhost ) >> -------------------------------------------------------------------------- >> It looks like MPI_INIT failed for some reason; your parallel process is >> likely to abort. There are many reasons that a parallel process can >> fail during MPI_INIT; some of which are due to configuration or environment >> problems. This failure appears to be an internal failure; here's some >> additional information (which may only be relevant to an Open MPI >> developer): >> >> PML add procs failed >> --> Returned "Out of resource" (-2) instead of "Success" (0) >> -------------------------------------------------------------------------- >> *** An error occurred in MPI_Init >> *** before MPI was initialized >> *** MPI_ERRORS_ARE_FATAL (goodbye) >> [oceanus:129120] sess_dir_finalize: found proc session dir empty - deleting >> [oceanus:129120] sess_dir_finalize: job session dir not empty - leaving >> [oceanus:129120] sess_dir_finalize: found proc session dir empty - deleting >> [oceanus:129120] sess_dir_finalize: found job session dir empty - deleting >> [oceanus:129120] sess_dir_finalize: univ session dir not empty - leaving >> [oceanus:129120] orted: job_state_callback(jobid = 1, state = >> ORTE_PROC_STATE_TERMINATED) >> [oceanus:129120] sess_dir_finalize: job session dir not empty - leaving >> [oceanus:129120] sess_dir_finalize: found proc session dir empty - deleting >> [oceanus:129120] sess_dir_finalize: found job session dir empty - deleting >> [oceanus:129120] sess_dir_finalize: found univ session dir empty - deleting >> [oceanus:129120] sess_dir_finalize: found top session dir empty - deleting >> [eddie@oceanus:~/home2/mpi_tut]$ >> >> >> On 1/18/07, Ralph H Castain <r...@lanl.gov> wrote: >>> Hi Eddie >>> >>> Open MPI needs to create a temporary file system what we call our "session >>> directory" - where it stores things like the shared memory file. From this >>> output, it appears that your /tmp directory is "locked" to root access only. >>> >>> You have three options for resolving this problem: >>> >>> (a) you could make /tmp accessible to general users; >>> >>> (b) you could use the tmpdir xxx command line option to point Open MPI at >>> another directory that is accessible to the user (for example, you could use >>> a "tmp" directory under the user's home directory); or >>> >>> (c) you could set an MCA parameter OMPI_MCA_tmpdir_base to identify a >>> directory we can use instead of /tmp. 
>>>
>>> If you select options (b) or (c), the only requirement is that this
>>> location must be accessible on every node being used. Let me be clear on
>>> this: the tmp directory must not be NFS mounted and therefore shared across
>>> all nodes. However, each node must be able to access a location of the given
>>> name; that location should be strictly local to each node.
>>>
>>> Hope that helps
>>> Ralph
>>>
>>>
>>>
>>>
>>> On 1/17/07 12:25 AM, "eddie168" < eddie168+ompi_u...@gmail.com
>>> <mailto:eddie168+ompi_u...@gmail.com> > wrote:
>>>
>>>> Dear all,
>>>>
>>>> I have recently installed OpenMPI 1.1.2 on an OpenSSI cluster running Fedora
>>>> Core 3. I tested a simple hello world MPI program (attached) and it runs ok
>>>> as root. However, if I run the same program as a normal user, it gives the
>>>> following error:
>>>>
>>>> [eddie@oceanus:~/home2/mpi_tut]$ mpirun -np 2 tut01
>>>> [oceanus:125089] mca_common_sm_mmap_init: ftruncate failed with errno=13
>>>> [oceanus:125089] mca_mpool_sm_init: unable to create shared memory mapping
>>>> ( /tmp/openmpi-
>>>> sessions-eddie@localhost_0/default-universe/1/shared_mem_pool.localhost)
>>>> --------------------------------------------------------------------------
>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>> likely to abort. There are many reasons that a parallel process can
>>>> fail during MPI_INIT; some of which are due to configuration or environment
>>>> problems. This failure appears to be an internal failure; here's some
>>>> additional information (which may only be relevant to an Open MPI
>>>> developer):
>>>> PML add procs failed
>>>> --> Returned "Out of resource" (-2) instead of "Success" (0)
>>>> --------------------------------------------------------------------------
>>>> *** An error occurred in MPI_Init
>>>> *** before MPI was initialized
>>>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>>> [eddie@oceanus:~/home2/mpi_tut]$
>>>>
>>>> Do I need to give certain permissions to the user in order to oversubscribe
>>>> processes?
>>>>
>>>> Thanks in advance,
>>>>
>>>> Eddie.
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
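For reference, options (b) and (c) from Ralph's earlier message are usually spelled along these lines; the path is just a placeholder matching the one used earlier in the thread, and the bare tmpdir_base form for mca-params.conf (without the OMPI_MCA_ prefix) is an assumption about that file's usual syntax rather than something confirmed in this thread:

# option (b): point the session directory at a user-writable location for a single run
$ mpirun --tmpdir /home2/mpi_tut/tmp -np 2 tut01

# option (c): set the MCA parameter instead, either in the environment...
$ export OMPI_MCA_tmpdir_base=/home2/mpi_tut/tmp
$ mpirun -np 2 tut01

# ...or (assumed syntax) as a line in ~/.openmpi/mca-params.conf:
tmpdir_base = /home2/mpi_tut/tmp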