What that parameter does is turn "off" all of the transports except tcp, so
the problem you're seeing goes away because we no longer try to create the
shared memory file. This will somewhat hurt your performance, but it will
work.

Alternatively, you could use "--mca btl ^sm", which would allow you to use
whatever high-speed interconnects are on your system while still turning
"off" the shared memory file.
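
For example, with the same tut01 binary from your earlier test, either of
these should avoid the shared memory file (which other BTLs remain usable
depends on how your Open MPI was built):

$ mpirun --mca btl ^sm -np 2 tut01        # keep everything except shared memory
$ mpirun --mca btl tcp,self -np 2 tut01   # TCP only ("self" lets a process
                                          # send messages to itself)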

I'm not sure why your tmp directory is getting its permissions wrong. It
sounds like there is something in your environment that is doing something
unexpected. You might write a small script that creates a file and lists its
permissions, and execute it as that user - it would be interesting to see
whether the user or access permissions are different from what you would
normally expect.
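
Something along these lines would do it (a rough sketch - the file name and
location are only placeholders):

#!/bin/sh
# create a scratch file under /tmp, then show its owner and permissions
f=/tmp/perm-test.$$
touch "$f"
ls -ld /tmp
ls -l "$f"
# the effective user/group and umask could also explain odd permissions
id
umask
rm -f "$f"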

Ralph


On 1/18/07 8:30 PM, "eddie168" <eddie168+ompi_u...@gmail.com> wrote:

> Just to answer my own question: after I explicitly specify the "--mca btl tcp"
> parameter, the program works. So I will need to issue a command like this:
>  
> $ mpirun --mca btl tcp -np 2 tut01
> oceanus:Hello world from 0
> oceanus:Hello world from 1
>  
> Regards,
>  
> Eddie.
> 
>  
> On 1/18/07, eddie168 <eddie168+ompi_u...@gmail.com> wrote:
>> Hi Ralph and Brian,
>>  
>> Thanks for the advice. I have checked the permissions on /tmp:
>>  
>> drwxrwxrwt   19 root  root  4096 Jan 18 11:38 tmp
>>  
>> so I don't think there should be any problem creating files there, yet option
>> (a) still does not work for me.
>>  
>> I tried option (b), setting --tmpdir on the command line and running as a
>> normal user. It works for -np 1, but it gives the same error for -np 2.
>>  
>> I also tested option (c) by setting "OMPI_MCA_tmpdir_base = /home2/mpi_tut/tmp"
>> in "~/.openmpi/mca-params.conf", but the error still occurred.
>>  
>> I have included the debug output of what I ran (with the IP masked). I noticed
>> that the alternate tmp directory is set at the beginning of the process, but
>> it changes back to "/tmp" after orted is executed. Could the error I got be
>> related to my SSH setup?
>>  
>> Many thanks,
>>  
>> Eddie.
>>  
>> 
>> 
>> [eddie@oceanus:~/home2/mpi_tut]$ mpirun -d --tmpdir /home2/mpi_tut/tmp -np 2
>> tut01
>> [oceanus:129119] [0,0,0] setting up session dir with
>> [oceanus:129119]        tmpdir /home2/mpi_tut/tmp
>> [oceanus:129119]        universe default-universe
>> [oceanus:129119]        user eddie
>> [oceanus:129119]        host oceanus
>> [oceanus:129119]        jobid 0
>> [oceanus:129119]        procid 0
>> [oceanus:129119] procdir:
>> /home2/mpi_tut/tmp/openmpi-sessions-eddie@oceanus_0/default-universe/0/0
>> [oceanus:129119] jobdir:
>> /home2/mpi_tut/tmp/openmpi-sessions-eddie@oceanus_0/default-universe/0
>> [oceanus:129119] unidir:
>> /home2/mpi_tut/tmp/openmpi-sessions-eddie@oceanus_0/default-universe
>> [oceanus:129119] top: openmpi-sessions-eddie@oceanus_0
>> [oceanus:129119] tmp: /home2/mpi_tut/tmp
>> [oceanus:129119] [0,0,0] contact_file
>> /home2/mpi_tut/tmp/openmpi-sessions-eddie@oceanus_0/default-universe/universe
>> -setup.txt 
>> [oceanus:129119] [0,0,0] wrote setup file
>> [oceanus:129119] pls:rsh: local csh: 0, local bash: 1
>> [oceanus:129119] pls:rsh: assuming same remote shell as local shell
>> [oceanus:129119] pls:rsh: remote csh: 0, remote bash: 1
>> [oceanus:129119] pls:rsh: final template argv:
>> [oceanus:129119] pls:rsh:     /usr/bin/ssh <template> orted --debug
>> --bootproxy 1 --name <template> --num_procs 2 --vpid_start 0 --nodename
>> <template> --universe eddie@oceanus:default-universe --nsreplica
>> "0.0.0;tcp://xxx.xxx.xxx.xxx:52428" --gprreplica "
>> 0.0.0;tcp://xxx.xxx.xxx.xxx:52428" --mpi-call-yield 0
>> [oceanus:129119] pls:rsh: launching on node localhost
>> [oceanus:129119] pls:rsh: oversubscribed -- setting mpi_yield_when_idle to 1
>> (1 2)
>> [oceanus:129119] pls:rsh: localhost is a LOCAL node
>> [oceanus:129119] pls:rsh: changing to directory /home/eddie
>> [oceanus:129119] pls:rsh: executing: orted --debug --bootproxy 1 --name 0.0.1
>> --num_procs 2 --vpid_start 0 --nodename localhost --universe
>> eddie@oceanus:default-universe
>> --nsreplica "0.0.0;tcp://xxx.xxx.xxx.xxx:52428" --gprreplica
>> "0.0.0;tcp://xxx.xxx.xxx.xxx:52428" --mpi-call-yield 1
>> [oceanus:129120] [0,0,1] setting up session dir with
>> [oceanus:129120]        universe default-universe
>> [oceanus:129120]        user eddie
>> [oceanus:129120]        host localhost
>> [oceanus:129120]        jobid 0
>> [oceanus:129120]        procid 1
>> [oceanus:129120] procdir:
>> /tmp/openmpi-sessions-eddie@localhost_0/default-universe/0/1
>> [oceanus:129120] jobdir:
>> /tmp/openmpi-sessions-eddie@localhost_0/default-universe/0
>> [oceanus:129120] unidir:
>> /tmp/openmpi-sessions-eddie@localhost_0/default-universe
>> [oceanus:129120] top: openmpi-sessions-eddie@localhost_0
>> [oceanus:129120] tmp: /tmp
>> [oceanus:129121] [0,1,0] setting up session dir with
>> [oceanus:129121]        universe default-universe
>> [oceanus:129121]        user eddie
>> [oceanus:129121]        host localhost
>> [oceanus:129121]        jobid 1
>> [oceanus:129121]        procid 0
>> [oceanus:129121] procdir:
>> /tmp/openmpi-sessions-eddie@localhost_0/default-universe/1/0
>> [oceanus:129121] jobdir:
>> /tmp/openmpi-sessions-eddie@localhost_0/default-universe/1
>> [oceanus:129121] unidir:
>> /tmp/openmpi-sessions-eddie@localhost_0/default-universe
>> [oceanus:129121] top: openmpi-sessions-eddie@localhost_0
>> [oceanus:129121] tmp: /tmp
>> [oceanus:129122] [0,1,1] setting up session dir with
>> [oceanus:129122]        universe default-universe
>> [oceanus:129122]        user eddie
>> [oceanus:129122]        host localhost
>> [oceanus:129122]        jobid 1
>> [oceanus:129122]        procid 1
>> [oceanus:129122] procdir:
>> /tmp/openmpi-sessions-eddie@localhost_0/default-universe/1/1
>> [oceanus:129122] jobdir:
>> /tmp/openmpi-sessions-eddie@localhost_0/default-universe/1
>> [oceanus:129122] unidir:
>> /tmp/openmpi-sessions-eddie@localhost_0/default-universe
>> [oceanus:129122] top: openmpi-sessions-eddie@localhost_0
>> [oceanus:129122] tmp: /tmp
>> [oceanus:129119] spawn: in job_state_callback(jobid = 1, state = 0x4)
>> [oceanus:129119] Info: Setting up debugger process table for applications
>>   MPIR_being_debugged = 0
>>   MPIR_debug_gate = 0
>>   MPIR_debug_state = 1
>>   MPIR_acquired_pre_main = 0
>>   MPIR_i_am_starter = 0
>>   MPIR_proctable_size = 2
>>   MPIR_proctable:
>>     (i, host, exe, pid) = (0, localhost, tut01, 129121)
>>     (i, host, exe, pid) = (1, localhost, tut01, 129122)
>> [oceanus:129121] mca_common_sm_mmap_init: ftruncate failed with errno=13
>> [oceanus:129121] mca_mpool_sm_init: unable to create shared memory mapping
>> (/tmp/openmpi-sessions-eddie@localhost_0/default-universe/1/shared_mem_pool.l
>> ocalhost )
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems.  This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>> 
>>   PML add procs failed
>>   --> Returned "Out of resource" (-2) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>> [oceanus:129120] sess_dir_finalize: found proc session dir empty - deleting
>> [oceanus:129120] sess_dir_finalize: job session dir not empty - leaving
>> [oceanus:129120] sess_dir_finalize: found proc session dir empty - deleting
>> [oceanus:129120] sess_dir_finalize: found job session dir empty - deleting
>> [oceanus:129120] sess_dir_finalize: univ session dir not empty - leaving
>> [oceanus:129120] orted: job_state_callback(jobid = 1, state =
>> ORTE_PROC_STATE_TERMINATED)
>> [oceanus:129120] sess_dir_finalize: job session dir not empty - leaving
>> [oceanus:129120] sess_dir_finalize: found proc session dir empty - deleting
>> [oceanus:129120] sess_dir_finalize: found job session dir empty - deleting
>> [oceanus:129120] sess_dir_finalize: found univ session dir empty - deleting
>> [oceanus:129120] sess_dir_finalize: found top session dir empty - deleting
>> [eddie@oceanus:~/home2/mpi_tut]$
>> 
>>  
>> On 1/18/07, Ralph H Castain <r...@lanl.gov> wrote:
>>> Hi Eddie
>>> 
>>> Open MPI needs to create a temporary file system - what we call our "session
>>> directory" - where it stores things like the shared memory file. From this
>>> output, it appears that your /tmp directory is "locked" to root access only.
>>> 
>>> You have three options for resolving this problem:
>>> 
>>> (a) you could make /tmp accessible to general users;
>>> 
>>> (b) you could use the --tmpdir xxx command line option to point Open MPI at
>>> another directory that is accessible to the user (for example, you could use
>>> a "tmp" directory under the user's home directory); or
>>> 
>>> (c) you could set an MCA parameter OMPI_MCA_tmpdir_base to identify a
>>> directory we can use instead of /tmp.
>>> 
>>>  If you select options (b) or (c), the only requirement is that this
>>> location must be accessible on every node being used. Let me be clear on
>>> this: the tmp directory must not be NFS mounted and therefore shared across
>>> all nodes. However, each node must be able to access a location of the given
>>> name - that location should be strictly local to each node.
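>>> 
>>> For example, something like the following (the path shown is only a
>>> placeholder - use any directory that already exists and is writable by the
>>> user on every node):
>>> 
>>> $ mpirun --tmpdir /scratch/eddie/ompi-tmp -np 2 tut01
>>> 
>>> or, for option (c), set the parameter in the environment before launching:
>>> 
>>> $ export OMPI_MCA_tmpdir_base=/scratch/eddie/ompi-tmp
>>> $ mpirun -np 2 tut01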
>>> 
>>> Hope that helps
>>> Ralph 
>>> 
>>> 
>>> 
>>> 
>>> On 1/17/07 12:25 AM, "eddie168" <eddie168+ompi_u...@gmail.com> wrote:
>>> 
>>>> Dear all,
>>>>  
>>>> I have recently installed Open MPI 1.1.2 on an OpenSSI cluster running
>>>> Fedora Core 3. I tested a simple hello-world MPI program (attached) and it
>>>> runs OK as root. However, if I run the same program as a normal user, it
>>>> gives the following error:
>>>>  
>>>> [eddie@oceanus:~/home2/mpi_tut]$ mpirun -np 2 tut01
>>>> [oceanus:125089] mca_common_sm_mmap_init: ftruncate failed with errno=13
>>>> [oceanus:125089] mca_mpool_sm_init: unable to create shared memory mapping
>>>> ( /tmp/openmpi-
>>>> sessions-eddie@localhost_0/default-universe/1/shared_mem_pool.localhost)
>>>> --------------------------------------------------------------------------
>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>> likely to abort.  There are many reasons that a parallel process can
>>>> fail during MPI_INIT; some of which are due to configuration or environment
>>>> problems.  This failure appears to be an internal failure; here's some
>>>> additional information (which may only be relevant to an Open MPI
>>>> developer):
>>>>   PML add procs failed
>>>>   --> Returned "Out of resource" (-2) instead of "Success" (0)
>>>> --------------------------------------------------------------------------
>>>> *** An error occurred in MPI_Init
>>>> *** before MPI was initialized
>>>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>>> [eddie@oceanus:~/home2/mpi_tut]$
>>>> 
>>>> Do I need to give the user certain permissions in order to oversubscribe
>>>> processes?
>>>> 
>>>> Thanks in advance,
>>>> 
>>>> Eddie.
>>>> 
>>>>  
>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
> 
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

