On 7/18/07 9:49 AM, "Adam C Powell IV" <hazel...@debian.org> wrote:

> As mentioned, I'm running in a chroot environment, so rsh and ssh won't
> work: "rsh localhost" will rsh into the primary local host environment,
> not the chroot, which will fail.
> 
> [The purpose is to be able to build and test MPI programs in the Debian
> unstable distribution, without upgrading the whole machine to unstable.
> Though most machines I use for this purpose run Debian stable or
> testing, the machine I'm currently using runs a very old Fedora, for
> which I don't think OpenMPI is available.]
> 
> With MPICH, mpirun -np 1 just runs the new process in the current
> context, without rsh/ssh, so it works in a chroot.  Does OpenMPI not
> support this functionality?

Yes - and no. OpenMPI will launch on a local node without using rsh/ssh.
However, and it is a big however, our init code requires that we still
identify a working launcher that could be used to launch on remote nodes.
Frankly, we never considered the case you describe.

We could (and perhaps should) modify the code to allow it to continue even
if it doesn't find a viable launcher. I believe our initial thinking was
that something that launched only on the local node wasn't much use to MPI
and therefore that scenario probably represents an error condition.
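
For illustration only, the selection step described above can be sketched
in Python. This is not Open MPI's actual C code. It assumes, going only by
Tim's "which ssh" question quoted below, that a launcher counts as usable
when its agent is found on PATH. The names select_launcher,
NoLauncherError and allow_local_only are hypothetical, made up for this
sketch.

```python
import shutil


class NoLauncherError(RuntimeError):
    """Raised when no remote launch agent can be found (hypothetical name)."""


def select_launcher(candidates=("ssh", "rsh"), which=shutil.which,
                    allow_local_only=False):
    """Return the first launch agent found on PATH.

    Assumption: "usable" simply means the agent binary is on PATH.
    With allow_local_only=False, a missing agent is treated as an error,
    even if the job would only run on the local node. With
    allow_local_only=True, None is returned instead, meaning "local
    launches only".
    """
    for agent in candidates:
        if which(agent) is not None:
            return agent
    if allow_local_only:
        return None
    raise NoLauncherError("no usable launcher among: " + ", ".join(candidates))
```

With allow_local_only left at False, the sketch fails in a chroot that has
neither ssh nor rsh, even for -np 1. Passing allow_local_only=True
corresponds to the possible change of continuing without a viable launcher.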

We'll discuss it and see what we think should be done. In the meantime, the
answer would have to be "no, we don't support that."

Ralph

> 
> Thanks,
> Adam
> 
> On Wed, 2007-07-18 at 11:09 -0400, Tim Prins wrote:
>> This is strange. I assume that you want to use rsh or ssh to launch the
>> processes?
>> 
>> If you want to use ssh, does "which ssh" find ssh? Similarly, if you
>> want to use rsh, does "which rsh" find rsh?
>> 
>> Thanks,
>> 
>> Tim
>> 
>> Adam C Powell IV wrote:
>>> On Wed, 2007-07-18 at 09:50 -0400, Tim Prins wrote:
>>>> Adam C Powell IV wrote:
>>>>> Greetings,
>>>>> 
>>>>> I'm running the Debian package of OpenMPI in a chroot (with /proc
>>>>> mounted properly), and orte_init is failing as follows:
>>>>> [snip]
>>>>> What could be wrong?  Does orterun not run in a chroot environment?
>>>>> What more can I do to investigate further?
>>>> Try running mpirun with the added options:
>>>> -mca orte_debug 1 -mca pls_base_verbose 20
>>>> 
>>>> Then send the output to the list.
>>> 
>>> Thanks!  Here's the output:
>>> 
>>> $ orterun -mca orte_debug 1 -mca pls_base_verbose 20 -np 1 uptime
>>> [new-host-3:19201] mca: base: components_open: Looking for pls components
>>> [new-host-3:19201] mca: base: components_open: distilling pls components
>>> [new-host-3:19201] mca: base: components_open: accepting all pls components
>>> [new-host-3:19201] mca: base: components_open: opening pls components
>>> [new-host-3:19201] mca: base: components_open: found loaded component
>>> gridengine
>>> [new-host-3:19201] mca: base: components_open: component gridengine open
>>> function successful
>>> [new-host-3:19201] mca: base: components_open: found loaded component proxy
>>> [new-host-3:19201] mca: base: components_open: component proxy open function
>>> successful
>>> [new-host-3:19201] mca: base: components_open: found loaded component rsh
>>> [new-host-3:19201] mca: base: components_open: component rsh open function
>>> successful
>>> [new-host-3:19201] mca: base: components_open: found loaded component slurm
>>> [new-host-3:19201] mca: base: components_open: component slurm open function
>>> successful
>>> [new-host-3:19201] orte:base:select: querying component gridengine
>>> [new-host-3:19201] pls:gridengine: NOT available for selection
>>> [new-host-3:19201] orte:base:select: querying component proxy
>>> [new-host-3:19201] orte:base:select: querying component rsh
>>> [new-host-3:19201] orte:base:select: querying component slurm
>>> [new-host-3:19201] [0,0,0] ORTE_ERROR_LOG: Error in file
>>> runtime/orte_init_stage1.c at line 312
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems.  This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>> 
>>>   orte_pls_base_select failed
>>>   --> Returned value -1 instead of ORTE_SUCCESS
>>> 
>>> --------------------------------------------------------------------------
>>> [new-host-3:19201] [0,0,0] ORTE_ERROR_LOG: Error in file
>>> runtime/orte_system_init.c at line 42
>>> [new-host-3:19201] [0,0,0] ORTE_ERROR_LOG: Error in file runtime/orte_init.c
>>> at line 52
>>> --------------------------------------------------------------------------
>>> Open RTE was unable to initialize properly.  The error occured while
>>> attempting to orte_init().  Returned value -1 instead of ORTE_SUCCESS.
>>> --------------------------------------------------------------------------
>>> 
>>> -Adam

