Tim has proposed a clever fix that I had not thought of. Just be aware that it could cause unexpected behavior at some point; still, for what you are trying to do, it might meet your needs.
Ralph

On 7/18/07 11:44 AM, "Tim Prins" <tpr...@open-mpi.org> wrote:

> Adam C Powell IV wrote:
>> As mentioned, I'm running in a chroot environment, so rsh and ssh won't
>> work: "rsh localhost" will rsh into the primary local host environment,
>> not the chroot, which will fail.
>>
>> [The purpose is to be able to build and test MPI programs in the Debian
>> unstable distribution, without upgrading the whole machine to unstable.
>> Though most machines I use for this purpose run Debian stable or
>> testing, the machine I'm currently using runs a very old Fedora, for
>> which I don't think OpenMPI is available.]
>
> Alright, I understand what you are trying to do now. To be honest, I
> don't think we have ever really thought about this use case. We always
> figured that to test Open MPI people would simply install it in a
> different directory and use it from there.
>
>>
>> With MPICH, mpirun -np 1 just runs the new process in the current
>> context, without rsh/ssh, so it works in a chroot. Does OpenMPI not
>> support this functionality?
>
> Open MPI does support this functionality. First, a bit of explanation:
>
> We use 'pls' (process launching system) components to handle the
> launching of processes. There are components for slurm, gridengine, rsh,
> and others. At runtime we open each of these components and query them
> as to whether they can be used. The original error you posted says that
> none of the 'pls' components can be used because all of them detected
> they could not run in your setup. The slurm one excluded itself because
> there were no environment variables set indicating it is running under
> SLURM. Similarly, the gridengine pls said it could not run. The
> 'rsh' pls said it cannot run because neither 'ssh' nor 'rsh' is
> available (I assume this is the case, though you did not explicitly say
> they were not available).
>
> But in this case, you do want the 'rsh' pls to be used. It will
> automatically fork any local processes, and will use rsh/ssh to launch
> any remote processes. Again, I don't think we ever imagined the use case
> of a UNIX-like system where no launchers like SLURM are available
> and rsh/ssh are not available either (Open MPI is, after all,
> primarily concerned with multi-node operation).
>
> So, there are several ways around this:
>
> 1. Make rsh or ssh available, even though they will not be used.
>
> 2. Tell the 'rsh' pls component to use a dummy program such as
> /bin/false by adding the following to the command line:
> -mca pls_rsh_agent /bin/false
>
> 3. Create a dummy 'rsh' executable that is available in your PATH.
>
> For instance:
>
> [tprins@odin ~]$ which ssh
> /usr/bin/which: no ssh in
> (/u/tprins/usr/ompia/bin:/u/tprins/usr/bin:/usr/local/bin:/bin:/usr/X11R6/bin)
> [tprins@odin ~]$ which rsh
> /usr/bin/which: no rsh in
> (/u/tprins/usr/ompia/bin:/u/tprins/usr/bin:/usr/local/bin:/bin:/usr/X11R6/bin)
> [tprins@odin ~]$ mpirun -np 1 hostname
> [odin.cs.indiana.edu:18913] [0,0,0] ORTE_ERROR_LOG: Error in file
> runtime/orte_init_stage1.c at line 317
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.
> This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_pls_base_select failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
>
> --------------------------------------------------------------------------
> [odin.cs.indiana.edu:18913] [0,0,0] ORTE_ERROR_LOG: Error in file
> runtime/orte_system_init.c at line 46
> [odin.cs.indiana.edu:18913] [0,0,0] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 52
> [odin.cs.indiana.edu:18913] [0,0,0] ORTE_ERROR_LOG: Error in file
> orterun.c at line 399
>
> [tprins@odin ~]$ mpirun -np 1 -mca pls_rsh_agent /bin/false hostname
> odin.cs.indiana.edu
>
> [tprins@odin ~]$ touch usr/bin/rsh
> [tprins@odin ~]$ chmod +x usr/bin/rsh
> [tprins@odin ~]$ mpirun -np 1 hostname
> odin.cs.indiana.edu
> [tprins@odin ~]$
>
>
> I hope this helps,
>
> Tim
>
>>
>> Thanks,
>> Adam
>>
>> On Wed, 2007-07-18 at 11:09 -0400, Tim Prins wrote:
>>> This is strange. I assume that you want to use rsh or ssh to launch the
>>> processes?
>>>
>>> If you want to use ssh, does "which ssh" find ssh? Similarly, if you
>>> want to use rsh, does "which rsh" find rsh?
>>>
>>> Thanks,
>>>
>>> Tim
>>>
>>> Adam C Powell IV wrote:
>>>> On Wed, 2007-07-18 at 09:50 -0400, Tim Prins wrote:
>>>>> Adam C Powell IV wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I'm running the Debian package of OpenMPI in a chroot (with /proc
>>>>>> mounted properly), and orte_init is failing as follows:
>>>>>> [snip]
>>>>>> What could be wrong? Does orterun not run in a chroot environment?
>>>>>> What more can I do to investigate further?
>>>>> Try running mpirun with the added options:
>>>>> -mca orte_debug 1 -mca pls_base_verbose 20
>>>>>
>>>>> Then send the output to the list.
>>>> Thanks! Here's the output:
>>>>
>>>> $ orterun -mca orte_debug 1 -mca pls_base_verbose 20 -np 1 uptime
>>>> [new-host-3:19201] mca: base: components_open: Looking for pls components
>>>> [new-host-3:19201] mca: base: components_open: distilling pls components
>>>> [new-host-3:19201] mca: base: components_open: accepting all pls components
>>>> [new-host-3:19201] mca: base: components_open: opening pls components
>>>> [new-host-3:19201] mca: base: components_open: found loaded component gridengine
>>>> [new-host-3:19201] mca: base: components_open: component gridengine open function successful
>>>> [new-host-3:19201] mca: base: components_open: found loaded component proxy
>>>> [new-host-3:19201] mca: base: components_open: component proxy open function successful
>>>> [new-host-3:19201] mca: base: components_open: found loaded component rsh
>>>> [new-host-3:19201] mca: base: components_open: component rsh open function successful
>>>> [new-host-3:19201] mca: base: components_open: found loaded component slurm
>>>> [new-host-3:19201] mca: base: components_open: component slurm open function successful
>>>> [new-host-3:19201] orte:base:select: querying component gridengine
>>>> [new-host-3:19201] pls:gridengine: NOT available for selection
>>>> [new-host-3:19201] orte:base:select: querying component proxy
>>>> [new-host-3:19201] orte:base:select: querying component rsh
>>>> [new-host-3:19201] orte:base:select: querying component slurm
>>>> [new-host-3:19201] [0,0,0] ORTE_ERROR_LOG: Error in file
>>>> runtime/orte_init_stage1.c at line 312
>>>> --------------------------------------------------------------------------
>>>> It looks like orte_init failed for some reason; your parallel process is
>>>> likely to abort.
>>>> There are many reasons that a parallel process can
>>>> fail during orte_init; some of which are due to configuration or
>>>> environment problems. This failure appears to be an internal failure;
>>>> here's some additional information (which may only be relevant to an
>>>> Open MPI developer):
>>>>
>>>> orte_pls_base_select failed
>>>> --> Returned value -1 instead of ORTE_SUCCESS
>>>>
>>>> --------------------------------------------------------------------------
>>>> [new-host-3:19201] [0,0,0] ORTE_ERROR_LOG: Error in file
>>>> runtime/orte_system_init.c at line 42
>>>> [new-host-3:19201] [0,0,0] ORTE_ERROR_LOG: Error in file
>>>> runtime/orte_init.c at line 52
>>>> --------------------------------------------------------------------------
>>>> Open RTE was unable to initialize properly. The error occured while
>>>> attempting to orte_init(). Returned value -1 instead of ORTE_SUCCESS.
>>>> --------------------------------------------------------------------------
>>>>
>>>> -Adam
>>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
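For anyone hitting the same problem in a chroot, Tim's option 2 can also be made persistent so the dummy agent does not have to be passed on every mpirun invocation. The commands below are only a sketch for the Open MPI generation discussed in this thread, where the parameter is named pls_rsh_agent (later releases renamed the launcher framework, so check ompi_info on your install); they rely on the standard MCA mechanisms of OMPI_MCA_* environment variables and the per-user mca-params.conf file:

    # Assumes an Open MPI of the era in this thread, where the rsh launcher's
    # agent parameter is pls_rsh_agent; verify the name on your install with
    # something like `ompi_info --param all all | grep rsh_agent`.

    # One-off, on the command line, as Tim showed:
    mpirun -np 1 -mca pls_rsh_agent /bin/false hostname

    # Per-shell, via the environment (any MCA parameter can be set as OMPI_MCA_<name>):
    export OMPI_MCA_pls_rsh_agent=/bin/false
    mpirun -np 1 hostname

    # Per-user inside the chroot, via the MCA parameter file:
    mkdir -p ~/.openmpi
    echo "pls_rsh_agent = /bin/false" >> ~/.openmpi/mca-params.conf
    mpirun -np 1 hostname

Any of these forms only satisfies the rsh pls component's check for an agent; as Tim explains above, local processes are still forked directly, so single-node runs inside the chroot behave the same as in his command-line example, while any attempt to launch on a remote node would fail at /bin/false.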