Hello everyone,

I am having a hard time getting OpenMPI (1.1.2) to run in a heterogeneous environment. In short, here is my command line:

orterun --prefix ~/openmpi_x86_64/ -hostfile head -np 2 mandelbrot-mpi_x86_64 10000 400 400 0 : --prefix ~/openmpi_i686/ -hostfile nodes -np `wc -l<nodes ` mandelbrot-mpi_i686 10000 400 400 0

On execution, I get the following error:

bash: /export/home/eric/openmpi_x86_64/bin/orted: cannot execute binary file
bash: /export/home/eric/openmpi_x86_64/bin/orted: cannot execute binary file
[headless:06930] ERROR: A daemon on node thinkbig2 failed to start as expected.
[headless:06930] ERROR: There may be more information available from
[headless:06930] ERROR: the remote shell (see above).
[headless:06930] ERROR: The daemon exited unexpectedly with status 126.
[headless:06930] ERROR: A daemon on node thinkbig12 failed to start as expected.
[headless:06930] ERROR: There may be more information available from
[headless:06930] ERROR: the remote shell (see above).
[headless:06930] ERROR: The daemon exited unexpectedly with status 126.

After this, I have to cancel the execution with CTRL-C.

I am still investigating this problem, and here is what I have found so far. It would seem that orterun mixes up the executables across the two halves of the command. For example, the following command line should essentially return the contents of the host files "head" _and_ "nodes".

First, the contents of the head and nodes files:

eric@headless ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ cat head
headless slots=2

eric@headless ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ cat nodes
thinkbig12
thinkbig2
thinkbig3
thinkbig5
thinkbig6
thinkbig9
thinkbig10
thinkbig11
thinkbig4
thinkbig7

Second, the execution of the command:

orterun --prefix ~/openmpi_x86_64/ -hostfile head -np 2 hostname : --prefix ~/openmpi_i686/ -hostfile nodes -np `wc -l<nodes ` hostname

bash: /export/home/eric/openmpi_x86_64/bin/orted: cannot execute binary file
bash: /export/home/eric/openmpi_x86_64/bin/orted: cannot execute binary file
[headless:07196] ERROR: A daemon on node thinkbig2 failed to start as expected.
[headless:07196] ERROR: There may be more information available from
[headless:07196] ERROR: the remote shell (see above).
[headless:07196] ERROR: The daemon exited unexpectedly with status 126.
[headless:07196] ERROR: A daemon on node thinkbig12 failed to start as expected.
[headless:07196] ERROR: There may be more information available from
[headless:07196] ERROR: the remote shell (see above).
[headless:07196] ERROR: The daemon exited unexpectedly with status 126.
thinkbig10
thinkbig11
thinkbig4
thinkbig7
thinkbig6
thinkbig9
thinkbig5
thinkbig3

Now, if I remove the --prefix from the first part, I get the following:

eric@headless ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ orterun -hostfile head -np 2 hostname : --prefix ~/openmpi_i686/ -hostfile nodes -np `wc -l<nodes ` hostname
thinkbig9
thinkbig2
thinkbig2
thinkbig12
thinkbig12
thinkbig4
thinkbig7
thinkbig10
thinkbig11
thinkbig5
thinkbig3
thinkbig6

Immediately, we notice that "hostname" is never run on the "headless" node but is run twice on thinkbig2 and thinkbig12. This tells me that the first -hostfile is being ignored entirely and that we fall back to the round-robin scheme.

What am I doing wrong? I would like to read up on this, but the man page and web pages are very superficial on the subject of heterogeneous environments, and I found no documentation on writing an appfile as would be used with --app.

Thanks,

--
Eric Thibodeau