A few points:

- Just to clarify: Open MPI and MPICH are entirely different code bases / 
entirely different MPI implementations.  They both implement the same C and 
Fortran APIs that can be used by applications (i.e., they're *source code 
compatible*), but they are otherwise not compatible at all.  Hence, you have to 
use exactly one MPI implementation or the other (e.g., use Open MPI or use 
MPICH -- don't use both at the same time).

--> That being said, you can build xhpl for Open MPI and rename the executable 
xhpl.openmpi, then build xhpl again for MPICH and rename that executable 
xhpl.mpich.  You can then use the appropriate mpirun or mpiexec to launch 
whichever executable you want (e.g., use Open MPI's mpirun to launch 
xhpl.openmpi and MPICH's mpiexec to launch xhpl.mpich).
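
For example, a rough sketch (assuming your HPL arch file is Make.rpi and it 
compiles with mpicc, so the MPI implementation is selected by whichever 
wrappers are first in your PATH; the install prefixes below are hypothetical):

export PATH=/opt/openmpi/bin:$PATH              # hypothetical Open MPI prefix
make arch=rpi clean_arch_all && make arch=rpi   # clean, then rebuild
mv bin/rpi/xhpl bin/rpi/xhpl.openmpi

export PATH=/opt/mpich/bin:$PATH                # hypothetical MPICH prefix
make arch=rpi clean_arch_all && make arch=rpi
mv bin/rpi/xhpl bin/rpi/xhpl.mpich

/opt/openmpi/bin/mpirun -np 4 ./xhpl.openmpi    # launch each with its own runtime
/opt/mpich/bin/mpiexec -n 4 ./xhpl.mpich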

- In Open MPI, mpirun and mpiexec are symlinks to the same executable.  
Meaning: they're exactly equivalent.  I don't know offhand if the same is true 
for MPICH -- I have a dim recollection that MPICH prefers "mpiexec" -- and I 
don't know if they still have "mpirun".  Check their docs.
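
A quick way to check on your own system (paths will vary):

ls -l `which mpirun` `which mpiexec`
# Under Open MPI 1.8, both are symlinks to the same launcher (orterun), so
# they behave identically; whether MPICH installs an mpirun alias depends on
# the version.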

- ldd takes the path of an executable; it does not search your PATH.  If 
"mpirun" or "mpiexec" is not in your current directory, you likely need to 
give its full path (which is why "ldd mpirun" failed; the error message 
indicates that there is no "mpirun" in the current directory).
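
Concretely:

ldd mpirun            # fails unless a file named mpirun is in the current dir
ldd $(which mpirun)   # let your PATH find it first, then hand ldd the result
ldd ./xhpl            # a relative path works too, as long as the file exists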

- The ldd of xhpl shows that it is linked against libmpich -- which is 
definitely an MPICH library, not an Open MPI library.

- Hence, if you're using Open MPI's mpirun and an MPICH-compiled XHPL, this is 
why things are failing.  You need to use a single MPI implementation's wrapper 
compilers and mpirun/mpiexec -- you can't build with one MPI implementation and 
then launch with the other.  Open MPI and MPICH are not compatible in that way.
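
A minimal sketch of a consistent Open MPI run (the install prefix is 
hypothetical; giving the launcher's full path removes any doubt about which 
implementation's mpirun you are getting):

ldd ./xhpl | grep -i mpi    # expect libmpi.so (Open MPI), not libmpich.so
/opt/openmpi/bin/mpirun -machinefile ~/machinefile -np 4 ./xhpl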



> On May 27, 2015, at 12:47 PM, Heerdt, Lanze M. <heerdt...@gcc.edu> wrote:
> 
I ran mpirun -machinefile ~/machinefile -np 4 -tag-output xhpl and, just to be 
sure, I ran the same thing with mpiexec (because I think I have it set up to 
use MPICH and not Open MPI -- correct me if I am wrong, but the idea is the 
same?), and I tried ldd mpirun but that didn't work at all.
[attachments: <ldd and HPLdat.PNG>, <-tag-output and ldd.PNG>]

 

In the second image you can see the output of ldd xhpl, and also my HPL.dat 
with P and Q equal to 2. Like I said, running with that HPL.dat and

mpiexec -machinefile ~/machinefile -n 4 xhpl

it just gives me the same error

 

Thank you for responding so quickly by the way :) you guys are life savers.

 

-Lanze

 

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Tuesday, May 26, 2015 10:08 PM
To: Open MPI Users
Subject: Re: [OMPI users] Running HPL on RPi cluster, seems like MPI is somehow 
not configured properly since it work with 1 node but not more

 

First, you can run
mpirun -machinefile ~/machinefile -np 4 -tag-output xhpl

If all tasks report that they believe they are task 0, then that is the origin 
of the problem.

Then you can run
ldd mpirun
ldd xhpl
They should use the same MPI flavor.

Then
mpirun -machinefile ~/machinefile -np 4 -tag-output ldd xhpl

and make sure xhpl uses the very same MPI flavor on all the nodes.


The HPL make process can be error-prone, especially if you modify a config 
file or arch in the middle.
A simple option is to rebuild xhpl from scratch with Open MPI.
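
Something along these lines (the arch name and paths are examples; first 
point Make.rpi's MPdir/MPlib -- or simply CC = mpicc -- at your Open MPI 
install):

cd ~/hpl-2.1
make arch=rpi clean_arch_all   # wipe objects built against the old MPI
make arch=rpi                  # rebuild from scratch
ldd bin/rpi/xhpl | grep mpi    # confirm it now links Open MPI's libmpi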

You can also post your HPL.dat and I will have a look.

Cheers,

Gilles

On 5/27/2015 10:38 AM, Heerdt, Lanze M. wrote:

I have run a hello world program for any number of processes. If I say "-n 16" 
I get 4 responses from each node saying "Hello world! I am process (0-15) of 16 
on RPI-0(1-4)", so I know the cluster can work how I want it to. I also tested 
with plain hostname and I see the names of each of the 4 Pis as a response.

 

Regarding the illegal entry in HPL.dat: that doesn't really make much sense, 
since it runs just fine with P = 1 and Q = 1. It only says that when I change 
P and Q to 2, which I know is not an illegal entry.

 

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Tuesday, May 26, 2015 8:14 PM
To: Open MPI Users
Subject: Re: [OMPI users] Running HPL on RPi cluster, seems like MPI is somehow 
not configured properly since it work with 1 node but not more

 

At first glance, it seems all MPI tasks believe they are rank zero and the 
comm world size is 1 (!)

Did you compile xhpl with Open MPI (and not a stub library for the serial 
version only)?
Can you make sure there is nothing wrong with your LD_LIBRARY_PATH and that 
you do not mix MPI libraries
(e.g., Open MPI's mpirun but xhpl ends up using MPICH, or the other way around)?
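
A quick sanity check (sketch):

which mpirun mpicc      # both should resolve under the same MPI install
echo $LD_LIBRARY_PATH   # should not mix Open MPI and MPICH lib directories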

As already suggested by Ralph, I would start by running a hello world program 
(just print rank and size to confirm it works).
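
Open MPI ships exactly such a program in its examples directory; a sketch 
(the tarball path is illustrative):

cd ~/openmpi-1.8.5/examples
mpicc hello_c.c -o hello_c
mpirun -machinefile ~/machinefile -np 4 ./hello_c
# expect ranks 0-3 of size 4; the binary must exist at the same path on every
# node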

Cheers,

Gilles



On 5/27/2015 8:42 AM, Ralph Castain wrote:

I don't know enough about HPL to resolve the problem. However, I would suggest 
that you first just try to run the example programs in the examples directory 
to ensure you have everything working. If they work, then the problem is 
clearly in the HPL arena.

 

I do note that your image reports that you have an illegal entry in HPL.dat - 
if the examples work, you might start there.

 

 

On Tue, May 26, 2015 at 12:26 PM, Heerdt, Lanze M. <heerdt...@gcc.edu> wrote:

I realize this may be a bit off topic, but since what I am doing seems to be 
a pretty commonly done thing, I am hoping to find someone who has done it 
before and can help, since I've been at my wits' end for so long that they are 
calling me Mr. Whittaker.

 

I am trying to run HPL on a Raspberry Pi cluster. I used the following guides 
to get to where I am now:

http://www.tinkernut.com/2014/04/make-cluster-computer/

http://www.tinkernut.com/2014/05/make-cluster-computer-part-2/

https://www.howtoforge.com/tutorial/hpl-high-performance-linpack-benchmark-raspberry-pi/#comments

and a bit of: https://www.raspberrypi.org/forums/viewtopic.php?p=301458#p301458 
when the above guide wasn’t working

 

Basically, when I run "mpiexec -machinefile ~/machinefile -n 1 xhpl" it works 
just fine, but when I run "mpiexec -machinefile ~/machinefile -n 4 xhpl" it 
errors with the attached image. (If I use "mpirun…" I get the exact same 
behavior.)

[Note: I HAVE changed the HPL.dat to have “2    Ps” and “2    Qs” from 1 and 1 
for when I try to run it with 4 processes]
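
The relevant lines in HPL.dat now look something like this (note that P x Q 
must equal the process count given to -n, so 2 x 2 for -n 4):

1            # of process grids (P x Q)
2            Ps
2            Qs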

 

This is for a project of mine which I need done by the end of the week, so if 
you see this after 5/29: thank you, but don't bother responding.

 

I have hpl-2.1, mpi4py-1.3.1, mpich-3.1, and openmpi-1.8.5 at my disposal

In the machinefile are the 4 IP addresses of my 4 RPi nodes:

10.15.106.107

10.15.101.29

10.15.106.108

10.15.101.30

 

Any other information you need I can easily get to you so please do not 
hesitate to ask. I have nothing else to do but try and get this to work :P


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
