A few points:

- Just to clarify: Open MPI and MPICH are entirely different code bases / entirely different MPI implementations. They both implement the same C and Fortran APIs that can be used by applications (i.e., they're *source code compatible*), but they are otherwise not compatible at all. Hence, you have to use entirely one MPI implementation or the other (e.g., use Open MPI or use MPICH -- don't use both at the same time).

  --> That being said, you can build xhpl for Open MPI and rename the executable xhpl.openmpi, then build xhpl again for MPICH and rename the executable xhpl.mpich, and then use the appropriate mpirun or mpiexec to launch the executable that you want to invoke (e.g., use Open MPI's mpirun to launch xhpl.openmpi and use MPICH's mpiexec to launch xhpl.mpich). A rough sketch of this is at the end of these points.

- In Open MPI, mpirun and mpiexec are symlinks to the same executable, meaning they're exactly equivalent. I don't know offhand if the same is true for MPICH -- I have a dim recollection that MPICH prefers "mpiexec" -- and I don't know if they still have "mpirun". Check their docs.

- ldd needs the path to an executable. If "mpirun" or "mpiexec" is not in your current directory, you likely need to give its full path (which is why "ldd mpirun" failed; the error message indicates that there is no "mpirun" in the . directory).

- The ldd of xhpl shows that it is linked against libmpich -- which is definitely an MPICH library, not an Open MPI library.

- Hence, if you're using Open MPI's mpirun and an MPICH-compiled xhpl, this is why things are failing. You need to use a single MPI implementation's wrapper compilers and mpirun/mpiexec -- you can't build with one MPI implementation and then launch with the other. Open MPI and MPICH are not compatible in that way.
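For example, the checks and the dual build could look roughly like the sketch below. This is only a sketch: the install prefixes (/opt/openmpi and /opt/mpich) and the HPL arch name "rpi" are placeholders that will differ on an actual setup, and HPL's Make.<arch> file has to end up pointing at the matching wrapper compiler (mpicc) for each build.

    # 1) See which launcher and which MPI library are actually being used
    which mpirun mpiexec              # shows whose launchers come first in PATH
    ldd ./xhpl | grep -i mpi          # ldd wants a path; libmpich.* means an MPICH build,
                                      # libmpi.so from an Open MPI prefix means an Open MPI build

    # 2) Build xhpl once per implementation (from the top of the hpl-2.1 tree)
    #    and keep both binaries around
    export PATH=/opt/openmpi/bin:$PATH          # Open MPI's wrappers first (if Make.rpi just says CC = mpicc)
    make arch=rpi clean_arch_all && make arch=rpi
    mv bin/rpi/xhpl bin/rpi/xhpl.openmpi

    export PATH=/opt/mpich/bin:$PATH            # now MPICH's wrappers first
    make arch=rpi clean_arch_all && make arch=rpi
    mv bin/rpi/xhpl bin/rpi/xhpl.mpich

    # 3) Always launch each binary with its own implementation's launcher
    /opt/openmpi/bin/mpirun -machinefile ~/machinefile -np 4 ./xhpl.openmpi
    /opt/mpich/bin/mpiexec  -machinefile ~/machinefile -n 4 ./xhpl.mpich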
> On May 27, 2015, at 12:47 PM, Heerdt, Lanze M. <heerdt...@gcc.edu> wrote:

I ran

    mpirun -machinefile ~/machinefile -np 4 -tag-output xhpl

and, just to be sure, I ran the same thing with mpiexec (because I think I have it set up to use MPICH and not Open MPI -- correct me if I am wrong, but the idea is the same?), and I tried the "ldd mpirun" but that didn't work at all.

In the second image I got some feedback from "ldd xhpl" and also have my HPL.dat shown, with P and Q equal to 2. Like I said, running with that HPL.dat and "mpiexec -machinefile ~/machinefile -n 4 xhpl" just gives me the same error.

Thank you for responding so quickly by the way :) you guys are a life saver.

-Lanze

<ldd and HPLdat.PNG> <-tag-output and ldd.PNG>

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Tuesday, May 26, 2015 10:08 PM
To: Open MPI Users
Subject: Re: [OMPI users] Running HPL on RPi cluster, seems like MPI is somehow not configured properly since it work with 1 node but not more

First you can run

    mpirun -machinefile ~/machinefile -np 4 -tag-output xhpl

If all tasks report they believe they are task 0, then this is the origin of the problem.

Then you can run

    ldd mpirun
    ldd xhpl

They should use the same MPI flavor. Then

    mpirun -machinefile ~/machinefile -np 4 -tag-output ldd xhpl

and make sure xhpl uses the very same MPI flavor on all the nodes.

The HPL make process can be error-prone, especially if you modify some config file / arch in the middle. A simple option is to rebuild xhpl from scratch and with Open MPI.

You can also post your HPL.dat and I will have a look.

Cheers,

Gilles

On 5/27/2015 10:38 AM, Heerdt, Lanze M. wrote:

I have run a hello world program for any number of processes. If I say "-n 16" I get 4 responses from each node saying "Hello world! I am process (0-15) of 16 on RPI-0(1-4)", so I know the cluster can work how I want it to. I also tested with just the normal hostname command and I see the names of each of the 4 Pis as a response.

As a response to the illegal entry in HPL.dat: that doesn't really make much sense, since I run it just fine with P = 1 and Q = 1; it only says that when I change P and Q to 2, which I know is not an illegal entry.
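[Note: the per-node library check and the rank/size test discussed in this thread can be run concretely as in the rough sketch below. It assumes the commands are launched from the directory that holds xhpl on every node, and that the mpi4py already installed on the Pis was built against the same MPI flavor as the launcher being tested -- otherwise the Python one-liner will itself show the "rank 0 of 1" symptom.]

    # Verify that every node resolves xhpl against the same MPI libraries
    mpirun -machinefile ~/machinefile -np 4 -tag-output ldd ./xhpl

    # Quick rank/size check without recompiling anything: the four tasks should
    # print ranks 0-3 of 4. If each one prints "rank 0 of 1", the launcher and
    # the MPI library underneath it do not match.
    mpirun -machinefile ~/machinefile -np 4 -tag-output \
        python -c "from mpi4py import MPI; c = MPI.COMM_WORLD; print('rank %d of %d' % (c.Get_rank(), c.Get_size()))"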
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Tuesday, May 26, 2015 8:14 PM
To: Open MPI Users
Subject: Re: [OMPI users] Running HPL on RPi cluster, seems like MPI is somehow not configured properly since it work with 1 node but not more

At first glance, it seems all MPI tasks believe they are rank zero and the comm world size is 1 (!)

Did you compile xhpl with Open MPI (and not a stub library for a serial-only version)?

Can you make sure there is nothing wrong with your LD_LIBRARY_PATH and that you do not mix MPI libraries (e.g. Open MPI mpirun but xhpl ends up using MPICH, or the other way around)?

As already suggested by Ralph, I would start by running a hello world program (just print rank and size to confirm it works).

Cheers,

Gilles

On 5/27/2015 8:42 AM, Ralph Castain wrote:

I don't know enough about HPL to resolve the problem. However, I would suggest that you first just try to run the example programs in the examples directory to ensure you have everything working. If they work, then the problem is clearly in the HPL arena.

I do note that your image reports that you have an illegal entry in HPL.dat -- if the examples work, you might start there.

On Tue, May 26, 2015 at 12:26 PM, Heerdt, Lanze M. <heerdt...@gcc.edu> wrote:

I realize this may be a bit off topic, but since what I am doing seems to be a pretty commonly done thing, I am hoping to find someone who has done it before / can help, since I've been at my wits' end for so long they are calling me Mr. Whittaker.

I am trying to run HPL on a Raspberry Pi cluster. I used the following guides to get to where I am now:
http://www.tinkernut.com/2014/04/make-cluster-computer/
http://www.tinkernut.com/2014/05/make-cluster-computer-part-2/
https://www.howtoforge.com/tutorial/hpl-high-performance-linpack-benchmark-raspberry-pi/#comments
and a bit of https://www.raspberrypi.org/forums/viewtopic.php?p=301458#p301458 when the above guide wasn't working.

Basically, when I run "mpiexec -machinefile ~/machinefile -n 1 xhpl" it works just fine, but when I run "mpiexec -machinefile ~/machinefile -n 4 xhpl" it errors with the attached image. (If I use "mpirun ..." I get the exact same behavior.)

[Note: I HAVE changed the HPL.dat to have "2 Ps" and "2 Qs", from 1 and 1, for when I try to run it with 4 processes -- the relevant lines are sketched at the end of this message.]

This is for a project of mine which I need done by the end of the week, so if you see this after 5/29, thank you, but don't bother responding.

I have hpl-2.1, mpi4py-1.3.1, mpich-3.1, and openmpi-1.8.5 at my disposal.

In the machinefile are the 4 IP addresses of my 4 RPi nodes:
10.15.106.107
10.15.101.29
10.15.106.108
10.15.101.30

Any other information you need I can easily get to you, so please do not hesitate to ask. I have nothing else to do but try and get this to work :P
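[Note: for reference, the process-grid section of HPL.dat for a 4-process run should look roughly like the excerpt below (a sketch based on the stock hpl-2.1 HPL.dat layout); P x Q must not be larger than -- and normally equals -- the process count given to mpiexec/mpirun with -n/-np:]

    1            # of process grids (P x Q)
    2            Ps
    2            Qs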
-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/