Rahul,
Per the logs, it seems the /sys pseudo-filesystem is not mounted in your
chroot.
First, can you make sure it is mounted and try again?
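For reference, a typical way to provide /sys inside a chroot (the chroot path
below is just an example):

  # bind-mount the host's /sys into the chroot
  mount --bind /sys /path/to/chroot/sys
  # or mount a fresh sysfs instance instead
  mount -t sysfs sysfs /path/to/chroot/sys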
Cheers,
Gilles
On 5/26/2015 12:51 PM, Rahul Yadav wrote:
> We were able to solve the ssh problem.
> But now MPI is not able to use the yalla component. We are running the following
Well, it isn’t finding any MXM cards on NAE27 - do you have any there?
You can’t use yalla without MXM cards on all nodes
> On May 25, 2015, at 8:51 PM, Rahul Yadav wrote:
>
> We were able to solve the ssh problem.
>
> But now MPI is not able to use the yalla component. We are running the following
Yes Ralph, the MXM cards are on the node. The command runs fine if I run it
outside the chroot environment.
Thanks
Rahul
On Mon, May 25, 2015 at 9:03 PM, Ralph Castain wrote:
> Well, it isn’t finding any MXM cards on NAE27 - do you have any there?
>
> You can’t use yalla without MXM cards on all nodes
>
BTW, what is the rationale for running in a chroot env? Is it a Docker-like env?
Does "ibv_devinfo -v" work for you from the chroot env?
On Tue, May 26, 2015 at 7:08 AM, Rahul Yadav wrote:
> Yes Ralph, the MXM cards are on the node. The command runs fine if I run it
> outside the chroot environment.
>
> Thanks
> R
1. mxm_perf_test - OK.
2. no_tree_spawn - OK.
3. ompi yalla and "--mca pml cm --mca mtl mxm" still do not work (I use the
prebuilt ompi-1.8.5 from hpcx-v1.3.330)
3.a) host:$ $HPCX_MPI_DIR/bin/mpirun -x MXM_IB_PORTS=mlx4_0:1 -x
MXM_SHM_KCOPY_MODE=off -host node5,node153 --mca pml cm --mca mtl mxm
It does not work for a single node either:
1) host: $ $HPCX_MPI_DIR/bin/mpirun -x MXM_IB_PORTS=mlx4_0:1 -x
MXM_SHM_KCOPY_MODE=off -host node5 -mca pml yalla -x MXM_TLS=ud,self,shm
--prefix $HPCX_MPI_DIR -mca plm_base_verbose 5 -mca oob_base_verbose 10 -mca
rml_base_verbose 10 --debug-daemons -np 1 .
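Separately, ompi_info can confirm whether the yalla PML is present in this
build at all (the grep is illustrative):

  $HPCX_MPI_DIR/bin/ompi_info | grep -i yalla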
You can also change the location of the tmp files with the following MCA option:
-mca orte_tmpdir_base /some/place
You can see the parameter with:
ompi_info --param all all -l 9 | grep tmp
MCA orte: parameter "orte_tmpdir_base" (current value: "", data
source: default, level: 9 dev/all, type: string)
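For example (the path and program name are illustrative):

  mpirun -np 2 -mca orte_tmpdir_base /scratch/tmp ./a.out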
I think we bumped up a default value in Open MPI 1.8.5. To go back to the old
64Mbyte value try running with:
--mca mpool_sm_min_size 67108864
Rolf
Hello Mike,
This particular instance of mxm was installed using RPMs that were
re-rolled by our admins. I'm not 100% sure where they got them (HPCx or
somewhere else). I myself am not using HPCx. Is there any particular
reason why mxm shouldn't be in system space? If there is, I'll share it
w
Hello David,
Thanks for the info and the patch - we will fix the ompi configure logic with
your patch.
mxm can be installed in either system or user space - both are valid and
supported.
M
On Tue, May 26, 2015 at 5:50 PM, David Shrader wrote:
> Hello Mike,
>
> This particular instance of mxm was instal
David,
Could you please send me your config.log file?
Looking into the config/ompi_check_mxm.m4 macro, I don't understand how it
could happen.
Thanks a lot.
On Tue, May 26, 2015 at 6:41 PM, Mike Dubman
wrote:
> Hello David,
> Thanks for info and patch - will fix ompi configure logic with your patch
Hello Mike,
I'm glad that I could be of help.
Just as an FYI, right now our admins are still hosting the fca libraries
in /opt, but they would like to have them in system space just as they
have done with mxm. I haven't worked my way through all of the
fca-related logic in configure yet, so I d
This line:
https://github.com/open-mpi/ompi/blob/master/config/ompi_check_mxm.m4#L41
doesn't check to see if $ompi_check_mxm_libdir is empty.
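A minimal sketch of the missing guard, in the shell logic that configure ends
up running ($ompi_check_mxm_libdir is the macro's variable; the flag handling
shown is illustrative, not the macro's exact code):

  # only add -L when a libdir was actually computed
  if test -n "$ompi_check_mxm_libdir"; then
      ompi_check_mxm_ldflags="-L$ompi_check_mxm_libdir"
  fi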
> On May 26, 2015, at 11:50 AM, Mike Dubman wrote:
>
> David,
> Could you please send me your config.log file?
>
> Looking into config/ompi_check_
Thanks Jeff!
But in this line:
https://github.com/open-mpi/ompi/blob/master/config/ompi_check_mxm.m4#L36
ompi_check_mxm_libdir gets a value if with_mxm was passed
On Tue, May 26, 2015 at 6:59 PM, Jeff Squyres (jsquyres) wrote:
> This line:
>
>
> https://github.com/open-mpi/ompi/blob/master/co
Hello Mike,
I'm still working on getting you my config.log, but I thought I would
chime in about that line 36. In my case, that code path is not executed
because with_mxm is empty (I don't use --with-mxm on the configure line
since libmxm.so is in system space and configure picks up on it
automatically).
If just "./configure" is used, it can detect mxm only if it is installed
under /usr/include/...
By default mxm is installed under /opt/mellanox/mxm/...
I just checked:
with "./configure" it did not detect mxm, which is installed in the system
space, and
with "./configure --with-mxm" it did not detect
Mike --
I don't think that's right. If you just pass "--with-mxm", then $with_mxm will
equal "yes", and therefore neither of those two blocks of code is executed.
Hence, ompi_check_mxm_libdir will be empty.
Right?
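In shell terms, a simplified sketch of the behavior being described (not the
macro's exact code; the libdir derivation shown is illustrative):

  case "$with_mxm" in
      yes|"") ;;                                    # bare --with-mxm (or nothing): no dir to use
      *) ompi_check_mxm_libdir="$with_mxm/lib" ;;   # --with-mxm=/some/dir
  esac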
> On May 26, 2015, at 1:28 PM, Mike Dubman wrote:
>
> Thanks Jeff!
>
> bu
I realize this may be a bit off topic, but since what I am doing seems to be a
pretty commonly done thing, I am hoping to find someone who has done it
before and can help, since I've been at my wits' end for so long they are
calling me Mr. Whittaker.
I am trying to run HPL on a Raspberry Pi cluster. I
In that case, OPAL_CHECK_PACKAGE will disqualify mxm because it will not
find the mxm_api.h header file in the _OPAL_CHECK_PACKAGE_HEADER macro.
From
https://github.com/open-mpi/ompi/blob/master/config/ompi_check_mxm.m4#L43
From the config.log generated after "./configure --with-mxm":
configure:263059: che
Unless the compiler can find the MXM headers/libraries without the --with-mxm
value. E.g.:
./configure CPPFLAGS=-I/path/to/mxm/headers LDFLAGS=-L/path/to/mxm/libs
--with-mxm ...
(or otherwise sets the compiler/linker default search paths, etc.)
It seems like however it is happening, somehow
I don't know enough about HPL to resolve the problem. However, I would
suggest that you first just try to run the example programs in the examples
directory to ensure you have everything working. If they work, then the
problem is clearly in the HPL arena.
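For instance, from the top of an Open MPI source tree (assuming a default
build and that mpirun is on the PATH):

  cd examples
  make
  mpirun -np 4 ./hello_c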
I do note that your image reports that you
At first glance, it seems all MPI tasks believe they are rank zero and the
comm world size is 1 (!)
Did you compile xhpl with Open MPI (and not a stub library for the serial
version only)?
Can you make sure there is nothing wrong with your LD_LIBRARY_PATH and
that you do not mix MPI libraries
(e.g. OpenM
I have run a hello world program for any number of processes. If I say "-n 16",
I get 4 responses from each node saying "Hello world! I am process (0-15) of 16
on RPI-0(1-4)", so I know the cluster can work how I want it to. I also tested
with just a plain hostname command and I see the names of each of th
First you can run
mpirun -machinefile ~/machinefile -np 4 -tag-output xhpl
If all tasks report they believe they are task 0, then this is the
origin of the problem.
Then you can run
ldd mpirun
ldd xhpl
They should use the same MPI flavor.
Then
mpirun -machinefile ~/machinefile -np 4 -tag-outpu
I agree with Gilles -- when you compile with one MPI implementation, but then
accidentally use the mpirun/mpiexec from a different MPI implementation to
launch it, it's quite a common symptom to see an MPI_COMM_WORLD size of 1
(i.e., each MPI process is rank 0 in MPI_COMM_WORLD).
Make sure that
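A few quick checks for a mixed install (the binary name is from this thread;
the greps are illustrative):

  which mpirun
  mpirun --version
  ldd ./xhpl | grep -i mpi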