I solved my problem. I uninstalled all the MPI software that was on the
computer and reinstalled Open MPI. It was still not working, so I uninstalled
it again, reinstalled it again, and it is working now. Apparently there was a
problem with the installation.
Thanks for the help.
Quentin
Michael,
in this case, you can
mpirun --mca oob ^ud ...
in order to blacklist the oob/ud component.
an alternative is to add
oob = ^ud
in /.../etc/openmpi-mca-params.conf
If Open MPI is installed on a local filesystem, then this setting can
be node-specific.
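A third alternative, as a sketch (OMPI_MCA_<name> is the standard
environment-variable form of an MCA parameter), is to export it in the shell
before launching:
export OMPI_MCA_oob=^ud
mpirun ...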
That being said, the error suggests
Noam,
you might also want to try
mpirun --mca btl tcp,self ...
to rule out btl (shared memory and/or infiniband) related issues.
Once you rebuild Open MPI with --enable-debug, I recommend you first
check the arguments of the MPI_Send() and MPI_Recv() functions and
make sure
- same communicator
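For illustration, here is a minimal matched pair (a hypothetical standalone
example, not taken from your code) where communicator, tag, and datatype
agree on both sides:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, val = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        val = 42;
        /* communicator, tag, and datatype must match the MPI_Recv below */
        MPI_Send(&val, 1, MPI_INT, 1, 7, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&val, 1, MPI_INT, 0, 7, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", val);
    }
    MPI_Finalize();
    return 0;
}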
Hi Nathan, Howard,
Thanks for the feedback. Yes, we do already have UCX compiled into our OpenMPI
installations, but it’s disabled by default on our system because some users
were reporting problems with it previously. But I’m not sure what the status of
these issues is with OpenMPI 3.0, something f
Yes, you can do this by adding --enable-debug to OMPI configure (and make
sure you don't have the configure flag --with-platform=optimize).
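For example (the prefix is just a placeholder):
./configure --enable-debug --prefix=$HOME/ompi-debug
make -j8 install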
George.
On Thu, Apr 5, 2018 at 4:20 PM, Noam Bernstein
wrote:
>
> On Apr 5, 2018, at 4:11 PM, George Bosilca wrote:
>
> I attach with gdb on the proce
> On Apr 5, 2018, at 4:11 PM, George Bosilca wrote:
>
> I attach with gdb on the processes and do a "call mca_pml_ob1_dump(comm, 1)".
> This allows the debugger to call our function and output internal
> information about the library's status.
Great. But I guess I need to recompile Open MPI
I attach with gdb on the processes and do a "call mca_pml_ob1_dump(comm,
1)". This allows the debugger to call our function and output
internal information about the library's status.
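As a sketch, assuming the PID is 12345 and the communicator of interest is
MPI_COMM_WORLD (backed in Open MPI by the global ompi_mpi_comm_world), the
session would look roughly like:
$ gdb -p 12345
(gdb) call mca_pml_ob1_dump(&ompi_mpi_comm_world, 1)
(gdb) detach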
George.
On Thu, Apr 5, 2018 at 4:03 PM, Noam Bernstein
wrote:
> On Apr 5, 2018, at 3:55 PM, George Bo
> On Apr 5, 2018, at 3:55 PM, George Bosilca wrote:
>
> Noam,
>
> The OB1 provides a mechanism to dump all pending communications in a
> particular communicator. To do this I usually call mca_pml_ob1_dump(comm, 1),
> with comm being the MPI_Comm and 1 being the verbose mode. I have no idea how
Noam,
The OB1 provides a mechanism to dump all pending communications in a
particular communicator. To do this I usually call mca_pml_ob1_dump(comm,
1), with comm being the MPI_Comm and 1 being the verbose mode. I have no
idea how you can find the pointer to the communicator from your code, but
i
I'm trying to compile Open MPI to support all of our interconnects,
psm/openib/mxm/etc.
This works fine: Open MPI finds all the libs, compiles, and runs on each
of the respective machines.
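For reference, the configure line is along these lines (a sketch; the paths
are placeholders):
./configure --with-psm --with-mxm=/opt/mellanox/mxm --with-verbs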
However, we don't install the libraries for everything everywhere,
so when I run things like ompi_info and mpirun
Honestly, this is a configuration issue with the openib btl. There is no reason to keep
eager RDMA enabled, nor is there a reason to pipeline RDMA. I haven't found an app where
either of these "features" helps with InfiniBand. You have the right idea
with the parameter changes, but Howard is
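For example (a sketch; these openib BTL parameter names are from memory,
verify with ompi_info --param btl openib):
mpirun --mca btl_openib_use_eager_rdma 0 \
       --mca btl_openib_min_rdma_pipeline_size 1073741824 ...
Setting the pipeline threshold very high effectively keeps all messages out
of the RDMA pipeline.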
Is the file I/O that you mentioned done using MPI I/O? If yes, what
file system are you writing to?
Edgar
On 4/5/2018 10:15 AM, Noam Bernstein wrote:
On Apr 5, 2018, at 11:03 AM, Reuti wrote:
Hi,
On 05.04.2018 at 16:16, Noam Bernstein wrote:
Hi all - I have a code that uses MPI (va
Hello Ben,
Thanks for the info. You would probably be better off installing UCX on
your cluster and rebuilding your Open MPI with the
--with-ucx
configure option.
Here's what I'm seeing with Open MPI 3.0.1 on a ConnectX-5 based cluster
using ob1/openib BTL:
mpirun -map-by ppr:1:node -np 2 ./osu
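For comparison, running over UCX instead would look something like this (a
sketch; the UCX install path is a placeholder, and the pml selection is an
assumption):
./configure --with-ucx=/opt/ucx ...
mpirun --mca pml ucx -map-by ppr:1:node -np 2 ./osu_bw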
> On Apr 5, 2018, at 11:32 AM, Edgar Gabriel wrote:
>
> Is the file I/O that you mentioned done using MPI I/O? If yes, what file
> system are you writing to?
No MPI I/O. Just MPI calls to gather the data, and plain Fortran I/O on the
head node only.
I should also say that in lots of ot
> On Apr 5, 2018, at 11:03 AM, Reuti wrote:
>
> Hi,
>
>> On 05.04.2018 at 16:16, Noam Bernstein wrote:
>>
>> Hi all - I have a code that uses MPI (vasp), and it’s hanging in a strange
>> way. Basically, there’s a Cartesian communicator, 4x16 (64 processes
>> total), and despite the fact th
Hi,
> On 05.04.2018 at 16:16, Noam Bernstein wrote:
>
> Hi all - I have a code that uses MPI (vasp), and it’s hanging in a strange
> way. Basically, there’s a Cartesian communicator, 4x16 (64 processes total),
> and despite the fact that the communication pattern is rather regular, one
> pa
Hi all - I have a code that uses MPI (vasp), and it’s hanging in a strange way.
Basically, there’s a Cartesian communicator, 4x16 (64 processes total), and
despite the fact that the communication pattern is rather regular, one
particular send/recv pair hangs consistently. Basically, across eac
Hi,
Another interesting point: I noticed that the bandwidth for the last two message
sizes tested (2 MB and 4 MB) is lower than expected for both osu_bw and osu_bibw.
Increasing the minimum size to use the RDMA pipeline to above these sizes brings
those two data points up to scratch for both benchmarks:
3.0.0, o
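For example (a sketch; verify the exact openib parameter name with ompi_info):
mpirun --mca btl_openib_min_rdma_pipeline_size 8388608 ./osu_bibw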
Hi,
We’ve just been running some OSU benchmarks with OpenMPI 3.0.0 and noticed that
osu_bibw gives nowhere near the bandwidth I’d expect (this is on FDR IB).
However, osu_bw is fine.
If I disable eager RDMA, then osu_bibw gives the expected numbers. Similarly,
if I increase the number of eager