Thanks - "--map-by numa:span" did exactly what I wanted!
On Wed, May 15, 2019 at 10:34 PM Ralph Castain via users <
users@lists.open-mpi.org> wrote:
>
>
> > On May 15, 2019, at 7:18 PM, Adam Sylvester via users <
> users@lists.open-mpi.org> wrote:
> >
Up to this point, I've been running a single MPI rank per physical host
(using multithreading within my application to use all available cores). I
use this command:
mpirun -N 1 --bind-to none --hostfile hosts.txt
Where hosts.txt has an IP address on each line
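(A hedged sketch of how the --map-by numa:span fix acknowledged at the top of this thread might combine with the command above; dropping -N 1, keeping --bind-to none, and the application path are assumptions, not something stated in the thread:

mpirun --map-by numa:span --bind-to none --hostfile hosts.txt /path/to/my/application

The numa mapping policy places successive ranks on successive NUMA domains, and the span modifier balances them across all hosts in hosts.txt rather than filling one host before moving to the next.)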
I've started running on machines with
> Here is a link to Brian's video
> https://insidehpc.com/2018/04/amazon-libfabric-case-study-flexible-hpc-infrastructure/
>
> Cheers,
>
> Gilles
>
> On Sunday, March 24, 2019, Adam Sylvester wrote:
>
>> Digging up this old thread as it appears there's still an issue
I am running with:
mpirun --mca btl_tcp_links 4 -N 1 --bind-to none --hostfile hosts.txt
/path/to/my/application
Trying a btl_tcp_links value of 2 or 3 also makes no difference. Is there
another flag I need to be using or is something still broken?
Thanks.
-Adam
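(A hedged aside: one way to confirm that the installed build even recognizes the parameter is to list the TCP BTL parameters, e.g.

ompi_info --param btl tcp --level 9 | grep btl_tcp_links

If btl_tcp_links shows up there with the value you set but throughput is unchanged, the flag is being passed correctly and the limitation lies elsewhere.)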
On Thu, Jul 13, 2017 at 12:05 PM
). Future UCX versions will fix a lingering bug that makes this
required currently.
With these changes, I was able to successfully run my application.
On Sun, Mar 3, 2019 at 9:49 AM Adam Sylvester wrote:
> I'm running OpenMPI 4.0.0 built with gdrcopy 1.3 and UCX 1.4 per the
> inst
I'm running OpenMPI 4.0.0 built with gdrcopy 1.3 and UCX 1.4 per the
instructions at https://www.open-mpi.org/faq/?category=buildcuda, built
against CUDA 10.0 on RHEL 7. I'm running on a p2.xlarge instance in AWS
(single NVIDIA K80 GPU). OpenMPI reports CUDA support:
$ ompi_info --parsable --all
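(The CUDA-awareness check suggested on the FAQ page linked above boils down to one grep; the exact parameter name below is quoted from memory, so verify it against your own ompi_info output:

ompi_info --parsable --all | grep mpi_built_with_cuda_support:value

A CUDA-aware build reports a line ending in :value:true.)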
> with the equivalent Send+Recv
> followed by Broadcast. I don't think MPI_Allgatherv is particularly
> optimized (since it is hard to do and not a very popular function) and it
> might improve your memory utilization.
> >
> > Jeff
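(A minimal sketch of the Send+Recv-then-Broadcast replacement Jeff describes; the function name, datatype, and the counts/displs arrays are illustrative assumptions, and blocking point-to-point calls are used for brevity:

/* Gather every rank's chunk onto rank 0 with point-to-point messages,
 * then broadcast the assembled buffer, instead of MPI_Allgatherv.
 * mycount is assumed to equal counts[rank]. */
#include <mpi.h>
#include <string.h>

void allgatherv_via_bcast(const double *chunk, int mycount,
                          const int *counts, const int *displs,
                          double *full, int total, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    if (rank == 0) {
        /* Rank 0 copies its own piece, then receives everyone else's
         * directly into the final buffer at the right displacement. */
        memcpy(full + displs[0], chunk, mycount * sizeof(double));
        for (int r = 1; r < size; r++)
            MPI_Recv(full + displs[r], counts[r], MPI_DOUBLE, r, 0,
                     comm, MPI_STATUS_IGNORE);
    } else {
        MPI_Send(chunk, mycount, MPI_DOUBLE, 0, 0, comm);
    }

    /* Everyone ends up with the full assembled buffer. */
    MPI_Bcast(full, total, MPI_DOUBLE, 0, comm);
}

Receiving straight into the final buffer is where the memory savings Jeff mentions would come from, since no intermediate collective buffers are needed.)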
> >
> > On Thu, Dec 20, 2018 at
> (for example if you run
> MPI_Scatter(root=0) in a loop)
>
> Cheers,
>
> Gilles
>
> On Thu, Dec 20, 2018 at 11:06 PM Adam Sylvester wrote:
> >
> > This case is actually quite small - 10 physical machines with 18
> physical cores each, 1 rank per machine. These are AWS R
wrote:
> How many nodes are you using? How many processes per node? What kind of
> processor? Open MPI version? 25 GB is several orders of magnitude more
> memory than should be used except at extreme scale (1M+ processes). Also,
> how are you calculating memory usage?
>
> -Nathan
Is there a way at runtime to query OpenMPI to ask it how much memory it's
using for internal buffers? Is there a way at runtime to set a max amount
of memory OpenMPI will use for these buffers? I have an application where
for certain inputs OpenMPI appears to be allocating ~25 GB and I'm not
accounting for
I'm running OpenMPI 2.1.0 on RHEL 7 using TCP communication. For the
specific run that's crashing on me, I'm running with 17 ranks (on 17
different physical machines). I've got a stage in my application where
ranks need to transfer chunks of data where the size of each chunk is
trivial (on the or
> btl_tcp_port_range_v4: together with the port min, defines a range of
> ports where Open MPI will open sockets
>
> btl_tcp_port_min_v4: starting port to use
>
> I can’t answer the question about #ports to open - will have to leave that
> to someone else
> Ralph
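(A hedged example of putting those two parameters together; the port numbers are placeholders and the parameter names should be double-checked with ompi_info --param btl tcp --level 9 for your version:

mpirun --mca btl_tcp_port_min_v4 10000 --mca btl_tcp_port_range_v4 100 -N 1 --hostfile hosts.txt /path/to/my/application

restricts the TCP BTL to ports 10000 through 10099. The out-of-band/runtime layer chooses its own ports through separate oob_tcp MCA parameters, so a tightly firewalled cluster may need those constrained as well.)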
>
> > On Jul 7, 2
I'm using OpenMPI 2.1.0 on RHEL 7, communicating between ranks via TCP
I have a new cluster to install my application on with tightly-controlled
firewalls. I can have them open up a range of TCP ports which MPI can
communicate over. I thought I could force MPI to stick to a range of ports
via "-
> Otherwise mpirun would fork&exec a large
> number of ssh processes and hence use quite a lot of
> resources on the node running mpirun.
>
> Cheers,
>
> Gilles
>
> On Tue, Feb 13, 2018 at 8:23 AM, Adam Sylvester wrote:
> > I'm running OpenMPI 2.1.0, built from so
I'm running OpenMPI 2.1.0, built from source, on RHEL 7. I'm using the
default ssh-based launcher, where I have my private ssh key on rank 0 and
the associated public key on all ranks. I create a hosts file with a list
of unique IPs, with the host that I'm running mpirun from on the first
line, a
I have an application running across 20 machines where each machine has 60
GB RAM. For some large inputs, some ranks require 45-50 GB RAM. The
behavior I'm seeing is that for some of these large cases, my application
will run for 10-15 minutes and then one rank will be killed; based on
watching t
Thanks - that's exactly what I needed! Works as advertised. :o)
On Thu, Nov 16, 2017 at 1:27 PM, Aurelien Bouteiller
wrote:
> Adam. Your MPI program is incorrect. You need to replace the MPI_Finalize on
> the process that found the error with MPI_Abort.
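(A minimal sketch of that fix; the error condition, buffers, and datatype are made up for illustration:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Stand-in for the real per-rank work; pretend rank 1 hits an error. */
    int local_error = (rank == 1);

    if (local_error) {
        /* Calling MPI_Finalize() here would leave every other rank stuck
         * in the MPI_Gather() below; MPI_Abort() tears down the whole job. */
        fprintf(stderr, "rank %d: fatal error, aborting\n", rank);
        MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
    }

    double mine = (double)rank;
    double *all = (rank == 0) ? malloc(size * sizeof(double)) : NULL;
    MPI_Gather(&mine, 1, MPI_DOUBLE, all, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    free(all);
    MPI_Finalize();
    return 0;
}

Because MPI_Abort kills every rank in the job, no process is left blocked in the collective.)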
>
> On Nov 16, 2017 10:38
I'm using Open MPI 2.1.0 for this but I'm not sure if this is more of an
Open MPI-specific implementation question or what the MPI standard
guarantees.
I have an application which runs across multiple ranks, eventually reaching
an MPI_Gather() call. Along the way, if one of the ranks encounters a
I'll start with my question upfront: Is there a way to do the equivalent of
telling mpirun to do 'numactl --interleave=all' on the processes that it
runs? Or if I want to control the memory placement of my applications run
through MPI will I need to use libnuma for this? I tried doing "mpirun
nu
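(One commonly used approach, offered here as an assumption rather than something confirmed in this thread: put numactl in front of the application binary so that mpirun launches it as a wrapper, e.g.

mpirun -N 1 --bind-to none --hostfile hosts.txt numactl --interleave=all /path/to/my/application

Each launched process then inherits the interleaved memory policy without the application itself needing to link against libnuma.)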
> Most applications (outside of benchmarks) don't
> benefit from the 20 Gbps between rank pairs, as they are generally talking
> to multiple peers at once (and therefore can drive the full 20 Gbps). It’s
> definitely on our roadmap, but can’t promise a release just yet.
>
> Brian
>
> On
environment
>>
>> export OMPI_MCA_btl_tcp_sndbuf=0
>>
>> export OMPI_MCA_btl_tcp_rcvbuf=0
>>
>>
>> - use Open MPI 2.0.3
>>
>>
>> - last but not least, you can manually download and apply the patch
>> available at
>>
>>
> available at
>
> https://github.com/open-mpi/ompi/commit/b64fedf4f652cadc9bfc
> 7c4693f9c1ef01dfb69f.patch
>
>
> Cheers,
>
> Gilles
>
> On 7/9/2017 11:04 PM, Adam Sylvester wrote:
>
>> Gilles,
>>
>> Thanks for the fast response!
>>
>> The
> one socket on the fast interface.
> for example, if you want to use 4 sockets per interface
> mpirun --mca btl_tcp_links 4 ...
>
>
>
> Cheers,
>
> Gilles
>
> On Sun, Jul 9, 2017 at 10:10 PM, Adam Sylvester wrote:
> > I am using Open MPI 2.1.0 on RHEL 7. My app
I am using Open MPI 2.1.0 on RHEL 7. My application has one unavoidable
pinch point where a large amount of data needs to be transferred (about 8
GB of data needs to be both sent to and received from all other ranks), and I'm
seeing worse performance than I would expect; this step has a major impact
on
planned for release in the near future
>
> On Mar 19, 2017, at 1:40 PM, Adam Sylvester wrote:
>
> I did a little more testing in case this helps... if I run ompi-server on
> the same host as the one I call MPI_Publish_name() on, it does successfully
> connect. But when I run it on a
in the near future
>
> On Mar 14, 2017, at 6:26 PM, Adam Sylvester wrote:
>
> Excellent - I appreciate the quick turnaround.
>
> On Tue, Mar 14, 2017 at 10:24 AM, r...@open-mpi.org
> wrote:
>
>> I don’t see an issue right away, though I know it has been brought up
>
> On Mar 19, 2017, at 4:37 AM, Adam Sylvester wrote:
>
> I am trying to use ompi-server with Open MPI 1.10.6. I'm wondering if I
> should run this with or without the mpirun command. If I run this:
>
> ompi-server --no-daemonize -r +
>
> It prints something such
I am trying to use ompi-server with Open MPI 1.10.6. I'm wondering if I
should run this with or without the mpirun command. If I run this:
ompi-server --no-daemonize -r +
It prints something such as 959315968.0;tcp://172.31.3.57:45743 to stdout
but I have thus far been unable to connect to it.
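(A hedged usage note, based on my reading of the mpirun options rather than anything verified against 1.10.6: the printed URI is what the jobs that should rendezvous through the server need to be pointed at, e.g.

mpirun -np 1 --ompi-server "959315968.0;tcp://172.31.3.57:45743" ./server
mpirun -np 1 --ompi-server "959315968.0;tcp://172.31.3.57:45743" ./client

with the URI quoted because of the embedded semicolon; alternatively, start ompi-server with -r uri.txt and pass --ompi-server file:uri.txt to mpirun.)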
when ready.
>
>
> On Mar 13, 2017, at 6:16 PM, Adam Sylvester wrote:
>
> Bummer - thanks for the update. I will revert back to 1.10.x for now
> then. Should I file a bug report for this on GitHub or elsewhere? Or if
> there's an issue for this already open, can you point me
On Mar 13, 2017, at 5:17 AM, Adam Sylvester wrote:
>
> As a follow-up, I tried this with Open MPI 1.10.4 and this worked as
> expected (the port formatting looks really different):
>
> $ mpirun -np 1 ./server
> Port name is 1286733824.0;tcp://10.102.16.135:43074+1286733825.0;tcp://
On Sun, Mar 12, 2017 at 9:38 PM, Adam Sylvester wrote:
> I'm using Open MPI 2.0.2 on RHEL 7. I'm trying to use MPI_Open_port() /
> MPI_Comm_accept() / MPI_Comm_connect(). My use case is that I'll have two
> processes running on two machines that don't initially know about
I'm using Open MPI 2.0.2 on RHEL 7. I'm trying to use MPI_Open_port() /
MPI_Comm_accept() / MPI_Comm_connect(). My use case is that I'll have two
processes running on two machines that don't initially know about each
other (i.e. I can't do the typical mpirun with a list of IPs); eventually I
thin
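(A minimal sketch of the Open_port/accept/connect handshake being described; this is hand-written for illustration, and exchanging the port string via argv is an assumption about how the two sides learn about each other:

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;

    MPI_Init(&argc, &argv);

    if (argc > 1) {
        /* Client: the server's port name was passed on the command line
         * (quote it in the shell, since it contains semicolons). */
        strncpy(port, argv[1], MPI_MAX_PORT_NAME - 1);
        port[MPI_MAX_PORT_NAME - 1] = '\0';
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
    } else {
        /* Server: open a port, print it, and wait for the client. */
        MPI_Open_port(MPI_INFO_NULL, port);
        printf("Port name is %s\n", port);
        fflush(stdout);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
        MPI_Close_port(port);
    }

    /* ...communicate over the intercommunicator 'inter' here... */
    MPI_Comm_disconnect(&inter);
    MPI_Finalize();
    return 0;
}

Run one copy with no arguments to get the port name, then start the second copy with that string as its only argument.)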
> and port 22 (ssh).
>
> you can also refer to https://github.com/open-mpi/ompi/issues/1511
> yet another way to use Docker was discussed here.
>
> last but not least, if you want to use containers but you are not tied to
> docker, you can consider http://singularity.l
I'm trying to use OpenMPI 1.10.4 to communicate between two Docker
containers running on two different physical machines. Docker doesn't have
much to do with my question (unless someone has a suggestion for a better
way to do what I'm trying to :o) )... each Docker container is running an
OpenSSH