Re: [OMPI users] Order of ranks in mpirun

2019-05-17 Thread Adam Sylvester via users
Thanks - "--map-by numa:span" did exactly what I wanted! On Wed, May 15, 2019 at 10:34 PM Ralph Castain via users < users@lists.open-mpi.org> wrote: > > > > On May 15, 2019, at 7:18 PM, Adam Sylvester via users < > users@lists.open-mpi.org> wrote: > >

[OMPI users] Order of ranks in mpirun

2019-05-15 Thread Adam Sylvester via users
Up to this point, I've been running a single MPI rank per physical host (using multithreading within my application to use all available cores). I use this command: mpirun -N 1 --bind-to none --hostfile hosts.txt Where hosts.txt has an IP address on each line. I've started running on machines with
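
A sketch of the setup described, with placeholder IP addresses (the mpirun command is the one quoted above; ./my_app stands in for the real application):

    $ cat hosts.txt
    10.0.0.1
    10.0.0.2
    10.0.0.3
    $ mpirun -N 1 --bind-to none --hostfile hosts.txt ./my_app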

Re: [OMPI users] Network performance over TCP

2019-03-23 Thread Adam Sylvester
Here is a link to Brian’s video > https://insidehpc.com/2018/04/amazon-libfabric-case-study-flexible-hpc-infrastructure/ > > Cheers, > > Gilles > > On Sunday, March 24, 2019, Adam Sylvester wrote: > >> Digging up this old thread as it appears there's still an issue

Re: [OMPI users] Network performance over TCP

2019-03-23 Thread Adam Sylvester
I am running with: mpirun --mca btl_tcp_links 4 -N 1 --bind-to none --hostfile hosts.txt /path/to/my/application. Trying a btl_tcp_links value of 2 or 3 also makes no difference. Is there another flag I need to be using, or is something still broken? Thanks. -Adam On Thu, Jul 13, 2017 at 12:05 PM
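
For completeness, the same parameter can also be set through the environment before launching, which should be equivalent to passing --mca on the command line:

    export OMPI_MCA_btl_tcp_links=4
    mpirun -N 1 --bind-to none --hostfile hosts.txt /path/to/my/application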

Re: [OMPI users] OpenMPI behavior with Ialltoall and GPUs

2019-03-14 Thread Adam Sylvester
Future UCX versions will fix a lingering bug that makes this required currently. With these changes, I was able to successfully run my application. On Sun, Mar 3, 2019 at 9:49 AM Adam Sylvester wrote: > I'm running OpenMPI 4.0.0 built with gdrcopy 1.3 and UCX 1.4 per the > instructions

[OMPI users] OpenMPI behavior with Ialltoall and GPUs

2019-03-03 Thread Adam Sylvester
I'm running OpenMPI 4.0.0 built with gdrcopy 1.3 and UCX 1.4 per the instructions at https://www.open-mpi.org/faq/?category=buildcuda, built against CUDA 10.0 on RHEL 7. I'm running on a p2.xlarge instance in AWS (single NVIDIA K80 GPU). OpenMPI reports CUDA support: $ ompi_info --parsable --all
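
The quoted ompi_info line is cut off; the usual check from that FAQ looks roughly like the following (the output line is illustrative of a CUDA-enabled build):

    $ ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
    mca:mpi:base:param:mpi_built_with_cuda_support:value:true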

Re: [OMPI users] Querying/limiting OpenMPI memory allocations

2018-12-21 Thread Adam Sylvester
with the equivalent Send+Recv > followed by Broadcast. I don't think MPI_Allgatherv is particularly > optimized (since it is hard to do and not a very popular function) and it > might improve your memory utilization. > > > > Jeff > > > > On Thu, Dec 20, 2018 at
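
A rough sketch of the Send+Recv-plus-Broadcast replacement being suggested (recvbuf is assumed to be a float buffer, counts/displs/total_count are element counts, and error handling is omitted -- all of these are assumptions, not details from the thread):

    /* Gather every rank's variable-sized piece at rank 0 with point-to-point
       calls, then broadcast the assembled buffer to everyone. */
    if (rank == 0) {
        memcpy(recvbuf + displs[0], sendbuf, counts[0] * sizeof(float));
        for (int r = 1; r < size; ++r)
            MPI_Recv(recvbuf + displs[r], counts[r], MPI_FLOAT, r, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else {
        MPI_Send(sendbuf, counts[rank], MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Bcast(recvbuf, total_count, MPI_FLOAT, 0, MPI_COMM_WORLD);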

Re: [OMPI users] Querying/limiting OpenMPI memory allocations

2018-12-20 Thread Adam Sylvester
for example if you run > MPI_Scatter(root=0) in a loop) > > Cheers, > > Gilles > > On Thu, Dec 20, 2018 at 11:06 PM Adam Sylvester wrote: > > This case is actually quite small - 10 physical machines with 18 > physical cores each, 1 rank per machine. These are AWS R

Re: [OMPI users] Querying/limiting OpenMPI memory allocations

2018-12-20 Thread Adam Sylvester
wrote: > How many nodes are you using? How many processes per node? What kind of > processor? Open MPI version? 25 GB is several orders of magnitude more > memory than should be used except at extreme scale (1M+ processes). Also, > how are you calculating memory usage? > > -Nath

[OMPI users] Querying/limiting OpenMPI memory allocations

2018-12-20 Thread Adam Sylvester
Is there a way at runtime to query OpenMPI to ask it how much memory it's using for internal buffers? Is there a way at runtime to set a max amount of memory OpenMPI will use for these buffers? I have an application where for certain inputs OpenMPI appears to be allocating ~25 GB and I'm not acco
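
I'm not aware of a runtime query in this excerpt, but the buffer-related parameters a given transport exposes can at least be listed offline; a hedged suggestion rather than the thread's answer:

    ompi_info --param btl tcp --level 9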

[OMPI users] Limit to number of asynchronous sends/receives?

2018-12-16 Thread Adam Sylvester
I'm running OpenMPI 2.1.0 on RHEL 7 using TCP communication. For the specific run that's crashing on me, I'm running with 17 ranks (on 17 different physical machines). I've got a stage in my application where ranks need to transfer chunks of data where the size of each chunk is trivial (on the or
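
One common way to bound the number of outstanding nonblocking transfers is to issue them in batches and drain each batch with MPI_Waitall. This is a hedged sketch only (BATCH, num_chunks, and the chunks array with buf/len/dest fields are all assumptions), not necessarily the resolution of the thread:

    #define BATCH 64
    MPI_Request reqs[BATCH];
    for (int first = 0; first < num_chunks; first += BATCH) {
        int n = (num_chunks - first < BATCH) ? (num_chunks - first) : BATCH;
        for (int i = 0; i < n; ++i)
            MPI_Isend(chunks[first + i].buf, chunks[first + i].len, MPI_BYTE,
                      chunks[first + i].dest, 0, MPI_COMM_WORLD, &reqs[i]);
        /* drain this batch before issuing more requests */
        MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE);
    }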

Re: [OMPI users] Locking down TCP ports used

2018-07-07 Thread Adam Sylvester
together with the port min, define a range of > ports > where Open MPI will open sockets > > btl_tcp_port_min_v4: starting port to use > > I can’t answer the question about #ports to open - will have to leave that > to someone else > Ralph > > > On Jul 7, 2
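
Putting the two parameters together, a locked-down launch might look like this sketch (the port numbers are placeholders, and the out-of-band channel has its own oob_tcp port parameters that may also need restricting):

    mpirun --mca btl_tcp_port_min_v4 10000 --mca btl_tcp_port_range_v4 100 \
           -N 1 --hostfile hosts.txt ./my_app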

[OMPI users] Locking down TCP ports used

2018-07-07 Thread Adam Sylvester
I'm using OpenMPI 2.1.0 on RHEL 7, communicating between ranks via TCP. I have a new cluster to install my application on with tightly-controlled firewalls. I can have them open up a range of TCP ports which MPI can communicate over. I thought I could force MPI to stick to a range of ports via "-

Re: [OMPI users] mpirun issue using more than 64 hosts

2018-02-12 Thread Adam Sylvester
otherwise mpirun would fork&exec a large > number of ssh processes and hence use quite a lot of > resources on the node running mpirun. > > Cheers, > > Gilles > > On Tue, Feb 13, 2018 at 8:23 AM, Adam Sylvester wrote: > > I'm running OpenMPI 2.1.0, built from source
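
If the compute nodes cannot ssh to one another, the tree-based launch can be disabled so that mpirun opens every connection itself, at the cost of more load on the launch node. A hedged suggestion (using the plm_rsh_no_tree_spawn parameter), not necessarily the thread's conclusion:

    mpirun --mca plm_rsh_no_tree_spawn 1 -N 1 --hostfile hosts.txt ./my_app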

[OMPI users] mpirun issue using more than 64 hosts

2018-02-12 Thread Adam Sylvester
I'm running OpenMPI 2.1.0, built from source, on RHEL 7. I'm using the default ssh-based launcher, where I have my private ssh key on rank 0 and the associated public key on all ranks. I create a hosts file with a list of unique IPs, with the host that I'm running mpirun from on the first line, a

[OMPI users] Tracking Open MPI memory usage

2017-11-26 Thread Adam Sylvester
I have an application running across 20 machines where each machine has 60 GB RAM. For some large inputs, some ranks require 45-50 GB RAM. The behavior I'm seeing is that for some of these large cases, my application will run for 10-15 minutes and then one rank will be killed; based on watching t

Re: [OMPI users] Forcing MPI processes to end

2017-11-17 Thread Adam Sylvester
Thanks - that's exactly what I needed! Works as advertised. :o) On Thu, Nov 16, 2017 at 1:27 PM, Aurelien Bouteiller wrote: > Adam. Your MPI program is incorrect. You need to replace the finalize on > the process that found the error with MPI_Abort > > On Nov 16, 2017 10:38
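
A minimal sketch of the recommended pattern (the error check and exit code are placeholders):

    if (fatal_error_detected) {
        fprintf(stderr, "rank %d: unrecoverable error, aborting job\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);  /* terminates every rank, not just this one */
    }
    /* otherwise fall through to MPI_Gather() / MPI_Finalize() as usual */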

[OMPI users] Forcing MPI processes to end

2017-11-16 Thread Adam Sylvester
I'm using Open MPI 2.1.0 for this but I'm not sure if this is more of an Open MPI-specific implementation question or what the MPI standard guarantees. I have an application which runs across multiple ranks, eventually reaching an MPI_Gather() call. Along the way, if one of the ranks encounters a

[OMPI users] NUMA interaction with Open MPI

2017-07-16 Thread Adam Sylvester
I'll start with my question upfront: Is there a way to do the equivalent of telling mpirun to do 'numactl --interleave=all' on the processes that it runs? Or if I want to control the memory placement of my applications run through MPI will I need to use libnuma for this? I tried doing "mpirun nu
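
One workaround in this direction is to interpose numactl between mpirun and the application. A hedged sketch (how it interacts with Open MPI's own binding depends on the --bind-to setting):

    mpirun -N 1 --bind-to none --hostfile hosts.txt numactl --interleave=all ./my_app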

Re: [OMPI users] Network performance over TCP

2017-07-13 Thread Adam Sylvester
Most applications (outside of benchmarks) don’t > benefit from the 20 Gbps between rank pairs, as they are generally talking > to multiple peers at once (and therefore can drive the full 20 Gbps). It’s > definitely on our roadmap, but can’t promise a release just yet. > > Brian > > On

Re: [OMPI users] Network performance over TCP

2017-07-12 Thread Adam Sylvester
environment: export OMPI_MCA_btl_tcp_sndbuf=0 and export OMPI_MCA_btl_tcp_rcvbuf=0 - use Open MPI 2.0.3 - last but not least, you can manually download and apply the patch available at

Re: [OMPI users] Network performance over TCP

2017-07-11 Thread Adam Sylvester
available at https://github.com/open-mpi/ompi/commit/b64fedf4f652cadc9bfc7c4693f9c1ef01dfb69f.patch > Cheers, > Gilles > On 7/9/2017 11:04 PM, Adam Sylvester wrote: >> Gilles, >> Thanks for the fast response! >> The

Re: [OMPI users] Network performance over TCP

2017-07-09 Thread Adam Sylvester
one socket on the fast interface. > for example, if you want to use 4 sockets per interface > mpirun --mca btl_tcp_links 4 ... > > > > Cheers, > > Gilles > > On Sun, Jul 9, 2017 at 10:10 PM, Adam Sylvester wrote: > I am using Open MPI 2.1.0 on RHEL 7. My app

[OMPI users] Network performance over TCP

2017-07-09 Thread Adam Sylvester
I am using Open MPI 2.1.0 on RHEL 7. My application has one unavoidable pinch point where a large amount of data needs to be transferred (about 8 GB of data needs to be both sent to and received from all other ranks), and I'm seeing worse performance than I would expect; this step has a major impact on
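
For concreteness, the exchange described maps onto an all-to-all collective along these lines; this is a sketch only, since the actual code, datatypes, and counts are not shown in the thread:

    /* every rank sends bytes_per_peer bytes to, and receives the same amount
       from, every other rank */
    MPI_Alltoall(sendbuf, bytes_per_peer, MPI_BYTE,
                 recvbuf, bytes_per_peer, MPI_BYTE, MPI_COMM_WORLD);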

Re: [OMPI users] How to launch ompi-server?

2017-05-28 Thread Adam Sylvester
planned for release in the near future > > On Mar 19, 2017, at 1:40 PM, Adam Sylvester wrote: > > I did a little more testing in case this helps... if I run ompi-server on > the same host as the one I call MPI_Publish_name() on, it does successfully > connect. But when I run it on a

Re: [OMPI users] MPI_Comm_accept()

2017-05-28 Thread Adam Sylvester
in the near future > > On Mar 14, 2017, at 6:26 PM, Adam Sylvester wrote: > > Excellent - I appreciate the quick turnaround. > > On Tue, Mar 14, 2017 at 10:24 AM, r...@open-mpi.org > wrote: > >> I don’t see an issue right away, though I know it has been brought up

Re: [OMPI users] How to launch ompi-server?

2017-03-19 Thread Adam Sylvester
> > On Mar 19, 2017, at 4:37 AM, Adam Sylvester wrote: > > I am trying to use ompi-server with Open MPI 1.10.6. I'm wondering if I > should run this with or without the mpirun command. If I run this: > > ompi-server --no-daemonize -r + > > It prints something such

[OMPI users] How to launch ompi-server?

2017-03-19 Thread Adam Sylvester
I am trying to use ompi-server with Open MPI 1.10.6. I'm wondering if I should run this with or without the mpirun command. If I run this: ompi-server --no-daemonize -r + It prints something such as 959315968.0;tcp://172.31.3.57:45743 to stdout but I have thus far been unable to connect to it.
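
The usual pattern (a hedged sketch; file paths and executable names are placeholders) is to have ompi-server write its URI to a file and point each mpirun at that file:

    ompi-server --no-daemonize -r /tmp/ompi-server.uri &
    mpirun -np 1 --ompi-server file:/tmp/ompi-server.uri ./server_app
    mpirun -np 1 --ompi-server file:/tmp/ompi-server.uri ./client_app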

Re: [OMPI users] MPI_Comm_accept()

2017-03-14 Thread Adam Sylvester
when ready. > > On Mar 13, 2017, at 6:16 PM, Adam Sylvester wrote: > > Bummer - thanks for the update. I will revert back to 1.10.x for now > then. Should I file a bug report for this on GitHub or elsewhere? Or if > there's an issue for this already open, can you point me

Re: [OMPI users] MPI_Comm_accept()

2017-03-13 Thread Adam Sylvester
On Mar 13, 2017, at 5:17 AM, Adam Sylvester wrote: > > As a follow-up, I tried this with Open MPI 1.10.4 and this worked as > expected (the port formatting looks really different): > > $ mpirun -np 1 ./server > Port name is 1286733824.0;tcp://10.102.16.135:43074+1286733825.0;tcp://

Re: [OMPI users] MPI_Comm_accept()

2017-03-13 Thread Adam Sylvester
On Sun, Mar 12, 2017 at 9:38 PM, Adam Sylvester wrote: > I'm using Open MPI 2.0.2 on RHEL 7. I'm trying to use MPI_Open_port() / > MPI_Comm_accept() / MPI_Comm_connect(). My use case is that I'll have two > processes running on two machines that don't initially know about

[OMPI users] MPI_Comm_accept()

2017-03-12 Thread Adam Sylvester
I'm using Open MPI 2.0.2 on RHEL 7. I'm trying to use MPI_Open_port() / MPI_Comm_accept() / MPI_Comm_connect(). My use case is that I'll have two processes running on two machines that don't initially know about each other (i.e. I can't do the typical mpirun with a list of IPs); eventually I thin
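
For reference, the rendezvous being described reduces to roughly this sketch; how the port string gets from server to client (e.g. via ompi-server or by hand) is the part the thread is really about:

    /* server side */
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm client;
    MPI_Open_port(MPI_INFO_NULL, port);
    printf("Port name is %s\n", port);   /* hand this string to the client */
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);

    /* client side, with the port string obtained out of band */
    MPI_Comm server;
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);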

Re: [OMPI users] mpirun with ssh tunneling

2017-01-01 Thread Adam Sylvester
> and port 22 (ssh). > > you can also refer to https://github.com/open-mpi/ompi/issues/1511 > yet another way to use docker was discussed here. > > last but not least, if you want to use containers but you are not tied to > docker, you can consider http://singularity.l

[OMPI users] mpirun with ssh tunneling

2016-12-25 Thread Adam Sylvester
I'm trying to use OpenMPI 1.10.4 to communicate between two Docker containers running on two different physical machines. Docker doesn't have much to do with my question (unless someone has a suggestion for a better way to do what I'm trying to :o) )... each Docker container is running an OpenSSH