[OMPI users] pt2pt osc required for single-node runs?
All,

I installed Open MPI 3.1.2 on my laptop today (up from 3.0.0, which worked fine) and ran into the following error when trying to create a window:

```
--
The OSC pt2pt component does not support MPI_THREAD_MULTIPLE in this release.
Workarounds are to run on a single node, or to use a system with an RDMA
capable network such as Infiniband.
--
[beryl:13894] *** An error occurred in MPI_Win_create
[beryl:13894] *** reported by process [2678849537,0]
[beryl:13894] *** on communicator MPI COMMUNICATOR 3 DUP FROM 0
[beryl:13894] *** MPI_ERR_WIN: invalid window
[beryl:13894] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[beryl:13894] *** and potentially your MPI job)
```

I remember seeing this announced in the release notes. I wonder, however, why the pt2pt component is required for a run on a single node (as suggested by the error message). I tried to disable the pt2pt component, which gives a similar error but without the message about the pt2pt component:

```
$ mpirun -n 4 --mca osc ^pt2pt ./a.out
[beryl:13738] *** An error occurred in MPI_Win_create
[beryl:13738] *** reported by process [2621964289,0]
[beryl:13738] *** on communicator MPI COMMUNICATOR 3 DUP FROM 0
[beryl:13738] *** MPI_ERR_WIN: invalid window
[beryl:13738] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[beryl:13738] *** and potentially your MPI job)
```

Is this a known issue with v3.1.2? Is there a way to get more information about what is going wrong in the second case? Is this the right way to disable the pt2pt component?

Cheers,
Joseph
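A minimal test along these lines triggers the error for me (sketch only, not my full program; the buffer size is arbitrary, but the MPI_THREAD_MULTIPLE request and the window over a duplicated communicator match the log above):

```c
/* Sketch of a minimal reproducer: request MPI_THREAD_MULTIPLE and
 * create a window over a user buffer on a duplicated communicator. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not provided\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Comm comm;
    MPI_Comm_dup(MPI_COMM_WORLD, &comm);  /* "COMMUNICATOR 3 DUP FROM 0" in the log */

    int *buf = malloc(1024 * sizeof(int));
    MPI_Win win;
    /* With MPI_ERRORS_ARE_FATAL, the job aborts here when no usable
     * osc component is available for MPI_THREAD_MULTIPLE. */
    MPI_Win_create(buf, 1024 * sizeof(int), sizeof(int),
                   MPI_INFO_NULL, comm, &win);

    MPI_Win_free(&win);
    free(buf);
    MPI_Comm_free(&comm);
    MPI_Finalize();
    return 0;
}
```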
Re: [OMPI users] pt2pt osc required for single-node runs?
You can either move to MPI_Win_allocate or try the v4.0.x snapshots. I will look at bringing the btl/vader support for osc/rdma back to v3.1.x. osc/pt2pt will probably never become truly thread safe.

-Nathan
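Roughly, the suggested change looks like this (sketch only; the buffer size, info argument, and function name are illustrative, not from the original code):

```c
/* Illustrative sketch: let MPI allocate the window memory with
 * MPI_Win_allocate instead of exposing a user buffer via MPI_Win_create,
 * so osc components other than pt2pt can back the window. */
#include <mpi.h>

void create_window(MPI_Comm comm, MPI_Aint nbytes, int **buf, MPI_Win *win) {
    /* MPI_Win_allocate returns both the window and the backing buffer. */
    MPI_Win_allocate(nbytes, sizeof(int), MPI_INFO_NULL, comm, buf, win);
}
```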
Re: [OMPI users] Are MPI datatypes guaranteed to be compile-time constants?
Thanks for the responses. From what you've said, it seems like MPI types are indeed not guaranteed to be compile-time constants. However, I worked with the people at IBM, and it seems like the difference in behavior was caused by the IBM compiler, not the IBM Spectrum MPI implementation.

Ben
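To illustrate the pitfall for anyone finding this thread later (a hypothetical snippet, not the code from the original report): since MPI datatype handles are not guaranteed to be integer constant expressions, they cannot portably appear where such a constant is required, e.g. as switch case labels; compare handles at run time instead.

```c
/* Hypothetical illustration of the portability issue discussed above.
 *
 * Non-portable (may fail to compile on implementations where MPI_INT is
 * not an integer constant expression):
 *
 *   switch (dtype) {
 *     case MPI_INT:    ...; break;
 *     case MPI_DOUBLE: ...; break;
 *   }
 *
 * Portable alternative: compare the handles at run time. */
#include <mpi.h>
#include <stddef.h>

size_t element_size(MPI_Datatype dtype) {
    if (dtype == MPI_INT)    return sizeof(int);
    if (dtype == MPI_DOUBLE) return sizeof(double);
    return 0; /* unknown type in this sketch */
}
```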
[OMPI users] RDMA over Ethernet in Open MPI - RoCE on AWS?
I'm setting up a cluster on AWS, which will have a 10Gb/s or 25Gb/s Ethernet network. Should I expect to be able to get RoCE to work in Open MPI on AWS?

More generally, what optimizations and performance tuning can I do to an Open MPI installation to get good performance on an Ethernet network?

My codes use a lot of random access AMOs and asynchronous block transfers, so it seems to me like setting up RDMA over Ethernet would be essential to getting good performance, but I can't seem to find much information about it online.

Any pointers you have would be appreciated.

Ben
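For reference, without RoCE Open MPI runs over plain Ethernet via its TCP BTL. A sketch of the kind of command line that selects and pins it (the interface name eth0 is an assumption, and these are generic Open MPI MCA options rather than AWS-specific tuning):

```
# Sketch only: use the TCP BTL (plus shared memory and self) and restrict
# it to one Ethernet interface; "eth0" is a placeholder interface name.
mpirun -n 4 \
    --mca btl tcp,vader,self \
    --mca btl_tcp_if_include eth0 \
    --bind-to core \
    ./a.out
```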
Re: [OMPI users] RDMA over Ethernet in Open MPI - RoCE on AWS?
Ben, ping me off list. I know the guy who heads the HPC Solutions Architect team for AWS and an AWS Solutions Architect here in the UK.