Re: [OMPI users] Docker Cluster Queue Manager
Did you check Shifter?
https://www.nersc.gov/assets/Uploads/cug2015udi.pdf
http://www.nersc.gov/research-and-development/user-defined-images/
https://github.com/NERSC/shifter

On 06/03/2016 01:58 AM, Rob Nagler wrote:
We would like to use MPI on Docker with arbitrarily configured clusters (e.g. created with StarCluster or bare metal). What I'm curious about is whether there is a queue manager that understands Docker, file systems, MPI, and OpenAuth. JupyterHub does a lot of this, but it doesn't interface with MPI. Ideally, we'd like users to be able to queue up jobs directly from JupyterHub. Currently, we can configure and initiate an MPI-compatible Docker cluster running on a VPC using Salt. What's missing is the ability to manage a queue of these clusters. Here's a list of requirements:
- JupyterHub users do not have Unix user ids
- Containers must be started as a non-root guest user (--user)
- JupyterHub user's data directory is mounted in the container
- Data is shared via NFS or another cluster file system
- sshd runs in the container for MPI as the guest user
- Results have to be reported back to GitHub user
- MPI network must be visible (--net=host)
- Queue manager must be compatible with the above
- JupyterHub user is not allowed to interact with Docker directly
- Docker images are user selectable (from an approved list)
- Jupyter and MPI containers are started from the same image
Do you know of a system which supports this? Our code and config are open source, and your feedback would be greatly appreciated.
Salt configuration: https://github.com/radiasoft/salt-conf
Container builders: https://github.com/radiasoft/containers/tree/master/radiasoft
Early phase wiki: https://github.com/radiasoft/devops/wiki/DockerMPI
Thanks, Rob
___ users mailing list us...@open-mpi.org Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2016/06/29355.php
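To make the container-side requirements above concrete, here is a rough sketch of how such a worker container might be started. The image name, uid, paths, and port are illustrative only, not from the original post:

    # Illustrative only: start an MPI worker container as a non-root guest
    # user, with the JupyterHub user's data mounted and host networking so
    # the MPI fabric is visible.
    docker run -d \
        --user 1000:1000 \
        --net=host \
        -v /nfs/jupyterhub/users/alice:/home/guest \
        example/mpi-worker \
        /usr/sbin/sshd -D -p 2222 -h /home/guest/.ssh/ssh_host_rsa_key
    # Assumption: sshd runs unprivileged only on a high port and with a host
    # key readable by the guest user; the image must be prepared accordingly.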
Re: [OMPI users] Docker Cluster Queue Manager
That's why they have ACLs in ZoL, no? Just bring up a new filesystem for each container, with ACLs so only the owning container can use that fs, and you should be done, no? To be clear, each container would have to have a unique uid for this to work, but together with Ralph's idea of a uid pool this would provide good isolation. The reason for ZoL filesystems is to ensure isolation, as well as the other benefits zfs brings to docker... Anyway, ClusterHQ seems to have a nice product called Flocker, which might also be relevant for this.

On 06/06/2016 12:07 PM, John Hearns wrote:
Rob, I am not familiar with wakari.io. However, what you say about the Unix userid problem is very relevant to many 'shared infrastructure' projects and is a topic which comes up in discussions about them. The concern there is, as you say, that if the managers of the system have a global filesystem with shared datasets, and virtual clusters or containers are created on that shared infrastructure, then users with root access inside them can gain privileges over the global filesystem. You are making some very relevant points here.

On 5 June 2016 at 01:51, Rob Nagler wrote:
Thanks! SLURM Elastic Computing seems like it might do the trick. I need to try it out. xCAT is interesting, too. It seems to be the HPC version of Salt'ed Cobbler. :) I don't know that it's so important for our problem. We have a small cluster for testing against the cloud, primarily. I could see xCAT being quite powerful for large clusters.
I'm not sure how to explain the Unix user id problem other than that a gmail account does not have a corresponding Unix user id. Nor do you have one for your representation on this mailing list. That decoupling is important. The actual Unix processes executed on behalf of users of gmail, this mailing list, etc. run as a single Unix user. That's how JupyterHub containers run. When you click "Start Server" in JupyterHub, it starts a docker container as some system user (uid=1000 in our case), and the container is given access to the user's files via a Docker volume. The container cannot see any other user's files.
In a typical HPC context, the files are all in /home/<username>. The "containment" is done by normal Unix file permissions. It's very easy, but it doesn't work for web apps as described above. Even being able to list all the other users on a system (via "ls /home") is a privacy breach in a web app.
Rob
___ users mailing list us...@open-mpi.org Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2016/06/29369.php ___ users mailing list us...@open-mpi.org Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2016/06/29377.php
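A minimal sketch of the per-container ZoL filesystem idea above; the pool, dataset name, quota, and uid are illustrative assumptions:

    # Illustrative only: one ZFS dataset per user/container, owned by that
    # container's uid so other containers cannot read it.
    zfs create -p tank/jupyter/alice
    zfs set quota=50G tank/jupyter/alice
    chown 1000:1000 /tank/jupyter/alice
    chmod 0700 /tank/jupyter/alice
    # bind only this dataset into that user's container
    docker run --user 1000:1000 -v /tank/jupyter/alice:/home/guest ...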
Re: [OMPI users] Docker Cluster Queue Manager
On 06/06/2016 06:32 PM, Rob Nagler wrote:
Thanks, John. I sometimes wonder if I'm the only one out there with this particular problem. Ralph, thanks for sticking with me. :)
Using a pool of uids doesn't really work due to the way cgroups/containers work. It also would require changing the permissions of all of the user's files, which would create issues for JupyterHub's access to the files, which is used for in situ monitoring. Docker does not yet handle uid mapping at the container level (1.10 added mappings for the daemon). We have solved this problem by adding a uid/gid switcher at container startup for our images. The trick is to change the uid/gid of the "container user" with usermod and groupmod. This only works, however, with images we provide. I'd like a solution that allows us to start arbitrary/unsafe images, relying on cgroups to do their job.
Gilles, the containers do lock the user down, but the problem is that the file system space has to be dynamically bound to the containers across the cluster. JupyterHub solves this problem by understanding the concept of a user, and providing a hook to change the directory to be mounted.
Daniel, we've had bad experiences with ZoL. Its allocation algorithm degrades rapidly when the file system gets over 80% full. It is still not integrated into major distros, which leads to dkms nightmares on system upgrades. I don't really see Flocker as helping in this regard, because the problem is the scheduler, not the file system. We know which directory we have to mount from the cluster file system; we just need to get the scheduler to allow us to mount it with the container that is running slurmd.

Any storage with high percentage usage will degrade performance. ZoL is actually nicer than btrfs in that regard, but xfs does handle low free space better most of the time. If you have the memory to spare, and the images are mostly identical, deduplication (or even better, compression) can help in that regard. Regarding integration - that's mostly a licensing issue, not a reflection of the maturity of the technology itself. Regarding dkms - use the kABI-tracking kmod. Just my 2 cents.

I'll play with Slurm Elastic Compute this week to see how it works.
Rob
___ users mailing list us...@open-mpi.org Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2016/06/29382.php
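A rough sketch of the uid/gid switch described above, as it might look in a container entrypoint. It assumes the entrypoint starts as root and then drops to a remapped "guest" account; the account name, mount point, and the use of runuser are assumptions (gosu or su-exec would work similarly):

    #!/bin/bash
    # Illustrative entrypoint: remap the in-image "guest" account to the
    # uid/gid that owns the mounted home directory, then drop privileges.
    set -e
    target_uid=$(stat -c %u /home/guest)
    target_gid=$(stat -c %g /home/guest)
    groupmod -o -g "$target_gid" guest
    usermod  -o -u "$target_uid" -g "$target_gid" guest
    exec runuser -u guest -- "$@"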
[OMPI users] simple mpi hello world segfaults when coll ml not disabled
given a simple hello.c:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char* argv[]) {
        int size, rank, len;
        char name[MPI_MAX_PROCESSOR_NAME];
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(name, &len);
        printf("%s: Process %d out of %d\n", name, rank, size);
        MPI_Finalize();
    }

for n=1 (mpirun -n 1 ./hello) it works correctly. for n>1 it segfaults with signal 11. used gdb to trace the problem to ml coll:

    Program received signal SIGSEGV, Segmentation fault.
    0x76750845 in ml_coll_hier_barrier_setup() from /lib/openmpi/mca_coll_ml.so

running with mpirun -n 2 --mca coll ^ml ./hello works correctly.

using mellanox ofed 2.3-2.0.5-rhel6.4-x86_64, if it's at all relevant.

openmpi 1.8.5 was built with the following options:

    rpmbuild --rebuild --define 'configure_options --with-verbs=/usr --with-verbs-libdir=/usr/lib64 CC=gcc CXX=g++ FC=gfortran CFLAGS="-g -O3" --enable-mpirun-prefix-by-default --enable-orterun-prefix-by-default --disable-debug --with-knem=/opt/knem-1.1.1.90mlnx --with-platform=optimized --without-mpi-param-check --with-contrib-vt-flags=--disable-iotrace --enable-builtin-atomics --enable-cxx-exceptions --enable-sparse-groups --enable-mpi-thread-multiple --enable-memchecker --enable-btl-openib-failover --with-hwloc=internal --with-verbs --with-x --with-slurm --with-pmi=/opt/slurm --with-fca=/opt/mellanox/fca --with-mxm=/opt/mellanox/mxm --with-hcoll=/opt/mellanox/hcoll' openmpi-1.8.5-1.src.rpm

gcc version 5.1.1

Thanks in advance
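For reference, the reproduction described above boils down to the following commands (assuming the hello.c shown and the Open MPI 1.8.5 wrappers in PATH):

    # build the reproducer
    mpicc hello.c -o hello
    # one rank: works
    mpirun -n 1 ./hello
    # two or more ranks: segfaults in mca_coll_ml.so
    mpirun -n 2 ./hello
    # workaround reported above: exclude the ml collective component
    mpirun -n 2 --mca coll ^ml ./hello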
Re: [OMPI users] simple mpi hello world segfaults when coll ml not disabled
No, that's the issue. I had to disable it to get things working. That's why I included my config settings - I couldn't figure out which option enabled it, so I could remove it from the configuration... On 06/18/2015 02:43 PM, Gilles Gouaillardet wrote: Daniel, ML module is not ready for production and is disabled by default. Did you explicitly enable this module ? If yes, I encourage you to disable it Cheers, Gilles On Thursday, June 18, 2015, Daniel Letai <mailto:d...@letai.org.il>> wrote: given a simple hello.c: #include #include int main(int argc, char* argv[]) { int size, rank, len; char name[MPI_MAX_PROCESSOR_NAME]; MPI_Init(&argc, &argv); MPI_Comm_size(MPI_COMM_WORLD, &size); MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Get_processor_name(name, &len); printf("%s: Process %d out of %d\n", name, rank, size); MPI_Finalize(); } for n=1 mpirun -n 1 ./hello it works correctly. for n>1 it segfaults with signal 11 used gdb to trace the problem to ml coll: Program received signal SIGSEGV, Segmentation fault. 0x76750845 in ml_coll_hier_barrier_setup() from /lib/openmpi/mca_coll_ml.so running with mpirun -n 2 --mca coll ^ml ./hello works correctly using mellanox ofed 2.3-2.0.5-rhel6.4-x86_64, if it's at all relevant. openmpi 1.8.5 was built with following options: rpmbuild --rebuild --define 'configure_options --with-verbs=/usr --with-verbs-libdir=/usr/lib64 CC=gcc CXX=g++ FC=gfortran CFLAGS="-g -O3" --enable-mpirun-prefix-by-default --enable-orterun-prefix-by-default --disable-debug --with-knem=/opt/knem-1.1.1.90mlnx --with-platform=optimized --without-mpi-param-check --with-contrib-vt-flags=--disable-iotrace --enable-builtin-atomics --enable-cxx-exceptions --enable-sparse-groups --enable-mpi-thread-multiple --enable-memchecker --enable-btl-openib-failover --with-hwloc=internal --with-verbs --with-x --with-slurm --with-pmi=/opt/slurm --with-fca=/opt/mellanox/fca --with-mxm=/opt/mellanox/mxm --with-hcoll=/opt/mellanox/hcoll' openmpi-1.8.5-1.src.rpm gcc version 5.1.1 Thanks in advance ___ users mailing list us...@open-mpi.org Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2015/06/27154.php ___ users mailing list us...@open-mpi.org Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2015/06/27155.php
Re: [OMPI users] simple mpi hello world segfaults when coll ml not disabled
Thanks, will try it on Sunday (won't have access to the system till then) On 06/18/2015 04:36 PM, Gilles Gouaillardet wrote: This is really odd... you can run ompi_info --all and search coll_ml_priority it will display the current value and the origin (e.g. default, system wide config, user config, cli, environment variable) Cheers, Gilles On Thursday, June 18, 2015, Daniel Letai <mailto:d...@letai.org.il>> wrote: No, that's the issue. I had to disable it to get things working. That's why I included my config settings - I couldn't figure out which option enabled it, so I could remove it from the configuration... On 06/18/2015 02:43 PM, Gilles Gouaillardet wrote: Daniel, ML module is not ready for production and is disabled by default. Did you explicitly enable this module ? If yes, I encourage you to disable it Cheers, Gilles On Thursday, June 18, 2015, Daniel Letai > wrote: given a simple hello.c: #include #include int main(int argc, char* argv[]) { int size, rank, len; char name[MPI_MAX_PROCESSOR_NAME]; MPI_Init(&argc, &argv); MPI_Comm_size(MPI_COMM_WORLD, &size); MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Get_processor_name(name, &len); printf("%s: Process %d out of %d\n", name, rank, size); MPI_Finalize(); } for n=1 mpirun -n 1 ./hello it works correctly. for n>1 it segfaults with signal 11 used gdb to trace the problem to ml coll: Program received signal SIGSEGV, Segmentation fault. 0x76750845 in ml_coll_hier_barrier_setup() from /lib/openmpi/mca_coll_ml.so running with mpirun -n 2 --mca coll ^ml ./hello works correctly using mellanox ofed 2.3-2.0.5-rhel6.4-x86_64, if it's at all relevant. openmpi 1.8.5 was built with following options: rpmbuild --rebuild --define 'configure_options --with-verbs=/usr --with-verbs-libdir=/usr/lib64 CC=gcc CXX=g++ FC=gfortran CFLAGS="-g -O3" --enable-mpirun-prefix-by-default --enable-orterun-prefix-by-default --disable-debug --with-knem=/opt/knem-1.1.1.90mlnx --with-platform=optimized --without-mpi-param-check --with-contrib-vt-flags=--disable-iotrace --enable-builtin-atomics --enable-cxx-exceptions --enable-sparse-groups --enable-mpi-thread-multiple --enable-memchecker --enable-btl-openib-failover --with-hwloc=internal --with-verbs --with-x --with-slurm --with-pmi=/opt/slurm --with-fca=/opt/mellanox/fca --with-mxm=/opt/mellanox/mxm --with-hcoll=/opt/mellanox/hcoll' openmpi-1.8.5-1.src.rpm gcc version 5.1.1 Thanks in advance ___ users mailing list us...@open-mpi.org Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2015/06/27154.php ___ users mailing list us...@open-mpi.org Subscription:http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post:http://www.open-mpi.org/community/lists/users/2015/06/27155.php ___ users mailing list us...@open-mpi.org Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2015/06/27157.php
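The check Gilles suggests could look like this (assuming ompi_info from the same 1.8.5 install; the --level form is available on 1.7 and later):

    # show the ml collective priority, its current value and where it came from
    ompi_info --all | grep coll_ml_priority
    # or narrow the output to just the ml component of the coll framework
    ompi_info --param coll ml --level 9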
Re: [OMPI users] simple mpi hello world segfaults when coll ml not disabled
MCA coll: parameter "coll_ml_priority" (current value: "0", data source: default, level: 9 dev/all, type: int) Not sure how to read this, but for any n>1 mpirun only works with --mca coll ^ml Thanks for helping On 06/18/2015 04:36 PM, Gilles Gouaillardet wrote: This is really odd... you can run ompi_info --all and search coll_ml_priority it will display the current value and the origin (e.g. default, system wide config, user config, cli, environment variable) Cheers, Gilles On Thursday, June 18, 2015, Daniel Letai <mailto:d...@letai.org.il>> wrote: No, that's the issue. I had to disable it to get things working. That's why I included my config settings - I couldn't figure out which option enabled it, so I could remove it from the configuration... On 06/18/2015 02:43 PM, Gilles Gouaillardet wrote: Daniel, ML module is not ready for production and is disabled by default. Did you explicitly enable this module ? If yes, I encourage you to disable it Cheers, Gilles On Thursday, June 18, 2015, Daniel Letai > wrote: given a simple hello.c: #include #include int main(int argc, char* argv[]) { int size, rank, len; char name[MPI_MAX_PROCESSOR_NAME]; MPI_Init(&argc, &argv); MPI_Comm_size(MPI_COMM_WORLD, &size); MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Get_processor_name(name, &len); printf("%s: Process %d out of %d\n", name, rank, size); MPI_Finalize(); } for n=1 mpirun -n 1 ./hello it works correctly. for n>1 it segfaults with signal 11 used gdb to trace the problem to ml coll: Program received signal SIGSEGV, Segmentation fault. 0x76750845 in ml_coll_hier_barrier_setup() from /lib/openmpi/mca_coll_ml.so running with mpirun -n 2 --mca coll ^ml ./hello works correctly using mellanox ofed 2.3-2.0.5-rhel6.4-x86_64, if it's at all relevant. openmpi 1.8.5 was built with following options: rpmbuild --rebuild --define 'configure_options --with-verbs=/usr --with-verbs-libdir=/usr/lib64 CC=gcc CXX=g++ FC=gfortran CFLAGS="-g -O3" --enable-mpirun-prefix-by-default --enable-orterun-prefix-by-default --disable-debug --with-knem=/opt/knem-1.1.1.90mlnx --with-platform=optimized --without-mpi-param-check --with-contrib-vt-flags=--disable-iotrace --enable-builtin-atomics --enable-cxx-exceptions --enable-sparse-groups --enable-mpi-thread-multiple --enable-memchecker --enable-btl-openib-failover --with-hwloc=internal --with-verbs --with-x --with-slurm --with-pmi=/opt/slurm --with-fca=/opt/mellanox/fca --with-mxm=/opt/mellanox/mxm --with-hcoll=/opt/mellanox/hcoll' openmpi-1.8.5-1.src.rpm gcc version 5.1.1 Thanks in advance ___ users mailing list us...@open-mpi.org Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2015/06/27154.php ___ users mailing list us...@open-mpi.org Subscription:http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post:http://www.open-mpi.org/community/lists/users/2015/06/27155.php ___ users mailing list us...@open-mpi.org Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2015/06/27157.php
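If passing the exclusion on every mpirun line gets tedious, the same MCA setting can be made persistent through Open MPI's standard mechanisms (not something specific to this thread):

    # per-user: add the exclusion to the MCA parameter file
    echo "coll = ^ml" >> ~/.openmpi/mca-params.conf
    # or per-shell, via the environment
    export OMPI_MCA_coll='^ml'
    mpirun -n 2 ./hello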
Re: [OMPI users] simple mpi hello world segfaults when coll ml not disabled
Gilles, Attached the two output logs. Thanks, Daniel On 06/22/2015 08:08 AM, Gilles Gouaillardet wrote: Daniel, i double checked this and i cannot make any sense with these logs. if coll_ml_priority is zero, then i do not any way how ml_coll_hier_barrier_setup can be invoked. could you please run again with --mca coll_base_verbose 100 with and without --mca coll ^ml Cheers, Gilles On 6/22/2015 12:08 AM, Gilles Gouaillardet wrote: Daniel, ok, thanks it seems that even if priority is zero, some code gets executed I will confirm this tomorrow and send you a patch to work around the issue if that if my guess is proven right Cheers, Gilles On Sunday, June 21, 2015, Daniel Letai <mailto:d...@letai.org.il>> wrote: MCA coll: parameter "coll_ml_priority" (current value: "0", data source: default, level: 9 dev/all, type: int) Not sure how to read this, but for any n>1 mpirun only works with --mca coll ^ml Thanks for helping On 06/18/2015 04:36 PM, Gilles Gouaillardet wrote: This is really odd... you can run ompi_info --all and search coll_ml_priority it will display the current value and the origin (e.g. default, system wide config, user config, cli, environment variable) Cheers, Gilles On Thursday, June 18, 2015, Daniel Letai > wrote: No, that's the issue. I had to disable it to get things working. That's why I included my config settings - I couldn't figure out which option enabled it, so I could remove it from the configuration... On 06/18/2015 02:43 PM, Gilles Gouaillardet wrote: Daniel, ML module is not ready for production and is disabled by default. Did you explicitly enable this module ? If yes, I encourage you to disable it Cheers, Gilles On Thursday, June 18, 2015, Daniel Letai wrote: given a simple hello.c: #include #include int main(int argc, char* argv[]) { int size, rank, len; char name[MPI_MAX_PROCESSOR_NAME]; MPI_Init(&argc, &argv); MPI_Comm_size(MPI_COMM_WORLD, &size); MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Get_processor_name(name, &len); printf("%s: Process %d out of %d\n", name, rank, size); MPI_Finalize(); } for n=1 mpirun -n 1 ./hello it works correctly. for n>1 it segfaults with signal 11 used gdb to trace the problem to ml coll: Program received signal SIGSEGV, Segmentation fault. 0x76750845 in ml_coll_hier_barrier_setup() from /lib/openmpi/mca_coll_ml.so running with mpirun -n 2 --mca coll ^ml ./hello works correctly using mellanox ofed 2.3-2.0.5-rhel6.4-x86_64, if it's at all relevant. 
openmpi 1.8.5 was built with following options: rpmbuild --rebuild --define 'configure_options --with-verbs=/usr --with-verbs-libdir=/usr/lib64 CC=gcc CXX=g++ FC=gfortran CFLAGS="-g -O3" --enable-mpirun-prefix-by-default --enable-orterun-prefix-by-default --disable-debug --with-knem=/opt/knem-1.1.1.90mlnx --with-platform=optimized --without-mpi-param-check --with-contrib-vt-flags=--disable-iotrace --enable-builtin-atomics --enable-cxx-exceptions --enable-sparse-groups --enable-mpi-thread-multiple --enable-memchecker --enable-btl-openib-failover --with-hwloc=internal --with-verbs --with-x --with-slurm --with-pmi=/opt/slurm --with-fca=/opt/mellanox/fca --with-mxm=/opt/mellanox/mxm --with-hcoll=/opt/mellanox/hcoll' openmpi-1.8.5-1.src.rpm gcc version 5.1.1 Thanks in advance ___ users mailing list us...@open-mpi.org Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2015/06/27154.php ___ users mailing list us...@open-mpi.org Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2015/06/27155.php
[OMPI users] display-map option in v1.8.8
Hi,
After upgrading to 1.8.8 I can no longer see the map. When looking at the man page for mpirun, display-map no longer exists. Is there a way to show the map in 1.8.8?
Another issue - I'd like to map 2 processes per node, 1 to each socket. What is the current "correct" syntax? --map-by ppr:2:node doesn't guarantee 1 per socket. --map-by ppr:1:socket doesn't guarantee 2 per node. I assume it's something obvious, but the documentation is somewhat lacking. I'd like to know the general syntax - even if I have 4-socket nodes I'd still like to map only 2 procs per node.
Combining with numa/dist to hca/dist to gpu will be very helpful too.
Thanks,
Re: [OMPI users] display-map option in v1.8.8
Thanks for the reply,

On 10/13/2015 04:04 PM, Ralph Castain wrote:
On Oct 12, 2015, at 6:10 AM, Daniel Letai wrote: Hi, After upgrading to 1.8.8 I can no longer see the map. When looking at the man page for mpirun, display-map no longer exists. Is there a way to show the map in 1.8.8?
I don’t know why/how it got dropped from the man page, but the display-map option certainly still exists - do “mpirun -h” to see the full list of options, and you’ll see it is there. I’ll ensure it gets restored to the man page in the 1.10 series as the 1.8 series is complete.

Thanks for clarifying,

Another issue - I'd like to map 2 processes per node, 1 to each socket. What is the current "correct" syntax? --map-by ppr:2:node doesn't guarantee 1 per socket. --map-by ppr:1:socket doesn't guarantee 2 per node. I assume it's something obvious, but the documentation is somewhat lacking. I'd like to know the general syntax - even if I have 4-socket nodes I'd still like to map only 2 procs per node.
That’s a tough one. I’m not sure there is a way to do that right now. Probably something we’d have to add. Out of curiosity, if you have 4 sockets and only 2 procs, would you want each proc bound to 2 of the 4 sockets? Or are you expecting them to be bound to only 1 socket (thus leaving 2 sockets idle), or simply leave them unbound?

I have 2 PCI devices (GPUs) per node. I need 1 proc per socket to be bound to that socket and "talk" to its respective GPU, so no matter how many sockets I have, I must distribute the procs 2 per node, each on its own socket (actually, its own NUMA node) and bound there. So I expect them to be "bound to only 1 socket (thus leaving 2 sockets idle)". I might run other jobs on the idle sockets (depending on mem utilization) but that's not an immediate concern at this time.

Combining with numa/dist to hca/dist to gpu will be very helpful too.
Definitely no way to do this one today.

Thanks,
___ users mailing list us...@open-mpi.org Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2015/10/27860.php ___ users mailing list us...@open-mpi.org Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2015/10/27861.php
Re: [OMPI users] display-map option in v1.8.8
On 10/20/2015 04:14 PM, Ralph Castain wrote: On Oct 20, 2015, at 5:47 AM, Daniel Letai <mailto:d...@letai.org.il>> wrote: Thanks for the reply, On 10/13/2015 04:04 PM, Ralph Castain wrote: On Oct 12, 2015, at 6:10 AM, Daniel Letai <mailto:d...@letai.org.il>> wrote: Hi, After upgrading to 1.8.8 I can no longer see the map. When looking at the man page for mpirun, display-map no longer exists. Is there a way to show the map in 1.8.8 ? I don’t know why/how it got dropped from the man page, but the display-map option certainly still exists - do “mpirun -h” to see the full list of options, and you’ll see it is there. I’ll ensure it gets restored to the man page in the 1.10 series as the 1.8 series is complete. Thanks for clarifying, Another issue - I'd like to map 2 process per node - 1 to each socket. What is the current "correct" syntax? --map-by ppr:2:node doesn't guarantee 1 per Socket. --map-by ppr:1:socket doesn't guarantee 2 per node. I assume it's something obvious, but the documentation is somewhat lacking. I'd like to know the general syntax - even if I have 4 socket nodes I'd still like to map only 2 procs per node. That’s a tough one. I’m not sure there is a way to do that right now. Probably something we’d have to add. Out of curiosity, if you have 4 sockets and only 2 procs, would you want each proc bound to 2 of the 4 sockets? Or are you expecting them to be bound to only 1 socket (thus leaving 2 sockets idle), or simply leave them unbound? I have 2 pci devices (gpu) per node. I need 1 proc per socket to be bound to that socket and "talk" to it's respective gpu, so no matter how many sockets I have - I must distribute the procs 2 per node, each in it's own socket (actually, each in it's own numa) and be bound. So I expect them to be "bound to only 1 socket (thus leaving 2 sockets idle)”. Are the gpu’s always near the same sockets for every node? If so, you might be able to use the cpu-set option to restrict us to those sockets, and then just "—map-by ppr:2:node —bind-to socket" -cpu-set|--cpu-set Comma-separated list of ranges specifying logical cpus allocated to this job [default: none] I Believe this should solve the issue. So the cmdline should be something like: mpirun --map-by ppr:2:node --bind-to socket --cpu-set 0,2 BTW --cpu-set also absent from man page. I might run other jobs on the idle sockets (depending on mem utilization) but that's not an immediate concern at this time. Combining with numa/dist to hca/dist to gpu will be very helpful too. Definitely no way to do this one today. Thanks, ___ users mailing list us...@open-mpi.org <mailto:us...@open-mpi.org> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2015/10/27860.php ___ users mailing list us...@open-mpi.org <mailto:us...@open-mpi.org> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2015/10/27861.php ___ users mailing list us...@open-mpi.org <mailto:us...@open-mpi.org> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2015/10/27898.php ___ users mailing list us...@open-mpi.org Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2015/10/27899.php
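The suggested command line above can be checked with --report-bindings. A sketch (the cpu IDs given to --cpu-set depend on the node's topology, so 0,2 is only an example; the application name is a placeholder):

    # two procs per node, each bound to one of the sockets hosting a GPU
    mpirun --map-by ppr:2:node --bind-to socket --cpu-set 0,2 \
           --report-bindings ./app
    # inspect the node topology (sockets, NUMA nodes, cpu IDs) with hwloc
    lstopo --no-io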
Re: [OMPI users] Building PMIx and Slurm support
Hello,

I have built the following stack:
- centos 7.5 (gcc 4.8.5-28, libevent 2.0.21-4)
- MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.5-x86_64.tgz built with --all --without-32bit (this includes ucx 1.5.0)
- hwloc from centos 7.5: 1.11.8-4.el7
- pmix 3.1.2
- slurm 18.08.5-2 built --with-ucx --with-pmix
- openmpi 4.0.0: configure --with-slurm --with-pmix=external --with-pmi --with-libevent=external --with-hwloc=external --with-knem=/opt/knem-1.1.3.90mlnx1 --with-hcoll=/opt/mellanox/hcoll

The configure part succeeds; however, 'make' errors out with:

    ext3x.c: In function 'ext3x_value_unload':
    ext3x.c:1109:10: error: 'PMIX_MODEX' undeclared (first use in this function)

And the same for 'PMIX_INFO_ARRAY'.

However, both are declared in the opal/mca/pmix/pmix3x/pmix/include/pmix_common.h file. opal/mca/pmix/ext3x/ext3x.c does include pmix_common.h, but as a system include (#include <pmix_common.h>), while ext3x.h includes it as a local include (#include "pmix_common"). Neither seems to pull from the correct path.

Regards,
Dani_L.

On 2/24/19 3:09 AM, Gilles Gouaillardet wrote:
Passant, you have to manually download and apply https://github.com/pmix/pmix/commit/2e2f4445b45eac5a3fcbd409c81efe318876e659.patch to PMIx 2.2.1; that should likely fix your problem. As a side note, it is bad practice to configure --with-FOO=/usr since it might have some unexpected side effects. Instead, you can replace
configure --with-slurm --with-pmix=/usr --with-pmi=/usr --with-libevent=/usr
with
configure --with-slurm --with-pmix=external --with-pmi --with-libevent=external
to be on the safe side. I also invite you to pass --with-hwloc=external to the configure command line.
Cheers, Gilles

On Sun, Feb 24, 2019 at 1:54 AM Passant A. Hafez wrote:
Hello Gilles, Here are some details:
Slurm 18.08.4
PMIx 2.2.1 (as shown in /usr/include/pmix_version.h)
Libevent 2.0.21

srun --mpi=list
srun: MPI types are...
srun: none
srun: openmpi
srun: pmi2
srun: pmix
srun: pmix_v2

Open MPI versions tested: 4.0.0 and 3.1.2
For each installation (to be mentioned below), a different MPI Hello World program was compiled.
If you want to minimize the risk of cross-version incompatibility, then I encourage you to build Open MPI against the same (and hence external) PMIx that was used to build SLURM. Can you tell a bit more than "it didn't work"? (Open MPI version, PMIx version used by SLURM, PMIx version used by Open MPI, error message, ...)
Cheers, Gilles

On Sat, Feb 23, 2019 at 9:46 PM Passant A. Hafez wrote:
Good day everyone, I've been trying to build and use the PMIx support for Open MPI; I tried many things that I can list if needed, but with no luck. I was able to test the PMIx client, but when I used OMPI specifying srun --mpi=pmix it didn't work. So if you could please advise me on the versions of PMIx and Open MPI that should work well with Slurm 18.08, it'd be great. Also, what is the difference between using internal vs external PMIx installations?
All the best,
-- Passant A. Hafez | HPC Applications Specialist KAUST Supercomputing Core Laboratory (KSL) King Abdullah University of Science and Technology Building 1, Al-Khawarizmi, Room 0123 Mobile : +966 (0) 55-247-9568 Mobile : +20 (0) 106-146-9644 Office : +966 (0) 12-808-0367
Re: [OMPI users] Building PMIx and Slurm support
Sent from my iPhone > On 3 Mar 2019, at 16:31, Gilles Gouaillardet > wrote: > > Daniel, > > PMIX_MODEX and PMIX_INFO_ARRAY have been removed from PMIx 3.1.2, and > Open MPI 4.0.0 was not ready for this. > > You can either use the internal PMIx (3.0.2), or try 4.0.1rc1 (with > the external PMIx 3.1.2) that was published a few days ago. > Thanks, will try that tomorrow. I can’t use internal due to Slurm dependency, but I will try the rc. Any idea when 4.0.1 will be released? > FWIW, you are right using --with-pmix=external (and not using > --with-pmix=/usr) > > Cheers, > > Gilles > >> On Sun, Mar 3, 2019 at 10:57 PM Daniel Letai wrote: >> >> Hello, >> >> >> I have built the following stack : >> >> centos 7.5 (gcc 4.8.5-28, libevent 2.0.21-4) >> MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.5-x86_64.tgz built with --all >> --without-32bit (this includes ucx 1.5.0) >> hwloc from centos 7.5 : 1.11.8-4.el7 >> pmix 3.1.2 >> slurm 18.08.5-2 built --with-ucx --with-pmix >> openmpi 4.0.0 : configure --with-slurm --with-pmix=external --with-pmi >> --with-libevent=external --with-hwloc=external >> --with-knem=/opt/knem-1.1.3.90mlnx1 --with-hcoll=/opt/mellanox/hcoll >> >> The configure part succeeds, however 'make' errors out with: >> >> ext3x.c: In function 'ext3x_value_unload': >> >> ext3x.c:1109:10: error: 'PMIX_MODEX' undeclared (first use in this function) >> >> >> And same for 'PMIX_INFO_ARRAY' >> >> >> However, both are declared in the >> opal/mca/pmix/pmix3x/pmix/include/pmix_common.h file. >> >> opal/mca/pmix/ext3x/ext3x.c does include pmix_common.h but as a system >> include #include , while ext3x.h includes it as a local >> include #include "pmix_common". Neither seem to pull from the correct path. >> >> >> Regards, >> >> Dani_L. >> >> >> On 2/24/19 3:09 AM, Gilles Gouaillardet wrote: >> >> Passant, >> >> you have to manually download and apply >> https://github.com/pmix/pmix/commit/2e2f4445b45eac5a3fcbd409c81efe318876e659.patch >> to PMIx 2.2.1 >> that should likely fix your problem. >> >> As a side note, it is a bad practice to configure --with-FOO=/usr >> since it might have some unexpected side effects. >> Instead, you can replace >> >> configure --with-slurm --with-pmix=/usr --with-pmi=/usr --with-libevent=/usr >> >> with >> >> configure --with-slurm --with-pmix=external --with-pmi >> --with-libevent=external >> >> to be on the safe side I also invite you to pass --with-hwloc=external >> to the configure command line >> >> >> Cheers, >> >> Gilles >> >> On Sun, Feb 24, 2019 at 1:54 AM Passant A. Hafez >> wrote: >> >> Hello Gilles, >> >> Here are some details: >> >> Slurm 18.08.4 >> >> PMIx 2.2.1 (as shown in /usr/include/pmix_version.h) >> >> Libevent 2.0.21 >> >> srun --mpi=list >> srun: MPI types are... >> srun: none >> srun: openmpi >> srun: pmi2 >> srun: pmix >> srun: pmix_v2 >> >> Open MPI versions tested: 4.0.0 and 3.1.2 >> >> >> For each installation to be mentioned a different MPI Hello World program >> was compiled. 
>> Jobs were submitted by sbatch, 2 node * 2 tasks per node then srun >> --mpi=pmix program >> >> File 400ext_2x2.out (attached) is for OMPI 4.0.0 installation with configure >> options: >> --with-slurm --with-pmix=/usr --with-pmi=/usr --with-libevent=/usr >> and configure log: >> Libevent support: external >> PMIx support: External (2x) >> >> File 400int_2x2.out (attached) is for OMPI 4.0.0 installation with configure >> options: >> --with-slurm --with-pmix >> and configure log: >> Libevent support: internal (external libevent version is less that internal >> version 2.0.22) >> PMIx support: Internal >> >> Tested also different installations for 3.1.2 and got errors similar to >> 400ext_2x2.out >> (NOT-SUPPORTED in file event/pmix_event_registration.c at line 101) >> >> >> >> >> >> All the best, >> -- >> Passant A. Hafez | HPC Applications Specialist >> KAUST Supercomputing Core Laboratory (KSL) >> King Abdullah University of Science and Technology >> Building 1, Al-Khawarizmi, Room 0123 >> Mobile : +966 (0) 55-247-9568 >> Mobile : +20 (0) 106-146-9644 >&g
Re: [OMPI users] Building PMIx and Slurm support
Gilles, On 04/03/2019 01:59:28, Gilles Gouaillardet wrote: Daniel, keep in mind PMIx was designed with cross-version compatibility in mind, so a PMIx 3.0.2 client (read Open MPI 4.0.0 app with the internal 3.0.2 PMIx) should be able to interact with a PMIx 3.1.2 server (read SLURM pmix plugin built on top of PMIx 3.1.2). Good to know - I did not find that information and was hesitant to mix and match. So unless you have a specific reason not to mix both, you might also give the internal PMIx a try. Does this hold true for libevent too? Configure complains if libevent for openmpi is different than the one used for the other tools. The 4.0.1 release candidate 1 was released a few days ago, and based on the feedback we receive, the final 4.0.1 should be released in a very near future. Thanks for the info. Cheers, Gilles Cheers, Dani_L On 3/4/2019 1:08 AM, Daniel Letai wrote: Sent from my iPhone On 3 Mar 2019, at 16:31, Gilles Gouaillardet wrote: Daniel, PMIX_MODEX and PMIX_INFO_ARRAY have been removed from PMIx 3.1.2, and Open MPI 4.0.0 was not ready for this. You can either use the internal PMIx (3.0.2), or try 4.0.1rc1 (with the external PMIx 3.1.2) that was published a few days ago. Thanks, will try that tomorrow. I can’t use internal due to Slurm dependency, but I will try the rc. Any idea when 4.0.1 will be released? FWIW, you are right using --with-pmix=external (and not using --with-pmix=/usr) Cheers, Gilles On Sun, Mar 3, 2019 at 10:57 PM Daniel Letai wrote: Hello, I have built the following stack : centos 7.5 (gcc 4.8.5-28, libevent 2.0.21-4) MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.5-x86_64.tgz built with --all --without-32bit (this includes ucx 1.5.0) hwloc from centos 7.5 : 1.11.8-4.el7 pmix 3.1.2 slurm 18.08.5-2 built --with-ucx --with-pmix openmpi 4.0.0 : configure --with-slurm --with-pmix=external --with-pmi --with-libevent=external --with-hwloc=external --with-knem=/opt/knem-1.1.3.90mlnx1 --with-hcoll=/opt/mellanox/hcoll The configure part succeeds, however 'make' errors out with: ext3x.c: In function 'ext3x_value_unload': ext3x.c:1109:10: error: 'PMIX_MODEX' undeclared (first use in this function) And same for 'PMIX_INFO_ARRAY' However, both are declared in the opal/mca/pmix/pmix3x/pmix/include/pmix_common.h file. opal/mca/pmix/ext3x/ext3x.c does include pmix_common.h but as a system include #include , while ext3x.h includes it as a local include #include "pmix_common". Neither seem to pull from the correct path. Regards, Dani_L. On 2/24/19 3:09 AM, Gilles Gouaillardet wrote: Passant, you have to manually download and apply https://github.com/pmix/pmix/commit/2e2f4445b45eac5a3fcbd409c81efe318876e659.patch to PMIx 2.2.1 that should likely fix your problem. As a side note, it is a bad practice to configure --with-FOO=/usr since it might have some unexpected side effects. Instead, you can replace configure --with-slurm --with-pmix=/usr --with-pmi=/usr --with-libevent=/usr with configure --with-slurm --with-pmix=external --with-pmi --with-libevent=external
Re: [OMPI users] Building PMIx and Slurm support
Gilles, On 3/4/19 8:28 AM, Gilles Gouaillardet wrote: Daniel, On 3/4/2019 3:18 PM, Daniel Letai wrote: So unless you have a specific reason not to mix both, you might also give the internal PMIx a try. Does this hold true for libevent too? Configure complains if libevent for openmpi is different than the one used for the other tools. I am not exactly sure of which scenario you are running. Long story short, - If you use an external PMIx, then you have to use an external libevent (otherwise configure will fail). It must be the same one used by PMIx, but I am not sure configure checks that. - If you use the internal PMIx, then it is up to you. you can either use the internal libevent, or an external one. Thanks, that clarifies the issues I've experienced. Since PMIx doesn't have to be the same for server and nodes, I can compile slurm with external PMIx with system libevent, and compile openmpi with internal PMIx and libevent, and that should work. Is that correct? BTW, building 4.0.1rc1 completed successfully using external for all, will start testing in near future. Cheers, Gilles Thanks, Dani_L. ___ users mailing list users@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/users ___ users mailing list users@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/users
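A sketch of that split build, using only the configure flags already mentioned in this thread; the install prefixes and source directories are illustrative:

    # external PMIx (with the system libevent) for Slurm
    cd pmix-3.1.2 && ./configure --prefix=/opt/pmix/3.1.2 && make install
    # Slurm built against that PMIx
    cd slurm-18.08.5-2 && ./configure --with-pmix=/opt/pmix/3.1.2 --with-ucx && make install
    # Open MPI keeps its internal PMIx and libevent; PMIx cross-version
    # compatibility lets the internal client talk to the 3.1.2 server side
    cd openmpi-4.0.1 && ./configure --with-slurm --with-pmix=internal --with-libevent=internal && make install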
Re: [OMPI users] Building PMIx and Slurm support
Hi, On 12/03/2019 10:46:02, Passant A. Hafez wrote: Hi Gilles, Yes it was just a typo in the last email, it was correctly spelled in the job script. So I just tried to use 1 node * 2 tasks/node, I got the same error I posted before, just a copy for each process, here it is again: *** An error occurred in MPI_Init *** on a NULL communicator *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, ***and potentially your MPI job) [cn603-20-l:169109] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed! *** An error occurred in MPI_Init *** on a NULL communicator *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, ***and potentially your MPI job) [cn603-20-l:169108] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed! srun: error: cn603-20-l: tasks 0-1: Exited with exit code 1 I'm suspecting Slurm, but anyways, how can I troubleshoot this? Simple - try running directly without Slurm. If it works - Slurm is the culprit. If not - it's MPI debug time. The program is a simple MPI Hello World code. All the best, -- Passant A. Hafez | HPC Applications Specialist KAUST Supercomputing Core Laboratory (KSL) King Abdullah University of Science and Technology Building 1, Al-Khawarizmi, Room 0123 Mobile : +966 (0) 55-247-9568 Mobile : +20 (0) 106-146-9644 Office : +966 (0) 12-808-0367 From: users on behalf of Gilles Gouaillardet Sent: Tuesday, March 12, 2019 8:22 AM To: users@lists.open-mpi.org Subject: Re: [OMPI users] Building PMIx and Slurm support Passant, Except the typo (it should be srun --mpi=pmix_v3), there is nothing wrong with that, and it is working just fine for me (same SLURM version, same PMIx version, same Open MPI version and same Open MPI configure command line) that is why I asked you some more information/logs in order to investigate your issue. You might want to try a single node job first in order to rule out potential interconnect related issues. Cheers, Gilles On 3/12/2019 1:54 PM, Passant A. Hafez wrote: Hello Gilles, Yes I do use srun --mpi=pmix_3 to run the app, what's the problem with that? Before that, when we tried to launch MPI apps directly with srun, we got the error message saying Slurm missed the PMIx support, that's why we proceeded with the installation. All the best, -- Passant On Mar 12, 2019 6:53 AM, Gilles Gouaillardet wrote: Passant, I built a similar environment, and had no issue running a simple MPI program. Can you please post your slurm script (I assume it uses srun to start the MPI app), the output of scontrol show config | grep Mpi and the full output of your job ? Cheers, Gilles On 3/12/2019 7:59 AM, Passant A. 
Hafez wrote:
Hello, So we now have Slurm 18.08.6-2 compiled with PMIx 3.1.2, then I installed openmpi 4.0.0 with: --with-slurm --with-pmix=internal --with-libevent=internal --enable-shared --enable-static --with-x (Following the thread, it was mentioned that building OMPI 4.0.0 with PMIx 3.1.2 will fail with PMIX_MODEX and PMIX_INFO_ARRAY errors, so I used internal PMIx.) The MPI program fails with:
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[cn603-13-r:387088] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
for each process, please advise! what's going wrong here?
All the best,
-- Passant A. Hafez | HPC Applications Specialist KAUST Supercomputing Core Laboratory (KSL) King Abdullah University of Science and Technology Building 1, Al-Khawarizmi, Room 0123 Mobile : +966 (0) 55-247-9568 Mobile : +20 (0) 106-146-9644 Office : +966 (0) 12-808-0367
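One way to follow the suggestion above and narrow down whether Slurm or the Open MPI build is at fault, using only commands already mentioned in this thread (the program is the hello-world reproducer):

    # what MPI plugins does Slurm itself offer?
    srun --mpi=list
    scontrol show config | grep -i mpi
    # launch without Slurm: if this works, the Open MPI build is fine
    mpirun -np 2 ./hello
    # launch through Slurm's PMIx plugin
    srun -N 1 -n 2 --mpi=pmix_v3 ./hello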
[OMPI users] Are there any issues (performance or otherwise) building apps with different compiler from the one used to build openmpi?
Hello,
Assuming I have installed openmpi built with the distro stock gcc (4.4.7 on RHEL 6.5), but an app requires a different gcc version (8.2, manually built on the dev machine): would there be any issues, or a performance penalty, if I build the app using the more recent gcc with the flags from the wrapper compiler's --showme, as per https://www.open-mpi.org/faq/?category=mpi-apps#cant-use-wrappers ?
Openmpi is built with both pmix and ucx enabled, all built with the stock gcc (4.4.7).
Since the constraint is the app, if the answer is yes I would have to build openmpi with the non-distro gcc, which is a bit of a hassle.
Thanks in advance
--Dani_L.
___ users mailing list users@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/users
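In practice, the approach in that FAQ entry amounts to pulling the flags out of the wrapper and handing them to the newer compiler; a sketch (the gcc 8.2 install path and source file name are illustrative):

    # flags the Open MPI wrapper compiler would have used
    OMPI_CFLAGS=$(mpicc --showme:compile)
    OMPI_LIBS=$(mpicc --showme:link)
    # compile the application with the newer, manually built gcc
    /opt/gcc-8.2/bin/gcc $OMPI_CFLAGS -o app app.c $OMPI_LIBS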
[OMPI users] Packaging issue with linux spec file when not build_all_in_one_rpm due to empty grep
In the 4.0.1 src rpm, if building with --define 'build_all_in_one_rpm 0', the output of grep -v _mandir docs.files is empty. The simple workaround is to follow the earlier pattern and pipe to /bin/true, as the spec doesn't really care if the file is empty. I'm wondering whether all of the greps should be protected this way. A simple patch:

    diff --git a/contrib/dist/linux/openmpi.spec b/contrib/dist/linux/openmpi.spec
    index 2a80af296b..2b897345f9 100644
    --- a/contrib/dist/linux/openmpi.spec
    +++ b/contrib/dist/linux/openmpi.spec
    @@ -611,7 +611,7 @@ grep -v %{_includedir} devel.files > tmp.files
     mv tmp.files devel.files
     
     # docs sub package
    -grep -v %{_mandir} docs.files > tmp.files
    +grep -v %{_mandir} docs.files > tmp.files | /bin/true
     mv tmp.files docs.files
     %endif

___ users mailing list users@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/users
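For context, the failure mode can be reproduced with an invocation along these lines (the src.rpm file name follows the usual naming and is illustrative):

    # build separate sub-packages instead of one all-in-one rpm; without the
    # patch above, the docs sub-package step fails when the grep output is empty
    rpmbuild --rebuild --define 'build_all_in_one_rpm 0' openmpi-4.0.1-1.src.rpm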
Re: [OMPI users] Can't start jobs with srun.
I know it's not supposed to matter, but have you tried building both ompi and slurm against the same pmix? That is - first build pmix, then build slurm --with-pmix, and then ompi with both slurm and pmix=external?

On 23/04/2020 17:00, Prentice Bisbal via users wrote:
$ ompi_info | grep slurm
  Configure command line: '--prefix=/usr/pppl/intel/2019-pkgs/openmpi-4.0.3' '--disable-silent-rules' '--enable-shared' '--with-pmix=internal' '--with-slurm' '--with-psm'
                 MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v4.0.3)
                 MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.3)
              MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v4.0.3)
Any ideas what could be wrong? Do you need any additional information?
Prentice
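Before rebuilding, it may help to confirm what each side was actually built against; these commands only read the existing configuration:

    # which PMIx component/version does this Open MPI build use (internal vs external)?
    ompi_info | grep -i pmix
    # which PMI/PMIx plugins does the local Slurm offer?
    srun --mpi=list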