[OMPI users] mca_btl_tcp_frag_send: writev failed with errno=110
I am getting the following error with openmpi-1.1b1:

mca_btl_tcp_frag_send: writev failed with errno=110

1) This never happens with other MPIs I have tried, such as MPICH and LAM.
2) It only seems to happen with large numbers of cpus (32, and occasionally 16) and with larger message sizes. In this case it was 128K.
3) It only seems to happen with dual cpus on each node.
4) My configuration is the default, with (in openmpi-mca-params.conf):

pls_rsh_agent = rsh
btl = tcp,self
btl_tcp_if_include = eth1

I also set --mca btl_tcp_eager_limit 131072 when running the program, though leaving this out does not eliminate the problem.

My program is a communication test; it sends bidirectional point-to-point messages among N cpus. In one test it exchanges messages between pairs of cpus, in another it reads from the node on its left and sends to the node on its right (a kind of ring), and in a third it uses MPI_Allreduce.

Finally: the tcp driver in OpenMPI seems not nearly as good as the one in LAM. I got higher throughput with far fewer dropouts with LAM.

Tony

---
Tony Ladd
Professor, Chemical Engineering
University of Florida
PO Box 116005
Gainesville, FL 32611-6005
Tel: 352-392-6509
FAX: 352-392-9513
Email: tl...@che.ufl.edu
Web: http://ladd.che.ufl.edu
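For reference, the configuration described above corresponds to a command line roughly like the following (the executable name is a placeholder, not from the original post; the MCA parameter names are the ones quoted in the message):

mpirun --mca pls_rsh_agent rsh --mca btl tcp,self --mca btl_tcp_if_include eth1 \
       --mca btl_tcp_eager_limit 131072 -np 32 -machinefile hosts ./comm_test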
Re: [OMPI users] mca_btl_tcp_frag_send: writev failed with errno=110
Jeff

Thanks for the reply; I realize you guys must be really busy with the recent release of OpenMPI. I tried 1.1 and I don't get error messages any more, but the code now hangs: no error, no exit. So I am not sure if this is the same issue or something else. I am enclosing my source code. I compiled with icc and linked against an icc-compiled version of openmpi-1.1.

My program is a set of network benchmarks (a crude kind of Netpipe) that checks typical message-passing patterns in my application codes. Typical output for 32 CPUs (sync call time = 1003.0) is:

                      time                                   rate (MBytes/s)                        bandwidth (MBits/s)
loop buffers  size    XC       XE       GS       MS          XC       XE       GS       MS          XC       XE       GS       MS
  1    64   16384   2.48e-02 1.99e-02 1.21e+00 3.88e-02    4.23e+01 5.28e+01 8.65e-01 2.70e+01    1.08e+04 1.35e+04 4.43e+02 1.38e+04
  2    64   16384   2.17e-02 2.09e-02 1.21e+00 4.10e-02    4.82e+01 5.02e+01 8.65e-01 2.56e+01    1.23e+04 1.29e+04 4.43e+02 1.31e+04
  3    64   16384   2.20e-02 1.99e-02 1.01e+00 3.95e-02    4.77e+01 5.27e+01 1.04e+00 2.65e+01    1.22e+04 1.35e+04 5.33e+02 1.36e+04
  4    64   16384   2.16e-02 1.96e-02 1.25e+00 4.00e-02    4.85e+01 5.36e+01 8.37e-01 2.62e+01    1.24e+04 1.37e+04 4.28e+02 1.34e+04
  5    64   16384   2.25e-02 2.00e-02 1.25e+00 4.07e-02    4.66e+01 5.24e+01 8.39e-01 2.57e+01    1.19e+04 1.34e+04 4.30e+02 1.32e+04
  6    64   16384   2.19e-02 1.99e-02 1.29e+00 4.05e-02    4.79e+01 5.28e+01 8.14e-01 2.59e+01    1.23e+04 1.35e+04 4.17e+02 1.33e+04
  7    64   16384   2.19e-02 2.06e-02 1.25e+00 4.03e-02    4.79e+01 5.09e+01 8.38e-01 2.60e+01    1.23e+04 1.30e+04 4.29e+02 1.33e+04
  8    64   16384   2.24e-02 2.06e-02 1.25e+00 4.01e-02    4.69e+01 5.09e+01 8.39e-01 2.62e+01    1.20e+04 1.30e+04 4.30e+02 1.34e+04
  9    64   16384   4.29e-01 2.01e-02 6.35e-01 3.98e-02    2.45e+00 5.22e+01 1.65e+00 2.64e+01    6.26e+02 1.34e+04 8.46e+02 1.35e+04
 10    64   16384   2.16e-02 2.06e-02 8.87e-01 4.00e-02    4.85e+01 5.09e+01 1.18e+00 2.62e+01    1.24e+04 1.30e+04 6.05e+02 1.34e+04

Time is the total for all 64 buffers. Rate is one way across one link (# of bytes/time).

1) XC is a bidirectional ring exchange. Each processor sends to the right and receives from the left.
2) XE is an edge exchange. Pairs of nodes exchange data, with each one sending and receiving.
3) GS is MPI_Allreduce.
4) MS is my version of MPI_Allreduce. It splits the vector into Np blocks (Np is the number of processors); each processor then acts as a head node for one block. This uses the full bandwidth all the time, unlike Allreduce, which thins out as it gets to the top of the binary tree. On a 64-node Infiniband system MS is about 5X faster than GS; in theory it would be 6X, i.e. log_2(64). Here it is 25X, and I am not sure why it is so much.

But MS seems to be the cause of the hangups with messages > 64K. I can run the other benchmarks OK, but this one seems to hang for large messages. I think the problem is at least partly due to the switch. All MS is doing is point-to-point communication, but unfortunately it sometimes requires high bandwidth between ASICs. At first it exchanges data between near neighbors in MPI_COMM_WORLD, but it must progressively span wider gaps between nodes as it goes up the various binary trees. After a while this requires extensive traffic between ASICs. This seems to be a problem on both my HP 2724 and the Extreme Networks Summit400t-48. I am currently working with Extreme to try to resolve the switch issue. As I say, the code ran great on Infiniband, but I think those switches have hardware flow control. Finally, I checked the code again under LAM and it ran OK. Slow, but no hangs.

To run the code, compile and type:

mpirun -np 32 -machinefile hosts src/netbench 8

The 8 means 2^8 bytes (ie 256K).
This was enough to hang every time on my boxes. You can also edit the header file (header.h): MAX_LOOPS is how many times it runs each test (currently 10); NUM_BUF is the number of buffers in each test (must be more than the number of processors); SYNC defines the global sync frequency (every SYNC buffers); NUM_SYNC is the number of sequential barrier calls it uses to determine the mean barrier call time. You can also switch the various tests on and off, which can be useful for debugging.

Tony

---
Tony Ladd
Professor, Chemical Engineering
University of Florida
PO Box 116005
Gainesville, FL 32611-6005
Tel: 352-392-6509
FAX: 352-392-9513
Email: tl...@che.ufl.edu
Web: http://ladd.che.ufl.edu

src.tgz  Description: application/compressed
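For illustration, the XC ring exchange described in the message above can be written with MPI_Sendrecv roughly as follows. This is a minimal sketch, not the actual netbench source; the message size and the 64 exchanges per timed loop are taken from the sample output quoted above.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, np;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    int right = (rank + 1) % np;          /* send to the right ...          */
    int left  = (rank + np - 1) % np;     /* ... receive from the left      */

    const int n = 16384;                  /* message size in bytes          */
    char *sbuf = calloc(n, 1);
    char *rbuf = calloc(n, 1);

    double t0 = MPI_Wtime();
    for (int i = 0; i < 64; i++)          /* 64 exchanges per timed loop    */
        MPI_Sendrecv(sbuf, n, MPI_BYTE, right, 0,
                     rbuf, n, MPI_BYTE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    double t = MPI_Wtime() - t0;

    if (rank == 0)
        printf("time %g s, one-way rate %g MBytes/s\n", t, 64.0 * n / t / 1e6);

    free(sbuf);
    free(rbuf);
    MPI_Finalize();
    return 0;
}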
[OMPI users] dual Gigabit ethernet support
A couple of comments regarding issues raised by this thread.

1) In my opinion, Netpipe is not such a great network benchmarking tool for HPC applications. It measures timings based on the completion of the send call on the transmitter, not the completion of the receive. Thus, if there is a delay in copying the send buffer across the net, it will report a misleading timing compared with the wall-clock time. This is particularly problematic with multiple pairs of edge exchanges, which can oversubscribe most GigE switches. Here the Netpipe timings can be off by orders of magnitude compared with the wall clock. The good thing about writing your own code is that you know what it has done (of course no one else knows, which can be a problem). But it seems many people are unaware of the timing issue in Netpipe.

2) It's worth distinguishing between Ethernet and TCP/IP. With MPIGAMMA, the Intel Pro 1000 NIC has a latency of 12 microsecs including the switch, and a duplex bandwidth of 220 MBytes/sec. With the Extreme Networks X450a-48t switch we can sustain 220 MBytes/sec over 48 ports at once. This is not IB performance, but it seems sufficient to scale a number of applications to the 100-cpu level, and perhaps beyond.

Tony

---
Tony Ladd
Professor, Chemical Engineering
University of Florida
PO Box 116005
Gainesville, FL 32611-6005
Tel: 352-392-6509
FAX: 352-392-9513
Email: tl...@che.ufl.edu
Web: http://ladd.che.ufl.edu
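To make the timing point concrete, here is a schematic two-rank example (not Netpipe's code, and the buffer size is arbitrary): the first measurement stops when MPI_Send returns, which may only mean the message has been handed off to a buffer; the second waits for a one-byte acknowledgement from the receiver, so it reflects the wall-clock delivery time.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)          /* run with at least two ranks */
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 131072;                /* 128K message */
    char *buf = calloc(n, 1);
    char ack = 0;

    if (rank == 0) {
        /* send-side timing only: clock stops when MPI_Send returns */
        double t0 = MPI_Wtime();
        MPI_Send(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
        double t_send = MPI_Wtime() - t0;

        /* wall-clock delivery: wait for an acknowledgement from the receiver */
        t0 = MPI_Wtime();
        MPI_Send(buf, n, MPI_BYTE, 1, 1, MPI_COMM_WORLD);
        MPI_Recv(&ack, 1, MPI_BYTE, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        double t_ack = MPI_Wtime() - t0;

        printf("send-call time %g s, acknowledged time %g s\n", t_send, t_ack);
    } else if (rank == 1) {
        MPI_Recv(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(buf, n, MPI_BYTE, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&ack, 1, MPI_BYTE, 0, 1, MPI_COMM_WORLD);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}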
[OMPI users] Dual Gigabit ethernet support
Lisandro

I use my own network testing program; I wrote it some time ago because Netpipe only tested one-way rates at that point. I haven't tried IMB, but I looked at the source and it's very similar to what I do: 1) set up buffers with data; 2) start the clock; 3) call MPI_xxx N times; 4) stop the clock; 5) calculate the rate. IMB tests more things than I do; I just focused on the calls I use (send, recv, allreduce). I have done a lot of testing of hardware and software. I will have some web pages posted soon, and I will put a note here when I do. But a couple of things:

A) I have found the switch is the biggest discriminating factor if you want to run HPC under Gigabit Ethernet. Most GigE switches choke when all the ports are being used at once. This is the usual HPC pattern, but not that of a typical network, which is what these switches are geared towards. The one exception I have found is the Extreme Networks x450a-48t. In some test patterns I found it to be 500 times faster (not a typo) than the s400-48t, which is its predecessor. I have tested several GigE switches (Extreme, Force10, HP, Asante) and the x450 is the only one that copes with high traffic loads in all port configurations. It's expensive for a GigE switch (~$6500), but worth it in my opinion if you want to do HPC. It's still much cheaper than Infiniband.

B) You have to test the switch in different port configurations; a random ring of Sendrecv is good for this. I don't think IMB has it in its test suite, but it's easy to program. Or you can change the order of nodes in the machinefile to force unfavorable port assignments. A step of 12 is a good test, since many GigE switches use 12-port ASICs and this forces all the traffic onto the backplane. On the Summit 400 this causes it to more or less stop working (rates drop to a few KBytes/sec along each wire), but the x450 has no problem with the same test. You need to know how your nodes are wired to the switch to do this test.

C) GAMMA is an extraordinary accomplishment in my view; in a number of tests with codes like DLPOLY, GROMACS, and VASP it can be 2-3 times the speed of TCP-based programs with 64 cpus. In many instances I get comparable (and occasionally better) scaling than with the university HPC system, which has an Infiniband interconnect. Note I am not saying GigE is comparable to IB; rather, a typical HPC setup with nodes scattered all over a fat-tree topology (including oversubscription of the links and switches) is enough of a minus that an optimized GigE setup can compete, at least up to 48 nodes (96 cpus in our case). I have worked with Giuseppe Ciaccio for the past 9 months eradicating some obscure bugs in GAMMA: I find them; he fixes them. We have GAMMA running on 48 nodes quite reliably, but there are still many issues to address. GAMMA is very much a research tool; there are a number of features(?) which would hinder it being used in an HPC environment. Basically Giuseppe needs help with development. Any volunteers?

Tony

---
Tony Ladd
Professor, Chemical Engineering
University of Florida
PO Box 116005
Gainesville, FL 32611-6005
Tel: 352-392-6509
FAX: 352-392-9513
Email: tl...@che.ufl.edu
Web: http://ladd.che.ufl.edu
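The five-step procedure described above amounts to a loop of this general form (a schematic example, not the actual source of either benchmark; the vector length and repetition count are arbitrary):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 14;                       /* vector length            */
    const int reps = 100;                        /* number of timed calls    */
    double *v = malloc(n * sizeof(double));
    double *r = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) v[i] = i;        /* 1) set up buffers        */

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();                     /* 2) start clock           */
    for (int k = 0; k < reps; k++)               /* 3) call MPI_xxx N times  */
        MPI_Allreduce(v, r, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double t = MPI_Wtime() - t0;                 /* 4) stop clock            */

    if (rank == 0)                               /* 5) calculate rate        */
        printf("%d x MPI_Allreduce of %zu bytes: %g s, %g MBytes/s\n",
               reps, n * sizeof(double), t,
               reps * n * sizeof(double) / t / 1e6);

    free(v);
    free(r);
    MPI_Finalize();
    return 0;
}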
[OMPI users] Dual Gigabit Ethernet Support
Durga

I guess we have strayed a bit from the original post. My personal opinion is that a number of codes can run in HPC-like mode over Gigabit Ethernet, not just the trivially parallelizable ones. The hardware components are one key: PCI-X, a NIC with low hardware latency (the Intel PRO 1000 is 6.6 microsecs vs about 14 for the Bcom 5721), and a non-blocking (that's the key word) switch. Then you need a good driver and a good MPI software layer. At present MPICH is ahead of LAM/OpenMPI/MVAPICH in its implementation of optimized collectives. At least that's how it seems to me (let me say that quickly, before I get flamed). MPICH got a bad rap performance-wise because its TCP driver was mediocre (compared with LAM and OpenMPI). But MPICH + GAMMA is very fast. MPIGAMMA even beats out our Infiniband cluster running OpenMPI on MPI_Allreduce; the test was with 64 cpus: 32 nodes on the GAMMA cluster (dual-core P4) and 16 nodes on the Infiniband cluster (dual dual-core Opterons). The IB cluster worked out at 24 MBytes/sec (vector size/time) and the GigE + MPIGAMMA was 39 MBytes/sec. On the other hand, if I use my own optimized Allreduce (a simplified version of the one in MPICH) on the IB cluster, it gets 108 MBytes/sec. So the tricky thing is that all the components need to be in place to get good application performance.

GAMMA is not so easy to set up; I had considerable help from Giuseppe. It has libraries to compile and the kernel needs to be recompiled. Once I got that automated, I can build and install a new version of GAMMA in about 5 mins. The MPIGAMMA build is just like MPICH, and MPIGAMMA works almost exactly the same. So any application that will compile under MPICH should compile under MPIGAMMA, just by changing the path. I have run half a dozen apps with GAMMA. Netpipe, Netbench (my network tester, a simplified version of IMB), Susp3D (my own code, a CFD-like application), and DLPOLY all compile out of the box. GROMACS compiles but has a couple of "bugs" that crash on execution. One is an archaic test for MPICH that prevents a clean exit; it must have been a bugfix for an earlier version of MPICH. The other seems to be an fclose of an unassigned file pointer. It works OK in LAM, but my guess is it's illegal strictly speaking. A student was supposed to check on that. VASP also compiles out of the box if you can compile it with MPICH. But there is a problem with MPIGAMMA and the MPI_Alltoall function right now. It works, but it suffers from hangups and long delays. So GAMMA is not good for VASP at this moment. You see substantial performance improvements sometimes, but other times it's dreadfully slow. I can reproduce the problem with an Alltoall test code and Giuseppe is going to try to debug it.

So GAMMA is not a panacea. In most circumstances it is stable and predictable, and much more reproducible than MPI over TCP. But there may still be one or two bugs, and there are several issues. 1) Since GAMMA is tightly entwined in the kernel, a crash frequently brings the whole system down, which is a bit annoying; it can also crash other nodes in the same GAMMA virtual machine. 2) NICs are very buggy hardware; if you look at a TCP driver there are a large number of hardware bugfixes in it. A number of GAMMA problems can be traced to this. It's a lot of work to reprogram all the workarounds. 3) GAMMA nodes have to be preconfigured at boot. You can run more than one job on a GAMMA virtual machine, but it's a little iffy; there can be interactions between nodes on the same VM even if they are running different jobs.
Different GAMMA VMs need a different VLAN, so a multiuser environment is still problematic. 4) Giuseppe said MPIGAMMA was a very difficult code to write, so I would guess a port to OpenMPI would not be trivial. Also, I would want to see optimized collectives in OpenMPI before I switched from MPICH.

As far as I know, GAMMA is the most advanced non-TCP protocol. At its core it really works well, but it still needs a lot more testing and development. Giuseppe is great to work with if anyone out there is interested. Go to the MPIGAMMA website for more info: http://www.disi.unige.it/project/gamma/mpigamma/index.html.

Tony
[OMPI users] OMPI collectives
1) I think OpenMPI does not use optimal algorithms for collectives, but neither does LAM. For example, the MPI_Allreduce time scales as log_2 N, where N is the number of processors. MPICH uses optimized collectives and its MPI_Allreduce is essentially independent of N. Unfortunately MPICH has never had a good TCP interface, so it is typically slower overall than LAM or OpenMPI. Are there plans to develop optimized collectives for OpenMPI, and if so, is there a timeline?

2) I have found an additional problem in OpenMPI over TCP. MPI_Allreduce can run extremely slowly on large numbers of processors. Measuring throughput (message size / time) for 48 nodes with 16 KByte messages (for example) I get only 0.12 MBytes/sec. The same code with LAM gets 5.3 MBytes/sec, which is more reasonable. The problem seems to arise for a) more than 16 nodes and b) message sizes in the range 16-32 KBytes. Normally this is the optimum size, so it's odd. Other message sizes are closer to LAM (though typically a little slower). I have run these tests with my own network test, but I can run IMB if necessary.

Tony
[OMPI users] OMPI Collectives
George

Thanks for the info. When you say "highly optimized", do you mean algorithmically, tuning, or both? In particular, I wonder whether the OMPI optimized collectives use the divide-and-conquer strategy to maximize network bandwidth. Sorry to be dense, but I could not find documentation on how to access the optimized collectives. I would be willing to fiddle with the parameters a bit by hand if I had some guidance on how to set things and what I might vary. The optimization I was talking about (divide and conquer) would work better than the basic strategy regardless of network; only message size might have some effect.

Thanks

Tony

---
Tony Ladd
Chemical Engineering
University of Florida
PO Box 116005
Gainesville, FL 32611-6005
Tel: 352-392-6509
FAX: 352-392-9513
Email: tl...@che.ufl.edu
Web: http://ladd.che.ufl.edu
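For anyone else following this thread, the tuned collectives are exposed through MCA parameters: the available parameters can be listed with ompi_info, and a specific algorithm can be forced at run time. Parameter names and algorithm numbering vary between Open MPI versions, so treat the lines below as an illustration rather than a definitive recipe (the benchmark invocation reuses the netbench command from the earlier post):

ompi_info --param coll tuned

mpirun --mca coll_tuned_use_dynamic_rules 1 \
       --mca coll_tuned_allreduce_algorithm 4 \
       -np 32 -machinefile hosts src/netbench 8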
[OMPI users] OMPI Collectives
George

Thanks for the references. However, I was not able to figure out whether what I am asking is so trivial it is simply passed over or so subtle that it has been overlooked (I suspect the former). The binary tree algorithm in MPI_Allreduce takes a time proportional to 2*N*log_2(M), where N is the vector length and M is the number of processes. There is a divide-and-conquer strategy (http://www.hlrs.de/organization/par/services/models/mpi/myreduce.html) that MPICH uses to do an MPI_Reduce in a time proportional to N. Is this algorithm, or something equivalent, in OpenMPI at present? If so, how do I turn it on?

I also found that OpenMPI is sometimes very slow on MPI_Allreduce using TCP. Things are OK up to 16 processes, but at 24 the rates (message length divided by time) are as follows:

Message size (KBytes)    Throughput (MBytes/sec)
                         M=24     M=32     M=48
    1                    1.38     1.30     1.09
    2                    2.28     1.94     1.50
    4                    2.92     2.35     1.73
    8                    3.56     2.81     1.99
   16                    3.97     1.94     0.12
   32                    0.34     0.24     0.13
   64                    3.07     2.33     1.57
  128                    3.70     2.80     1.89
  256                    4.10     3.10     2.08
  512                    4.19     3.28     2.08
 1024                    4.36     3.36     2.17

Around 16-32 KBytes there is a pronounced slowdown, roughly a factor of 10, which seems too much. Any idea what's going on?

Tony

---
Tony Ladd
Chemical Engineering
University of Florida
PO Box 116005
Gainesville, FL 32611-6005
Tel: 352-392-6509
FAX: 352-392-9513
Email: tl...@che.ufl.edu
Web: http://ladd.che.ufl.edu
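In the notation of this post (N = vector length, M = number of processes), the two costs being compared can be written as a standard bandwidth-only model, with B the per-link bandwidth and latency terms ignored; this is a sketch consistent with the numbers quoted here, not a formula taken from the cited page:

T_{\mathrm{binary\ tree}} \;\approx\; \frac{2\,N\,\log_2 M}{B},
\qquad
T_{\mathrm{divide\ and\ conquer}} \;\approx\; \frac{2\,N\,(M-1)}{M\,B} \;\approx\; \frac{2\,N}{B} \quad (M \gg 1),

which is why the divide-and-conquer style of allreduce is essentially independent of the number of processes for large M.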
[OMPI users] OMPI collectives
George

I found the info I think you were referring to. Thanks. I then experimented, essentially randomly, with different algorithms for allreduce, but the issue with really bad performance for certain message sizes persisted with v1.1. The good news is that the upgrade to 1.2 fixed my worst problem; now the performance is reasonable for all message sizes. I will test the tuned algorithms again asap. I had a couple of questions:

1) ompi_info lists only 3 or 4 algorithms for allreduce and reduce, and about 5 for bcast, but you can use higher numbers as well. Are these additional undocumented algorithms (you mentioned a number like 15), or is it ignoring out-of-range parameters?

2) It seems for allreduce you can select a tuned reduce and a tuned bcast instead of the binary tree. But there is a faster allreduce, which is order 2N rather than 4N for Reduce + Bcast (N is the message size). It segments the vector and distributes the root among the nodes; in an allreduce there is no need to gather the root vector to one processor and then scatter it again. I wrote a simple version for powers of 2 (MPI_SUM). Is there any chance of it being implemented in OMPI?

Tony
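The scheme described in point 2 is essentially a reduce-scatter followed by an allgather. As an illustration, here is a generic ring-based sketch for MPI_SUM on doubles; it is one common way to realize the idea, not Tony's power-of-two code and not the MPICH implementation, and it assumes the vector length divides evenly by the number of ranks.

#include <mpi.h>
#include <stdlib.h>

/* In-place ring allreduce (sum of doubles). Assumes n is divisible by np. */
void ring_allreduce_sum(double *v, int n, MPI_Comm comm)
{
    int rank, np;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &np);

    int blk = n / np;
    double *tmp = malloc(blk * sizeof(double));
    int right = (rank + 1) % np;
    int left  = (rank + np - 1) % np;

    /* Phase 1: reduce-scatter. After np-1 steps, rank r holds the fully
       reduced block (r+1) mod np. */
    for (int s = 0; s < np - 1; s++) {
        int sblk = (rank - s + np) % np;        /* block to send                  */
        int rblk = (rank - s - 1 + np) % np;    /* block to receive and add into  */
        MPI_Sendrecv(v + sblk * blk, blk, MPI_DOUBLE, right, 0,
                     tmp,            blk, MPI_DOUBLE, left,  0,
                     comm, MPI_STATUS_IGNORE);
        for (int i = 0; i < blk; i++)
            v[rblk * blk + i] += tmp[i];
    }

    /* Phase 2: allgather. Circulate the reduced blocks so that every rank
       ends up with the complete summed vector. */
    for (int s = 0; s < np - 1; s++) {
        int sblk = (rank + 1 - s + np) % np;    /* reduced block currently held   */
        int rblk = (rank - s + np) % np;        /* block arriving from the left   */
        MPI_Sendrecv(v + sblk * blk, blk, MPI_DOUBLE, right, 1,
                     v + rblk * blk, blk, MPI_DOUBLE, left,  1,
                     comm, MPI_STATUS_IGNORE);
    }

    free(tmp);
}

Called as ring_allreduce_sum(vec, n, MPI_COMM_WORLD), this replaces vec on every rank with the global sum; each rank sends and receives about 2N(M-1)/M bytes in total, which is where the factor-of-two saving over a separate Reduce followed by Bcast comes from.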
[OMPI users] Parallel application performance tests
I have recently completed a number of performance tests on a Beowulf cluster, using up to 48 dual-core P4D nodes connected by an Extreme Networks Gigabit edge switch. The tests consist of single-node and multi-node application benchmarks, including DLPOLY, GROMACS, and VASP, as well as specific tests of network cards and switches. I used TCP sockets with OpenMPI v1.2 and MPI/GAMMA over Gigabit Ethernet. MPI/GAMMA leads to significantly better scaling than OpenMPI/TCP in both network tests and application benchmarks. The overall performance of the MPI/GAMMA cluster on a per-cpu basis was found to be comparable to a dual-core Opteron cluster with an Infiniband interconnect. The DLPOLY benchmarks showed similar scaling to those reported for an IBM p690. The performance using TCP was typically a factor of 2 less in these same tests. A detailed write-up can be found at: http://ladd.che.ufl.edu/research/beoclus/beoclus.htm

Tony Ladd
Chemical Engineering
University of Florida
[OMPI users] Problem in starting openmpi job - no output just hangs
I would very much appreciate some advice in how to debug this problem. I am trying to get OpenMPI to work on my reconfigured cluster - upgrading from Centos 5 to Ubuntu 18. The problem is that a simple job using Intel's IMB message passing test code will not run on any of the new clients (4 so far). mpirun -np 2 IMB-MPI1 just hangs - no printout, no messages in syslog. I left it for 1 hr and it remained in the same state. On the other hand the same code runs fine on the server (see outfoam). Comparing the two it seems the client version hangs while trying to load the openib module (it works with tcp,self or vader,self). Digging a bit more I found the --mca btl_base_verbose option. Now I can see a difference in the two cases: On the server: ibv_obj->logical_index=1, my_obj->logical_index=0 On the client: ibv_obj->type set to NULL. I don't believe this is a good sign, but I don't understand what it means. My guess is that openib is not being initialized in this case. The server (foam) is SuperMicro server with X10DAi m'board and 2XE52630 (10 core). The client (f34) is a Dell R410 server with 2XE5620 (4 core). The outputs from ompi_info are attached. They are both running Ubuntu 18.04 with the latest updates. I installed openmpi-bin 2.1.1-8. Both boxes have Mellanox Connect X2 cards with the latest firmware (2.9.1000). I have checked that the cards send and receive packets using the IB protocols and pass the Mellanox diagnostics. I did notice that the Mellanox card has the PCI address 81:00.0 on the server but 03:00.0 on the client. Not sure of the significance of this. Any help anyone can offer would be much appreciated. I am stuck. Thanks Tony -- Tony Ladd Chemical Engineering Department University of Florida Gainesville, Florida 32611-6005 USA Email: tladd-"(AT)"-che.ufl.edu Webhttp://ladd.che.ufl.edu Tel: (352)-392-6509 FAX: (352)-392-9514 Package: Open MPI buildd@lcy01-amd64-009 Distribution Open MPI: 2.1.1 Open MPI repo revision: v2.1.0-100-ga2fdb5b Open MPI release date: May 10, 2017 Open RTE: 2.1.1 Open RTE repo revision: v2.1.0-100-ga2fdb5b Open RTE release date: May 10, 2017 OPAL: 2.1.1 OPAL repo revision: v2.1.0-100-ga2fdb5b OPAL release date: May 10, 2017 MPI API: 3.1.0 Ident string: 2.1.1 Prefix: /usr Configured architecture: x86_64-pc-linux-gnu Configure host: lcy01-amd64-009 Configured by: buildd Configured on: Mon Feb 5 19:59:59 UTC 2018 Configure host: lcy01-amd64-009 Built by: buildd Built on: Mon Feb 5 20:05:56 UTC 2018 Built host: lcy01-amd64-009 C bindings: yes C++ bindings: yes Fort mpif.h: yes (all) Fort use mpi: yes (full: ignore TKR) Fort use mpi size: deprecated-ompi-info-value Fort use mpi_f08: yes Fort mpi_f08 compliance: The mpi_f08 module is available, but due to limitations in the gfortran compiler, does not support the following: array subsections, direct passthru (where possible) to underlying Open MPI's C functionality Fort mpi_f08 subarrays: no Java bindings: yes Wrapper compiler rpath: disabled C compiler: gcc C compiler absolute: /usr/bin/gcc C compiler family name: GNU C compiler version: 7.3.0 C++ compiler: g++ C++ compiler absolute: /usr/bin/g++ Fort compiler: gfortran Fort compiler abs: /usr/bin/gfortran Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::) Fort 08 assumed shape: yes Fort optional args: yes Fort INTERFACE: yes Fort ISO_FORTRAN_ENV: yes Fort STORAGE_SIZE: yes Fort BIND(C) (all): yes Fort ISO_C_BINDING: yes Fort SUBROUTINE BIND(C): yes Fort TYPE,BIND(C): yes Fort T,BIND(C,name="a"): yes Fort PRIVATE: yes Fort PROTECTED: yes Fort 
ABSTRACT: yes Fort ASYNCHRONOUS: yes Fort PROCEDURE: yes Fort USE...ONLY: yes Fort C_FUNLOC: yes Fort f08 using wrappers: yes Fort MPI_SIZEOF: yes C profiling: yes C++ profiling: yes Fort mpif.h profiling: yes Fort use mpi profiling: yes Fort use mpi_f08 prof: yes C++ exceptions: no Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes) Sparse Groups: no Internal debug support: no MPI interface warnings: yes MPI parameter check: runtime Memory profiling support: no Memory debugging support: no dl support: yes Heterogeneous support: yes mpirun default --prefix: no MPI I/O support: yes MPI_WTIME support: native Symbol vis. support: yes Host to
Re: [OMPI users] Problem in starting openmpi job - no output just hangs
My apologies - I did not read the FAQ's carefully enough - with regard to 14: 1. openib 2. Ubuntu supplied drivers etc. 3. Ubuntu 18.04 4.15.0-112-generic 4. opensm-3.3.5_mlnx-0.1.g6b18e73 5. Attached 6. Attached 7. unlimited on foam and 16384 on f34 I changed the ulimit to unlimited on f34 but it did not help. Tony -- Tony Ladd Chemical Engineering Department University of Florida Gainesville, Florida 32611-6005 USA Email: tladd-"(AT)"-che.ufl.edu Webhttp://ladd.che.ufl.edu Tel: (352)-392-6509 FAX: (352)-392-9514 foam:root(ib)> ibv_devinfo hca_id: mlx4_0 transport: InfiniBand (0) fw_ver: 2.9.1000 node_guid: 0002:c903:000f:666e sys_image_guid: 0002:c903:000f:6671 vendor_id: 0x02c9 vendor_part_id: 26428 hw_ver: 0xB0 board_id: MT_0D90110009 phys_port_cnt: 1 port: 1 state: PORT_ACTIVE (4) max_mtu:4096 (5) active_mtu: 4096 (5) sm_lid: 1 port_lid: 26 port_lmc: 0x00 link_layer: InfiniBand root@f34:/home/tladd# ibv_devinfo hca_id: mlx4_0 transport: InfiniBand (0) fw_ver: 2.9.1000 node_guid: 0002:c903:000a:af92 sys_image_guid: 0002:c903:000a:af95 vendor_id: 0x02c9 vendor_part_id: 26428 hw_ver: 0xB0 board_id: MT_0D90110009 phys_port_cnt: 1 port: 1 state: PORT_ACTIVE (4) max_mtu:4096 (5) active_mtu: 4096 (5) sm_lid: 1 port_lid: 32 port_lmc: 0x00 link_layer: InfiniBand root@f34:/home/tladd# ifconfig eno1: flags=4163 mtu 1500 inet 10.1.2.34 netmask 255.255.255.0 broadcast 10.1.2.255 inet6 fe80::862b:2bff:fe18:3729 prefixlen 64 scopeid 0x20 ether 84:2b:2b:18:37:29 txqueuelen 1000 (Ethernet) RX packets 1015244 bytes 146716710 (146.7 MB) RX errors 0 dropped 234903 overruns 0 frame 0 TX packets 176298 bytes 17106041 (17.1 MB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 ib0: flags=4163 mtu 2044 inet 10.2.2.34 netmask 255.255.255.0 broadcast 10.2.2.255 inet6 fe80::202:c903:a:af93 prefixlen 64 scopeid 0x20 unspec 80-00-02-08-FE-80-00-00-00-00-00-00-00-00-00-00 txqueuelen 256 (UNSPEC) RX packets 289257 bytes 333876570 (333.8 MB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 140385 bytes 324882131 (324.8 MB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 lo: flags=73 mtu 65536 inet 127.0.0.1 netmask 255.0.0.0 inet6 ::1 prefixlen 128 scopeid 0x10 loop txqueuelen 1000 (Local Loopback) RX packets 317853 bytes 21490738 (21.4 MB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 317853 bytes 21490738 (21.4 MB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 foam:root(ib)> ifconfig enp4s0: flags=4163 mtu 1500 inet 10.1.2.251 netmask 255.255.255.0 broadcast 10.1.2.255 inet6 fe80::ae1f:6bff:feb1:7f02 prefixlen 64 scopeid 0x20 ether ac:1f:6b:b1:7f:02 txqueuelen 1000 (Ethernet) RX packets 1092343 bytes 98282221 (98.2 MB) RX errors 0 dropped 176607 overruns 0 frame 0 TX packets 248746 bytes 206951391 (206.9 MB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 device memory 0xf040-f047 enp5s0: flags=4163 mtu 1500 inet 192.168.1.2 netmask 255.255.255.0 broadcast 192.168.1.255 inet6 fe80::ae1f:6bff:feb1:7f03 prefixlen 64 scopeid 0x20 ether ac:1f:6b:b1:7f:03 txqueuelen 1000 (Ethernet) RX packets 1039387 bytes 87199457 (87.1 MB) RX errors 0 dropped 187625 overruns 0 frame 0 TX packets 5884980 bytes 8649612519 (8.6 GB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 device memory 0xf030-f037 enp6s0: flags=4163 mtu 1500 inet 10.227.121.95 netmask 255.255.255.0 broadcast 10.227.121.255 inet6 fe80::6a05:caff:febd:397c prefixlen 64 scopeid 0x20 ether 68:05:ca:bd:3
Re: [OMPI users] Problem in starting openmpi job - no output just hangs
One other update. I compiled OpenMPI-4.0.4 The outcome was the same but there is no mention of ibv_obj this time. Tony -- Tony Ladd Chemical Engineering Department University of Florida Gainesville, Florida 32611-6005 USA Email: tladd-"(AT)"-che.ufl.edu Webhttp://ladd.che.ufl.edu Tel: (352)-392-6509 FAX: (352)-392-9514 f34:tladd(~)> mpirun -d --report-bindings --mca btl_openib_allow_ib 1 --mca btl openib,self --mca btl_base_verbose 30 -np 2 mpi-benchmarks-IMB-v2019.3/src_c/IMB-MPI1 SendRecv [f34:24079] procdir: /tmp/ompi.f34.501/pid.24079/0/0 [f34:24079] jobdir: /tmp/ompi.f34.501/pid.24079/0 [f34:24079] top: /tmp/ompi.f34.501/pid.24079 [f34:24079] top: /tmp/ompi.f34.501 [f34:24079] tmp: /tmp [f34:24079] sess_dir_cleanup: job session dir does not exist [f34:24079] sess_dir_cleanup: top session dir not empty - leaving [f34:24079] procdir: /tmp/ompi.f34.501/pid.24079/0/0 [f34:24079] jobdir: /tmp/ompi.f34.501/pid.24079/0 [f34:24079] top: /tmp/ompi.f34.501/pid.24079 [f34:24079] top: /tmp/ompi.f34.501 [f34:24079] tmp: /tmp [f34:24079] [[62672,0],0] Releasing job data for [INVALID] [f34:24079] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.][./././.] [f34:24079] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./.][./././.] MPIR_being_debugged = 0 MPIR_debug_state = 1 MPIR_partial_attach_ok = 1 MPIR_i_am_starter = 0 MPIR_forward_output = 0 MPIR_proctable_size = 2 MPIR_proctable: (i, host, exe, pid) = (0, f34, /home/tladd/mpi-benchmarks-IMB-v2019.3/src_c/IMB-MPI1, 24083) (i, host, exe, pid) = (1, f34, /home/tladd/mpi-benchmarks-IMB-v2019.3/src_c/IMB-MPI1, 24084) MPIR_executable_path: NULL MPIR_server_arguments: NULL [f34:24084] procdir: /tmp/ompi.f34.501/pid.24079/1/1 [f34:24084] jobdir: /tmp/ompi.f34.501/pid.24079/1 [f34:24084] top: /tmp/ompi.f34.501/pid.24079 [f34:24084] top: /tmp/ompi.f34.501 [f34:24084] tmp: /tmp [f34:24083] procdir: /tmp/ompi.f34.501/pid.24079/1/0 [f34:24083] jobdir: /tmp/ompi.f34.501/pid.24079/1 [f34:24083] top: /tmp/ompi.f34.501/pid.24079 [f34:24083] top: /tmp/ompi.f34.501 [f34:24083] tmp: /tmp [f34:24084] mca: base: components_register: registering framework btl components [f34:24084] mca: base: components_register: found loaded component self [f34:24084] mca: base: components_register: component self register function successful [f34:24084] mca: base: components_register: found loaded component openib [f34:24084] mca: base: components_register: component openib register function successful [f34:24084] mca: base: components_open: opening btl components [f34:24084] mca: base: components_open: found loaded component self [f34:24084] mca: base: components_open: component self open function successful [f34:24084] mca: base: components_open: found loaded component openib [f34:24084] mca: base: components_open: component openib open function successful [f34:24084] select: initializing btl component self [f34:24084] select: init of component self returned success [f34:24084] select: initializing btl component openib [f34:24083] mca: base: components_register: registering framework btl components [f34:24083] mca: base: components_register: found loaded component self [f34:24083] mca: base: components_register: component self register function successful [f34:24083] mca: base: components_register: found loaded component openib [f34:24083] mca: base: components_register: component openib register function successful [f34:24083] mca: base: components_open: opening btl components [f34:24083] mca: base: components_open: found loaded component self [f34:24083] mca: base: 
components_open: component self open function successful [f34:24083] mca: base: components_open: found loaded component openib [f34:24083] mca: base: components_open: component openib open function successful [f34:24083] select: initializing btl component self [f34:24083] select: init of component self returned success [f34:24083] select: initializing btl component openib [f34:24084] Checking distance from this process to device=mlx4_0 [f34:24084] Process is not bound: distance to device is 0.00 [f34:24083] Checking distance from this process to device=mlx4_0 [f34:24083] Process is not bound: distance to device is 0.00 [f34:24083] [rank=0] openib: using port mlx4_0:1 [f34:24083] select: init of component openib returned success [f34:24084] [rank=1] openib: using port mlx4_0:1 [f34:24084] select: init of component openib returned success [f34:24083] mca: bml: Using self btl for send to [[62672,1],0] on node f34 [f34:24084] mca: bml: Using self btl for send to [[62672,1],1] on node f34 ^C [f34:24079] sess_dir_finalize: proc session dir does not exist [f34:24079] sess_dir_finalize: job session dir does not exist [f34:24079] sess_dir_finalize: jobfam session dir not empty - leaving [f34:24079] sess_dir_finalize: jobfam session dir not empty - leaving [f34:24079] sess_dir_finalize: top session dir not empty - leaving [f34:24079] sess_dir_finalize: proc session
Re: [OMPI users] Problem in starting openmpi job - no output just hangs
Hi Jeff

I installed UCX as you suggested. But I can't get even the simplest code (ucp_client_server) to work across the network. I can compile OpenMPI with UCX, but it has the same problem: MPI codes will not execute and there are no messages. Really, UCX is not helping. It is adding another (not so well documented) software layer, which does not offer better diagnostics as far as I can see. It's also unclear to me how to control what drivers are being loaded; UCX wants to make that decision for you. With OpenMPI I can see that (for instance) the tcp module works both locally and over the network - it must be using the Mellanox NIC for the bandwidth it is reporting on IMB-MPI1, even with tcp protocols. But if I try to use openib (or allow UCX or OpenMPI to choose the transport layer), it just hangs. Annoyingly, I have this server where everything works just fine - I can run locally over openib and it's fine. All the other nodes cannot seem to load openib, so even local jobs fail. The only good (as best I can tell) diagnostic is from OpenMPI: ibv_obj (from v2.x) complains that openib returns a NULL object, whereas on my server it returns logical_index=1. Can we not try to diagnose the problem with openib not loading (see my original post for details)? I am pretty sure that if we can, that would fix the problem.

Thanks

Tony

PS I tried configuring two nodes back to back to see if it was a switch issue, but the result was the same.

On 8/19/20 1:27 PM, Jeff Squyres (jsquyres) wrote:
[External Email]
Tony --
Have you tried compiling Open MPI with UCX support? This is Mellanox (NVIDIA's) preferred mechanism for InfiniBand support these days -- the openib BTL is legacy.
You can run: mpirun --mca pml ucx ...
On Aug 19, 2020, at 12:46 PM, Tony Ladd via users wrote:
One other update. I compiled OpenMPI-4.0.4. The outcome was the same, but there is no mention of ibv_obj this time.
Tony
--
Tony Ladd
Chemical Engineering Department
University of Florida
Gainesville, Florida 32611-6005 USA
Email: tladd-"(AT)"-che.ufl.edu
Web: http://ladd.che.ufl.edu
Tel: (352)-392-6509
FAX: (352)-392-9514
--
Jeff Squyres
jsquy...@cisco.com
--
Tony Ladd
Chemical Engineering Department
University of Florida
Gainesville, Florida 32611-6005 USA
Email: tladd-"(AT)"-che.ufl.edu
Web: http://ladd.che.ufl.edu
Tel: (352)-392-6509
FAX: (352)-392-9514
Re: [OMPI users] Problem in starting openmpi job - no output just hangs
Hi John

Thanks for the response. I have run all those diagnostics, and as best I can tell the IB fabric is OK. I have a cluster of 49 nodes (48 clients + server) and the fabric passes all the tests. There is one warning:

I- Subnet: IPv4 PKey:0x7fff QKey:0x0b1b MTU:2048Byte rate:10Gbps SL:0x00
-W- Suboptimal rate for group. Lowest member rate:40Gbps > group-rate:10Gbps

but according to a number of sources this is harmless. I have run Mellanox's P2P performance tests (ib_write_bw) between different pairs of nodes and it reports 3.22 GB/sec, which is reasonable (it's a PCIe 2 x8 interface, ie 4 GB/s). I have also configured 2 nodes back to back to check that the switch is not the problem; it makes no difference.

I have been playing with the btl params in OpenMPI (v. 2.1.1, which is what is released in Ubuntu 18.04). With tcp as the transport layer everything works fine - 1-node or 2-node communication - I have tested up to 16 processes (8+8) and it seems fine. Of course the latency is much higher on the tcp interface, so I would still like to access the RDMA layer. But unless I exclude the openib module, it always hangs. Same with OpenMPI v4 compiled from source.

I think an important component is that Mellanox has not supported the Connect X2 for some time. This is really infuriating; a $500 network card with no supported drivers, but that is business for you I suppose. I have 50 NICs and I can't afford to replace them all. The other component is that the MLNX-OFED is tied to specific software versions, so I can't just run an older set of drivers. I have not seen source files for the Mellanox drivers - I would take a crack at compiling them if I did. In the past I have used the OFED drivers (on CentOS 5) with no problem, but I don't think this is an option now. Ubuntu claims to support the Connect X2 with their drivers (Mellanox confirms this), but of course this is community support and the number of cases is obviously small. I use the Ubuntu drivers right now because the OFED install seems broken and there is no help with it. It's not supported! Neat huh?

The only handle I have is with openmpi v. 2, where there is a message (see my original post) that ibv_obj returns a NULL result. But I don't understand the significance of the message (if any). I am not enthused about UCX - the documentation has several obvious typos in it, which is not encouraging when you are floundering. I know it's a newish project, but I have used openib for 10+ years and it has never had a problem until now. I think this is not so much openib as the software below it.

One other thing I should say is that any recent version of mstflint always complains: "Failed to identify the device - Can not create SignatureManager!" Going back to my original OFED 1.5 this did not happen, but they are at v5 now. Everything else works as far as I can see, but I could not burn new firmware except by going back to the 1.5 OS. Perhaps this is connected with the ibv_obj = NULL result.

Thanks for helping out. As you can see I am rather stuck.

Best

Tony

On 8/23/20 3:01 AM, John Hearns via users wrote:
*[External Email]*
Tony, start at a low level. Is the Infiniband fabric healthy? Run
ibstatus on every node
sminfo on one node
ibdiagnet on one node
On Sun, 23 Aug 2020 at 05:02, Tony Ladd via users <users@lists.open-mpi.org> wrote:
Hi Jeff
I installed ucx as you suggested. But I can't get even the simplest code (ucp_client_server) to work across the network.
I can compile openMPI with UCX but it has the same problem - mpi codes will not execute and there are no messages. Really, UCX is not helping. It is adding another (not so well documented) software layer, which does not offer better diagnostics as far as I can see. Its also unclear to me how to control what drivers are being loaded - UCX wants to make that decision for you. With openMPI I can see that (for instance) the tcp module works both locally and over the network - it must be using the Mellanox NIC for the bandwidth it is reporting on IMB-MPI1 even with tcp protocols. But if I try to use openib (or allow ucx or openmpi to choose the transport layer) it just hangs. Annoyingly I have this server where everything works just fine - I can run locally over openib and its fine. All the other nodes cannot seem to load openib so even local jobs fail. The only good (as best I can tell) diagnostic is from openMPI. ibv_obj (from v2.x) complains that openib returns a NULL object, whereas on my server it returns logical_index=1. Can we not try to diagnose the problem with openib not loading (see my original post for details). I am pretty sure if we can that would fix the problem. Thanks Tony PS I tried configuring two nodes back to back to see if it was a switch issue,
Re: [OMPI users] Problem in starting openmpi job - no output just hangs
Hi Jeff I appreciate your help (and John's as well). At this point I don't think is an OMPI problem - my mistake. I think the communication with RDMA is somehow disabled (perhaps its the verbs layer - I am not very knowledgeable with this). It used to work like a dream but Mellanox has apparently disabled some of the Connect X2 components, because neither ompi or ucx (with/without ompi) could connect with the RDMA layer. Some of the infiniband functions are also not working on the X2 (mstflint, mstconfig). In fact ompi always tries to access the openib module. I have to explicitly disable it even to run on 1 node. So I think it is in initialization not communication that the problem lies. This is why (I think) ibv_obj returns NULL. The better news is that with the tcp stack everything works fine (ompi, ucx, 1 node, many nodes) - the bandwidth is similar to rdma so for large messages its semi OK. Its a partial solution - not all I wanted of course. The direct rdma functions ib_read_lat etc also work fine with expected results. I am suspicious this disabling of the driver is a commercial more than a technical decision. I am going to try going back to Ubuntu 16.04 - there is a version of OFED that still supports the X2. But I think it may still get messed up by kernel upgrades (it does for 18.04 I found). So its not an easy path. Thanks again. Tony On 8/24/20 11:35 AM, Jeff Squyres (jsquyres) wrote: [External Email] I'm afraid I don't have many better answers for you. I can't quite tell from your machines, but are you running IMB-MPI1 Sendrecv *on a single node* with `--mca btl openib,self`? I don't remember offhand, but I didn't think that openib was supposed to do loopback communication. E.g., if both MPI processes are on the same node, `--mca btl openib,vader,self` should do the trick (where "vader" = shared memory support). More specifically: are you running into a problem running openib (and/or UCX) across multiple nodes? I can't speak to Nvidia support on various models of [older] hardware (including UCX support on that hardware). But be aware that openib is definitely going away; it is wholly being replaced by UCX. It may be that your only option is to stick with older software stacks in these hardware environments. On Aug 23, 2020, at 9:46 PM, Tony Ladd via users wrote: Hi John Thanks for the response. I have run all those diagnostics, and as best I can tell the IB fabric is OK. I have a cluster of 49 nodes (48 clients + server) and the fabric passes all the tests. There is 1 warning: I- Subnet: IPv4 PKey:0x7fff QKey:0x0b1b MTU:2048Byte rate:10Gbps SL:0x00 -W- Suboptimal rate for group. Lowest member rate:40Gbps > group-rate:10Gbps but according to a number of sources this is harmless. I have run Mellanox's P2P performance tests (ib_write_bw) between different pairs of nodes and it reports 3.22 GB/sec which is reasonable (its PCIe 2 x8 interface ie 4 GB/s). I have also configured 2 nodes back to back to check that the switch is not the problem - it makes no difference. I have been playing with the btl params with openMPI (v. 2.1.1 which is what is relelased in Ubuntu 18.04). So with tcp as the transport layer everything works fine - 1 node or 2 node communication - I have tested up to 16 processes (8+8) and it seems fine. Of course the latency is much higher on the tcp interface, so I would still like to access the RDMA layer. But unless I exclude the openib module, it always hangs. Same with OpenMPI v4 compiled from source. 
I think an important component is that Mellanox is not supporting Connect X2 for some time. This is really infuriating; a $500 network card with no supported drivers, but that is business for you I suppose. I have 50 NICS and I can't afford to replace them all. The other component is the MLNX-OFED is tied to specific software versions, so I can't just run an older set of drivers. I have not seen source files for the Mellanox drivers - I would take a crack at compiling them if I did. In the past I have used the OFED drivers (on Centos 5) with no problem, but I don't think this is an option now. Ubuntu claims to support Connect X2 with their drivers (Mellanox confirms this), but of course this is community support and the number of cases is obviously small. I use the Ubuntu drivers right now because the OFED install seems broken and there is no help with it. Its not supported! Neat huh? The only handle I have is with openmpi v. 2 when there is a message (see my original post) that ibv_obj returns a NULL result. But I don't understand the significance of the message (if any). I am not enthused about UCX - the documentation has several obvious typos in it, which is not encouraging when you a floundering. I know its a newish project but I have used openib for 10+ years and its never had a problem until now. I think this
Re: [OMPI users] Problem in starting openmpi job - no output just hangs - SOLVED
Jeff I found the solution - rdma needs significant memory so the limits on the shell have to be increased. I needed to add the lines * soft memlock unlimited * hard memlock unlimited to the end of the file /etc/security/limits.conf. After that the openib driver loads and everything is fine - proper IB latency again. I see that # 16 of the tuning FAQ discusses the same issue, but in my case there was no error or warning message. I am posting this in case anyone else runs into this issue. The Mellanox OFED install adds those lines automatically, so I had not run into this before. Tony On 8/25/20 10:42 AM, Jeff Squyres (jsquyres) wrote: [External Email] On Aug 24, 2020, at 9:44 PM, Tony Ladd wrote: I appreciate your help (and John's as well). At this point I don't think is an OMPI problem - my mistake. I think the communication with RDMA is somehow disabled (perhaps its the verbs layer - I am not very knowledgeable with this). It used to work like a dream but Mellanox has apparently disabled some of the Connect X2 components, because neither ompi or ucx (with/without ompi) could connect with the RDMA layer. Some of the infiniband functions are also not working on the X2 (mstflint, mstconfig). If the IB stack itself is not functioning, then you're right: Open MPI won't work, either (with openib or UCX). You can try to keep poking with the low-layer diagnostic tools like ibv_devinfo and ibv_rc_pingpong. If those don't work, Open MPI won't work over IB, either. In fact ompi always tries to access the openib module. I have to explicitly disable it even to run on 1 node. Yes, that makes sense: Open MPI will aggressively try to use every possible mechanism. So I think it is in initialization not communication that the problem lies. I'm not sure that's correct. From your initial emails, it looks like openib thinks it initialized properly. This is why (I think) ibv_obj returns NULL. I'm not sure if that's a problem or not. That section of output is where Open MPI is measuring the distance from the current process to the PCI bus where the device lives. I don't remember offhand if returning NULL in that area is actually a problem or just an indication of some kind of non-error condition. Specifically: if returning NULL there was a problem, we *probably* would have aborted at that point. I have not looked at the code to verify that, though. The better news is that with the tcp stack everything works fine (ompi, ucx, 1 node, many nodes) - the bandwidth is similar to rdma so for large messages its semi OK. Its a partial solution - not all I wanted of course. The direct rdma functions ib_read_lat etc also work fine with expected results. I am suspicious this disabling of the driver is a commercial more than a technical decision. I am going to try going back to Ubuntu 16.04 - there is a version of OFED that still supports the X2. But I think it may still get messed up by kernel upgrades (it does for 18.04 I found). So its not an easy path. I can't speak for Nvidia here, sorry. -- Jeff Squyres jsquy...@cisco.com -- Tony Ladd Chemical Engineering Department University of Florida Gainesville, Florida 32611-6005 USA Email: tladd-"(AT)"-che.ufl.edu Webhttp://ladd.che.ufl.edu Tel: (352)-392-6509 FAX: (352)-392-9514
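To make the fix above easy to copy: the two lines are the ones quoted in Tony's message, and checking the limit with ulimit is a standard way to verify it after a fresh login (a value such as the 16384 reported earlier on f34 reproduces the silent hang).

Append to /etc/security/limits.conf:

*    soft    memlock    unlimited
*    hard    memlock    unlimited

Then, in a new login shell on each compute node:

ulimit -l
# should print "unlimited"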
Re: [OMPI users] MPI Exit Code:1 on an OpenFoam application
ated. The first process to do so was:

Process name: [[32896,1],0]
Exit code: 1
--

I looked into the system monitor, but I didn't have a process with this name or number. If I execute mpirun --version, the console replies with this message:

mpirun (Open MPI) 4.0.3

Report bugs to https://urldefense.proofpoint.com/v2/url?u=http-3A__www.open-2Dmpi.org_community_help_&d=DwIDaQ&c=sJ6xIWYx-zLMB3EPkvcnVg&r=kgFAU2BfgKe7cozjrP7uWDPH6xt6LAmYVlQPwQuK7ek&m=9YEwGLzNfCD1pAUuvNpqStsbpagtNfIzEt6wL6f3_7I&s=xXp3HlEJc7DzUAnJY0RVVKgKZ9HopKf0UUMePlaCV8w&e=

How can I solve this problem?

Best regards
Kai

--
Tony Ladd
Chemical Engineering Department
University of Florida
Gainesville, Florida 32611-6005 USA
Email: tladd-"(AT)"-che.ufl.edu
Web: http://ladd.che.ufl.edu
Tel: (352)-392-6509
FAX: (352)-392-9514
Re: [OMPI users] MPI Exit Code:1 on an OpenFoam application
Kai

That means your case directory is mostly OK. Exactly what command did you use to run the executable? By serial mode I actually meant a single processor, for example:

simpleFoam

But then it's surprising if it uses multiple cores; it may be using multithreading by default. On a multicore node something like simpleFoam -parallel might use a default number of cores (probably all of them). For a proper parallel job you need to decompose the problem first with decomposePar. One possible source of error is the decomposeParDict file; if you don't get a proper decomposition that can be a problem. There are examples online (and a minimal example is sketched below, after the quoted thread). My typical run script would be something like:

decomposePar
mpirun -np 4 simpleFoam -parallel 2>&1 | tee log
reconstructPar

You can check the decomposition with paraview (in the individual processor dirs).

Tony

On 1/10/21 5:29 AM, Kahnbein Kai via users wrote:
[External Email]
Hey Tony, it works without the -parallel flag, all four cpu's are at 100% and running fine.
Best regards
Kai
Am 05.01.21 um 20:36 schrieb Tony Ladd via users:
Just run the executable without mpirun and the -parallel flag.
On 1/2/21 11:39 PM, Kahnbein Kai via users wrote:
*[External Email]*
Ok, sorry, what do you mean with the "serial version"?
Best regards
Kai
Am 31.12.20 um 16:25 schrieb tladd via users:
I did not see the whole email chain before. The problem is not that it cannot find the MPI directories. I think this INIT error comes when the program cannot start for some reason, for example a missing input file. Does the serial version work?
On 12/31/20 6:33 AM, Kahnbein Kai via users wrote:
*[External Email]*
I compared the /etc/bashrc files of both versions of OF (v7 and v8) and I didn't find any difference. Here are the lines (I thought related to openmpi) of both files:
OpenFOAM v7:
Line 86 till 89:
#- MPI implementation:
# WM_MPLIB = SYSTEMOPENMPI | OPENMPI | SYSTEMMPI | MPICH | MPICH-GM | HPMPI
# | MPI | FJMPI | QSMPI | SGIMPI | INTELMPI
export WM_MPLIB=SYSTEMOPENMPI
Line 169 till 174:
# Source user setup files for optional packages
# ~
_foamSource `$WM_PROJECT_DIR/bin/foamEtcFile config.sh/mpi`
_foamSource `$WM_PROJECT_DIR/bin/foamEtcFile config.sh/paraview`
_foamSource `$WM_PROJECT_DIR/bin/foamEtcFile config.sh/ensight`
_foamSource `$WM_PROJECT_DIR/bin/foamEtcFile config.sh/gperftools`
OpenFOAM v8:
Line 86 till 89:
#- MPI implementation:
# WM_MPLIB = SYSTEMOPENMPI | OPENMPI | SYSTEMMPI | MPICH | MPICH-GM | HPMPI
# | MPI | FJMPI | QSMPI | SGIMPI | INTELMPI
export WM_MPLIB=SYSTEMOPENMPI
Line 169 till 174:
# Source user setup files for optional packages
# ~
_foamSource `$WM_PROJECT_DIR/bin/foamEtcFile config.sh/mpi`
_foamSource `$WM_PROJECT_DIR/bin/foamEtcFile config.sh/paraview`
_foamSource `$WM_PROJECT_DIR/bin/foamEtcFile config.sh/ensight`
_foamSource `$WM_PROJECT_DIR/bin/foamEtcFile config.sh/gperftools`
Do you think these are the right lines?
I wish you a healthy start into the new year,
Kai
Am 30.12.20 um 15:25 schrieb tladd via users:
Probably because OF cannot find your mpi installation. Once you set your OF environment, where is it looking for mpicc? Note the OF environment overrides your .bashrc once you source the OF bashrc. That takes its settings from the src/etc directory in the OF source code.
On 12/29/20 10:23 AM, Kahnbein Kai via users wrote:
[External Email]
Thank you for this hint. I installed OpenFOAM v8 (the newest) on my computer and it works ... At the Version v7 I still get this mpi error. I don't know why ...
I wish you a healthy start into the new year :) Am 28.12.20 um 19:16 schrieb Benson Muite via users: Have you tried reinstalling OpenFOAM? If you are mostly working in a desktop, there are pre-compiled versions available: https://urldefense.proofpoint.com/v2/url?u=https-3A__openfoam.com_download_&d=DwIDaQ&c=sJ6xIWYx-zLMB3EPkvcnVg&r=kgFAU2BfgKe7cozjrP7uWDPH6xt6LAmYVlQPwQuK7ek&m=9YEwGLzNfCD1pAUuvNpqStsbpagtNfIzEt6wL6f3_7I&s=bZFAwh79J3ZL1Ut9Jt4qj-kBCubrvjsLNhq51hnAwXk&e= If you are using a pre-compiled version, do also consider reporting the error to the packager. It seems unlikely to be an MPI error, more likely something with OpenFOAM and/or the setup. On 12/28/20 6:25 PM, Kahnbein Kai via users wrote: Good morning, im trying to fix this error by myself and i have a little update. The ompi version i use is the: Code: kai@Kai-Desktop:~/Dokumente$ mpirun --version mpirun (Open MPI) 4.0.3 If i create a *.c file, with the following content: Code: #include #include int main(int argc, char** argv) { // Initialize the MPI environment MPI_Init(NULL, NULL); // Get the number of processes int world_size; MPI_Comm_size(MPI_COMM_WORLD, &world_size); // Get the rank of the process int world_rank; M
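Following up on the decomposeParDict mentioned in Tony's reply above, a minimal system/decomposeParDict for the 4-way run he sketches could look like this. It is a generic illustration based on standard OpenFOAM conventions, not a file from this thread; adjust the number of subdomains and the decomposition method to your case.

FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}

numberOfSubdomains  4;

method              scotch;    // scotch needs no further coefficient entries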