Hello,

I tried to run a parallel computation (OpenFOAM) using Open MPI on an HPC cluster connected with InfiniBand. When I ran a job using mpirun across a couple of nodes (20 CPUs per node), the computation was not accelerated as much as I expected.
For example, I ran the job on 40 CPUs across 2 nodes and checked CPU usage and the running processes with the top command. I expected 20 processes to be running on each node, but I found that only 19 processes were running and one CPU was idle while the others were in use. The following is a capture of the top output. As you can see, Cpu1 is idle and there are only 19 simpleFoam processes!

[image: screenshot of top output]

I have no idea why this happens. Sometimes one CPU is idle while 20 processes are running, but in that case two of the processes each run at 50% CPU usage. That is, those two processes are assigned to the same CPU.

Please refer to the attached file for the required information about the cluster and its environment. The output of the "ulimit -l" command on both nodes is "unlimited".

Additional information for the OpenFabrics-based network is as follows:

1. OpenFabrics version: MLNX_OFED_LINUX-2.4.1.0.0
2. Linux/kernel info: RHEL6.5 2.6.32-431.el6.x86_64
   - Linux distro/version: Red Hat Enterprise Linux Server release 6.5 (Santiago)
   - Kernel version: 2.6.32-431.el6.x86_64
3. Subnet manager: infiniband B class

Thank you.

Best regards,
Geon-Hong Kim.

-----------------------------------
Geon-Hong Kim
Engineer, Ph.D.
Performance Evaluation Research Department
Hyundai Maritime Research Institute
Hyundai Heavy Industries Co., Ltd.
Office +82-52-203-8053
Fax +82-52-250-9675
Mobile +82-10-3084-1357
-----------------------------------
system_info.tar.bz2
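
The symptom described above (one core left idle while two ranks share another) can be checked from per-process CPU placement rather than by reading top by eye. The following is a minimal sketch, assuming the placement on one node has been collected with `ps -eo pid,psr,comm`, where the `psr` column is the processor a process is currently assigned to. The function name and the sample data are hypothetical and are not taken from the cluster in question.

```python
from collections import defaultdict


def find_placement_problems(ps_output, n_cores, command="simpleFoam"):
    """Parse `ps -eo pid,psr,comm` text for one node.

    Returns (unused_cores, shared_cores):
      unused_cores: cores with no `command` process on them
      shared_cores: {core: [pids]} for cores hosting more than one process
    """
    pids_by_core = defaultdict(list)
    for line in ps_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not "pid psr comm".
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        pid, core, comm = fields
        if comm == command:
            pids_by_core[int(core)].append(int(pid))

    unused = [c for c in range(n_cores) if c not in pids_by_core]
    shared = {c: p for c, p in pids_by_core.items() if len(p) > 1}
    return unused, shared


# Hypothetical 4-core sample: core 1 is unused, core 2 hosts two ranks.
sample = """
  PID PSR COMMAND
 1001   0 simpleFoam
 1002   2 simpleFoam
 1003   2 simpleFoam
 1004   3 simpleFoam
"""

unused, shared = find_placement_problems(sample, n_cores=4)
print("unused cores:", unused)
print("shared cores:", shared)
```

Because `psr` is only a snapshot, a process that is not bound to a core may move between samples, so it is probably worth taking several readings. Open MPI's `mpirun --report-bindings` option prints the binding chosen for each rank and may be a more direct check.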