Hi all,

I have a question about the Open MPI btl_tcp module: how can I slow Open MPI down in terms of network utilization? My Open MPI based application (let's call it APP1) shares hardware with other software (let's call it APP2) that is very sensitive to network saturation: APP2 suffers from lost "KEEP_ALIVE_RESPONSE"-like TCP packets. Things are further complicated by the architecture constraints described below:
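To frame what kind of knob I am looking for: the closest thing I have found so far are the btl_tcp socket-buffer MCA parameters. Something along these lines would be acceptable (assuming btl_tcp_sndbuf / btl_tcp_rcvbuf behave this way in 1.4.3; ./app1 is a placeholder for my binary):

```shell
# Shrink the per-socket kernel buffers so btl_tcp keeps less data
# in flight per connection (values in bytes; 32 KB is only an
# illustration, not a tuned value).
mpirun --mca btl tcp,self \
       --mca btl_tcp_sndbuf 32768 \
       --mca btl_tcp_rcvbuf 32768 \
       ./app1
```

As far as I understand, this only caps per-connection burstiness rather than imposing a real bandwidth limit, which is exactly why I am asking whether there is a better approach.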
- We are using Open MPI version 1.4.3 and would rather not migrate to a newer version right now.
- The compute nodes run a company-internal Linux distribution (2.6.x kernel).
- I have applied some external patches to the Open MPI 1.4.3 sources to work around its lack of support for multiple NIC aliases (in conjunction with bonding). ifconfig shows that every node has three NIC aliases (three different IPs, with names following the pattern xxx0, xxx0:0, xxx0:1) on top of a bond configuration (all sharing the same HWaddr). My patch was not very sophisticated: the main idea was to store a local NIC index instead of the kernel NIC index. I then modified mca_btl_tcp_proc_accept() to iterate over all of the NIC IPs published by the peer endpoint (which also required some changes to mca_btl_tcp_component_exchange()).
- I am not allowed to load new Linux kernel modules that could be used to limit network bandwidth at the OS level :(

Any tips are very much appreciated.

Best Regards,
Sylvester