Hi all, I have a question related to the Open MPI btl_tcp module: how can I
slow down OMPI in terms of network utilization? My Open MPI-based application
(let's call it APP1) shares hardware with other software
(let's call it APP2) that is very sensitive to network saturation. APP2
suffers from lost TCP packets such as KEEP_ALIVE responses. Things are
further complicated by the architecture constraints described below:

   - We are using Open MPI version 1.4.3 and would rather not migrate to a
   newer version right now
   - The computational nodes run a company-internal Linux distribution
   (2.6.x kernel)
   - I have applied some patches of my own to the OMPI 1.4.3 sources to
   address its lack of support for multiple NIC aliases (in conjunction
   with bonding).


   1. ifconfig shows that every node has 3 NIC aliases (3 different IPs
      and names following the pattern xxx0, xxx0:0, xxx0:1) in a bonded
      configuration (all share the same HWaddr)


   - My patch wasn't very sophisticated: the main idea was simply to store a
   local NIC index instead of the kernel NIC index. I then modified the
   mca_btl_tcp_proc_accept() function to iterate over all NIC IPs published
   by the peer endpoint (which required some changes to the
   mca_btl_tcp_component_exchange() function ...).
   - I am not allowed to load any new kernel modules that could be used
   to limit network bandwidth at the OS level :(


Any tips are very much appreciated.
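For context, the only mitigation I have tried so far is shrinking the TCP
socket buffers through MCA parameters, which indirectly lowers per-connection
throughput (the parameter names below exist in the 1.4.x series, but the
values are only an example and I am not sure this is a sufficient fix):

```
# $HOME/.openmpi/mca-params.conf -- example values only
btl_tcp_sndbuf = 32768
btl_tcp_rcvbuf = 32768
```

The same settings can also be passed on the command line, e.g.
`mpirun --mca btl_tcp_sndbuf 32768 --mca btl_tcp_rcvbuf 32768 ...`.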


 Best Regards, Sylvester
