> So when you say you want your master to send "as fast as possible", I
> suppose you meant get back to running your code as soon as possible. In
> that case you would want nonblocking. However when you say you want the
> slaves to receive data faster, it seems you're implying the actual data
> transmission across the network. I believe the data transmission speed is
> not dependent on whether it is blocking or nonblocking.

Sorry, I did not express myself clearly. With 'as fast as possible' I meant that I want all the data available on my slave nodes ASAP. The master has nothing to do but send, so I do not care whether the sends are blocking or non-blocking. Actually the master will use separate threads for the sending anyway, so either I launch one thread per blocking send or just one thread that does all the sending with nonblocking sends.
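To make concrete what I mean by "available ASAP", here is a rough back-of-the-envelope model in Python. It is only a sketch: it assumes a single link of fixed bandwidth out of the master, with no latency and no protocol overhead. The function names and the numbers (4 slaves, 100 MB per message, roughly 1 Gbit/s) are made up for illustration. It says nothing about what OpenMPI really does internally.

```python
def arrival_times_sequential(n_slaves, msg_bytes, link_bytes_per_s):
    """Messages leave one after another; slave i waits for all earlier ones."""
    per_msg = msg_bytes / link_bytes_per_s
    return [(i + 1) * per_msg for i in range(n_slaves)]


def arrival_times_interleaved(n_slaves, msg_bytes, link_bytes_per_s):
    """All messages share the link equally, so they all complete together."""
    total = n_slaves * msg_bytes / link_bytes_per_s
    return [total] * n_slaves


# Hypothetical setup: 4 slaves, 100 MB each, ~1 Gbit/s (125e6 bytes/s).
n, size, bandwidth = 4, 100e6, 125e6
seq = arrival_times_sequential(n, size, bandwidth)
inter = arrival_times_interleaved(n, size, bandwidth)

for rank, (t_seq, t_inter) in enumerate(zip(seq, inter), start=1):
    print(f"slave {rank}: sequential {t_seq:.2f} s, interleaved {t_inter:.2f} s")
```

Under these assumptions the last slave is done at the same moment in both cases, but with one-after-another sending the earlier slaves have their data sooner. So the order in which the bytes go onto the wire matters for when each slave can start working, even if the raw transmission speed is the same.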
I do think there is plenty of reason to expect a difference (in when the slaves receive the data). If OpenMPI cannot offload the sending to some dedicated card (which is probably my situation, since I'm on stock Linux with stock Ethernet cards) and it tries to send the data requested by multiple nonblocking sends simultaneously, then OpenMPI itself probably needs to multi-thread the sending of each message. Well, I do not know anything about the internals of OpenMPI, so I really have no clue how it would do this or how it would try to optimise the use of bandwidth on the network.

toon