Hi,

> A) The execution time in case "1" should be smaller (only sm
> communication, no?) than case "2" and "3", no? Cache problems?

A shot in the dark, from working on a Sun T1 (which also has 8 real cores): from time to time the OS wants to do something (handle interrupts, wake up cron, ...). Leaving one or two cores spare for that purpose sometimes yields much better performance, since the OS then no longer needs to context-switch your processes out to do its own work.

> B) Why the "sys" time while using communication inter nodes? NIC
> driver?

That does not seem to be an uncommon value for an Ethernet NIC driver plus the TCP/IP stack. It depends on the specific hardware (on-board Ethernet cards are worse than "real" ones, InfiniBand etc. is better than Ethernet, ...) and on the number of messages, which in turn depends on the algorithm. Depending on how you took the measurements (which OS, kernel, ...), that time may also include the time the driver spends waiting for something to happen on the network.

> Why this time increase when I balance the load across the
> nodes?

The more nodes you use, the more communication takes place between them, so the more "parties" have to sync with each other, and the more overhead is generated.

Best regards,
Fabian
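
P.S. A minimal sketch of the spare-core idea from A), in Python. The helper name `ranks_to_launch` and the default of one spare core are just my assumptions for illustration, not something taken from your setup; whether one or two spare cores helps is something you would have to measure.

```python
def ranks_to_launch(total_cores, spare=1):
    """Return how many compute processes to start on one node.

    Hypothetical helper: keeps `spare` cores free for the OS
    (interrupt handling, cron, ...), but always starts at least
    one process.
    """
    return max(1, total_cores - spare)


if __name__ == "__main__":
    # On an 8-core node, leave one core for the OS.
    print(ranks_to_launch(8))
```

You would then pass that number as the process count for the node (for example via `mpirun -np`) instead of the full core count, and compare the timings against the fully loaded run.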