himanshu khandelia wrote:
> Hi Carsten,
>
> The benchmarks were made with 1 NIC/node, and yet the scaling is bad.
> Does that mean that there is indeed network congestion? We will try
> using back-to-back connections soon.
Hi Himanshu,

In my opinion the most probable scenario is that the bandwidth of the
single Gigabit connection is not sufficient for the four very fast CPUs
you have on each node. I would do an 8-CPU benchmark with a back-to-back
connection, as there the chance of network congestion is minimized. If
the benchmarks stay as they were with the switch (they might be a bit
better because you avoid the switch's latency), I would try to make use
of both interfaces to double the bandwidth. This can easily be done with
OpenMPI (see the command sketch at the end of this mail).

You could also do a 16-CPU benchmark on 16 nodes, so that the processes
do not need to share the network interface (see the hostfile sketch at
the end of this mail). If the scaling is better compared to 16 CPUs on
4 nodes, that is an indication of a bandwidth problem.

Carsten

>
> -himanshu
>
>> maybe your problem is not even flow control, but the limited network
>> bandwidth which is shared among 4 CPUs in your case. I have also done
>> benchmarks on Woodcrests (2.33 GHz) and was not able to scale an
>> 80,000-atom system beyond 1 node with Gbit Ethernet. Looking in more
>> detail, the time gained by the additional 4 CPUs of a second node was
>> exactly balanced by the extra communication. I used only 1 network
>> interface for that benchmark, leaving effectively only 1/4 of the
>> bandwidth for each CPU. Using two interfaces with OpenMPI did not
>> double the network performance on our cluster. In my tests, nodes
>> with 2 CPUs sharing one NIC were faster than nodes with 4 CPUs
>> sharing two NICs. It could be on-node contention, since both
>> interfaces probably end up on the same bus internally.
>>
>> Are the benchmarks made with 1 or 2 NICs/node? If they are for 1
>> NIC/node, then there should be no network congestion for the case of
>> 8 CPUs (= 2 nodes). You could try a back-to-back connection between
>> two nodes to be absolutely sure that the rest of the network (switch
>> etc.) does not play a role. I would try that, repeat the benchmark
>> for 8 CPUs, and see if you get a different value.
>> ##############
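
PS: Here is a minimal sketch of the two-interface idea, assuming Open
MPI's TCP transport and that your interfaces are named eth0 and eth1
(the interface names, the hostfile name, and the mdrun_mpi binary name
are guesses about your setup; check your interface names with ifconfig).
Open MPI stripes large messages over all interfaces listed in
btl_tcp_if_include:

    # use both Gbit NICs for MPI traffic (interface names assumed)
    mpirun -np 8 --hostfile hosts \
           --mca btl tcp,sm,self \
           --mca btl_tcp_if_include eth0,eth1 \
           mdrun_mpi -s topol.tpr

Adjust the mdrun_mpi options to whatever you normally run; only the
--mca parameters matter for the two-NIC test.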
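
For the 16-CPUs-on-16-nodes comparison, a hostfile that puts only one
MPI process on each node could look like this (the node names are made
up, substitute your own):

    # hosts.16x1 -- one MPI process per node, so no NIC sharing
    node01 slots=1
    node02 slots=1
    # ... one line per node ...
    node16 slots=1

    mpirun -np 16 --hostfile hosts.16x1 mdrun_mpi -s topol.tpr

If scaling on 16x1 is clearly better than on 4 nodes with 4 CPUs each,
the shared interface is the bottleneck.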