Very interesting, indeed! Message passing over raw Ethernet on cheap COTS PCs is exactly the need of the hour for people like me with shallow pockets. Great work! What would make this effort *really* cool is a one-to-one mapping of APIs from the MPI domain to the GAMMA domain, so that, for example, existing MPI code could be ported with a trivial amount of work. Professor Ladd, how did you do this porting, e.g. for VASP? How much of an effort was it? (Or did the VASP developers already have a version running over GAMMA?)
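For concreteness, here is the sort of thing I would hope works: my understanding is that MPI/GAMMA presents the standard MPI API on top of GAMMA, so a plain MPI program like the sketch below would simply be rebuilt against the GAMMA-enabled toolchain with no source changes (the drop-in assumption and the build details are mine, not from the GAMMA docs):

/* Minimal MPI program. If MPI/GAMMA is a drop-in MPI implementation,
 * as I understand it to be, this should build and run unchanged;
 * only the mpicc wrapper and the runtime underneath would differ. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}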
Thanks
Durga

On 10/24/06, Tony Ladd <l...@che.ufl.edu> wrote:
Lisandro

I use my own network testing program; I wrote it some time ago because NetPIPE only tested one-way rates at that point. I haven't tried IMB, but I looked at the source and it's very similar to what I do:

1) Set up buffers with data.
2) Start the clock.
3) Call MPI_xxx N times.
4) Stop the clock.
5) Calculate the rate.

(A minimal version of this loop is sketched after this message.) IMB tests more things than I do; I just focused on the calls I use (Send, Recv, Allreduce).

I have done a lot of testing of hardware and software. I will have some web pages posted soon, and I will put a note here when I do. But a couple of things:

A) I have found the switch is the biggest discriminator if you want to run HPC over Gigabit Ethernet. Most GigE switches choke when all the ports are in use at once. That is the usual HPC traffic pattern, but not that of a typical office network, which is what these switches are geared towards. The one exception I have found is the Extreme Networks x450a-48t. In some test patterns I found it to be 500 times faster (not a typo) than the Summit 400-48t, its predecessor. I have tested several GigE switches (Extreme, Force10, HP, Asante), and the x450 is the only one that copes with high traffic loads in all port configurations. It's expensive for a GigE switch (~$6500), but worth it in my opinion if you want to do HPC, and still much cheaper than InfiniBand.

B) You have to test the switch in different port configurations; a random ring of SendRecv is good for this. I don't think IMB has it in its test suite, but it's easy to program (see the second sketch below). Alternatively, you can change the order of nodes in the machinefile to force unfavorable port assignments. A step of 12 is a good test, since many GigE switches use 12-port ASICs and this stride forces all the traffic onto the backplane. On the Summit 400 this causes it to more or less stop working: rates drop to a few Kbytes/sec along each wire, while the x450 has no problem with the same test. You need to know how your nodes are wired to the switch to do this test.

C) GAMMA is an extraordinary accomplishment in my view; in a number of tests with codes like DL_POLY, GROMACS, and VASP it can be 2-3 times the speed of TCP-based runs on 64 CPUs. In many instances I get comparable (and occasionally better) scaling than with the university HPC system, which has an InfiniBand interconnect. Note that I am not saying GigE is comparable to IB, but that a typical HPC setup, with nodes scattered all over a fat-tree topology (including oversubscription of the links and switches), is enough of a handicap that an optimized GigE setup can compete, at least up to 48 nodes (96 CPUs in our case).

I have worked with Giuseppe Ciaccio for the past 9 months eradicating some obscure bugs in GAMMA: I find them; he fixes them. We have GAMMA running on 48 nodes quite reliably, but there are still many issues to address. GAMMA is very much a research tool; there are a number of features(?) which would hinder its use in a production HPC environment. Basically, Giuseppe needs help with development. Any volunteers?

Tony

-------------------------------
Tony Ladd
Professor, Chemical Engineering
University of Florida
PO Box 116005
Gainesville, FL 32611-6005
Tel: 352-392-6509
FAX: 352-392-9513
Email: tl...@che.ufl.edu
Web: http://ladd.che.ufl.edu
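Tony's five measurement steps translate directly into a short MPI program. Here is a minimal sketch using MPI_Allreduce as the call under test; N, the buffer length, and the MB/s reporting are illustrative choices of mine, not his actual parameters:

/* Sketch of the benchmark loop described above: fill buffers, start
 * the clock, call the MPI routine N times, stop the clock, compute a
 * rate. N and LEN are arbitrary choices, not Tony's parameters. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N   1000            /* repetitions */
#define LEN (1 << 20)       /* doubles per buffer (~8 MB) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *sendbuf = malloc(LEN * sizeof(double));
    double *recvbuf = malloc(LEN * sizeof(double));
    for (int i = 0; i < LEN; i++)                 /* 1) fill buffers  */
        sendbuf[i] = (double)i;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();                      /* 2) start clock   */
    for (int i = 0; i < N; i++)                   /* 3) call N times  */
        MPI_Allreduce(sendbuf, recvbuf, LEN, MPI_DOUBLE,
                      MPI_SUM, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();                      /* 4) stop clock    */

    if (rank == 0)                                /* 5) compute rate  */
        printf("%.2f MB/s per call\n",
               (double)LEN * sizeof(double) * N / (t1 - t0) / 1e6);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

Running the same binary over the same nodes under TCP and under GAMMA would give the kind of stack-to-stack comparison Tony describes.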
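And here is a sketch of the strided ring of MPI_Sendrecv that the stride-of-12 machinefile trick amounts to, done with a rank offset instead of reordering the machinefile. STRIDE, the message size, and the repetition count are assumptions following his description; replacing the fixed offset with a random permutation of the ranks would give the "random ring" variant:

/* Sketch of a strided ring of MPI_Sendrecv, after the suggestion of a
 * step of 12 to defeat per-ASIC port grouping on many GigE switches.
 * STRIDE, MSGLEN, and REPS are illustrative values only. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define STRIDE 12           /* hop distance; 12 targets 12-port ASICs */
#define MSGLEN (1 << 20)    /* bytes per message */
#define REPS   100

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char *out = malloc(MSGLEN), *in = malloc(MSGLEN);
    memset(out, 1, MSGLEN);
    int right = (rank + STRIDE) % size;           /* send to rank+12      */
    int left  = (rank - STRIDE + size) % size;    /* receive from rank-12 */

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++)
        MPI_Sendrecv(out, MSGLEN, MPI_BYTE, right, 0,
                     in,  MSGLEN, MPI_BYTE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    double t1 = MPI_Wtime();

    printf("rank %d: %.2f MB/s each way\n",
           rank, (double)MSGLEN * REPS / (t1 - t0) / 1e6);

    free(out);
    free(in);
    MPI_Finalize();
    return 0;
}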
-- Devil wanted omnipresence; He therefore created communists.