I am getting the following error with openmpi-1.1b1:

mca_btl_tcp_frag_send: writev failed with errno=110

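For reference, on Linux errno 110 normally corresponds to ETIMEDOUT (connection timed out), although errno numbering is platform-specific. The following is a minimal Python sketch for decoding a numeric errno on the machine where the error occurred. The helper name describe_errno is hypothetical and is not part of Open MPI.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic_name, message) for a numeric errno value.

    The mapping is taken from the local platform, so the result is only
    meaningful on the same kind of system that reported the error.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    message = os.strerror(code)
    return name, message


if __name__ == "__main__":
    # 110 is the value reported by mca_btl_tcp_frag_send in the error above.
    name, message = describe_errno(110)
    print(f"errno=110 -> {name}: {message}")
```

Running this on one of the compute nodes should show which condition the failed writev call reported.
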
1) This never happens with the other MPI implementations I have tried, such as MPICH and LAM.

2) It only seems to happen with large numbers of CPUs (32, and occasionally 16) and with larger message sizes. In this case it was 128K.

3) It only seems to happen with dual CPUs on each node.

4) My configuration is the default, with the following in openmpi-mca-params.conf:

    pls_rsh_agent = rsh
    btl = tcp,self
    btl_tcp_if_include = eth1

I also set --mca btl_tcp_eager_limit 131072 when running the program, though leaving this out does not eliminate the problem.

My program is a communication test; it sends bidirectional point-to-point messages among N CPUs. In one test it exchanges messages between pairs of CPUs, in another it reads from the node on its left and sends to the node on its right (a kind of ring), and in a third it uses MPI_Allreduce.

Finally: the TCP driver in Open MPI does not seem nearly as good as the one in LAM. I got higher throughput with far fewer dropouts with LAM.

Tony

-------------------------------
Tony Ladd
Professor, Chemical Engineering
University of Florida
PO Box 116005
Gainesville, FL 32611-6005

Tel: 352-392-6509
FAX: 352-392-9513
Email: tl...@che.ufl.edu
Web: http://ladd.che.ufl.edu