Lydia,

This message means that the TCP driver tried to send an illegal buffer. It looks like the send failed while sending 16 bytes, which is pretty uncommon. What is this application? Do you get the same error message when running with fewer nodes?
Thanks,
George.

On Feb 2, 2008, at 4:37 AM, Lydia Heck wrote:
In one of our big runs (512 cpus) the code fails and produces the following type of error on a list of nodes. I have searched the FAQs but could not find an answer there. There are difficulties getting the code to run because of its sheer size, but there is no other indication of the problem. Does the following error message mean that some of the nodes have given up?

mca_btl_tcp_frag_send] mca_btl_tcp_frag_send: writev error ([361eca8[m2234][0,1,283][m2317, 16][0,) 1Bad address,422(3) ][[/ws/hpc-ct-7.1/builds/7.1.build-ct7.1-003c/ompi-ct7.1/ompi/mca/btl/ tcp/btl_tcp_frag.c:114:mca_btl_tcp_frag_send]/ws/hpc-ct-7.1/builds/7.1.build-ct7.1-003c/ompi-ct7.1/ompi/mca/btl/ tcp/btl_tcp_frag.c[m22 41][0,1,430][m2140[m2152][0,1,150][mca_btl_tcp_frag_send: writev error (3c759a8,16) Bad address(3)

Lydia

------------------------------------------
Dr E L Heck
University of Durham
Institute for Computational Cosmology
Ogden Centre
Department of Physics
South Road
DURHAM, DH1 3LE
United Kingdom
e-mail: lydia.h...@durham.ac.uk
Tel.: + 44 191 - 334 3628
Fax.: + 44 191 - 334 3645
___________________________________________
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
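For readers unfamiliar with the "Bad address" text in the log above: it is the standard message for errno EFAULT, which writev() reports when an iovec entry points at memory the process cannot access. The sketch below is not Open MPI code; it is a minimal illustration, assuming a Linux-like POSIX system, of how a 16-byte writev() from an inaccessible buffer fails. The helper name writev_from_inaccessible_buffer is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper, not part of Open MPI: issues a 16-byte writev()
 * whose source buffer is inaccessible and returns the resulting errno
 * (0 if the write unexpectedly succeeds, -1 if the setup fails). */
int writev_from_inaccessible_buffer(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    /* Reserve one page with no access rights, so any read from it faults. */
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *bad = mmap(NULL, page, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (bad == MAP_FAILED) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* One iovec entry of 16 bytes, mirroring the size seen in the log. */
    struct iovec iov = { .iov_base = bad, .iov_len = 16 };
    ssize_t n = writev(fds[1], &iov, 1);
    int err = (n < 0) ? errno : 0;

    if (n < 0)
        fprintf(stderr, "writev error (%p,%zu) %s\n",
                iov.iov_base, iov.iov_len, strerror(err));

    munmap(bad, page);
    close(fds[0]);
    close(fds[1]);
    return err;
}
```

Calling this helper from a small main and comparing the return value with EFAULT is enough to reproduce the failure mode in isolation. In a real MPI run the invalid pointer would more plausibly come from the application or the library passing a freed or corrupted buffer, which is why the questions about the application and the node count matter.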