There are no messages being spit out, but I'm not sure I have all the correct debugs turned on. I turned on -debug-devel and -debug-daemons and set mca_verbose, but the process appears to just hang.
If it's memory exhaustion, it's not from core memory; these nodes have 48GB of memory. It could be a buffer somewhere, but I'm not sure where.

On Mon, Apr 4, 2011 at 10:17 PM, David Zhang <solarbik...@gmail.com> wrote:
> Any error messages? Maybe the nodes ran out of memory? I know MPI
> implements some kind of buffering under the hood, so even though you're
> sending arrays over 2^26 in size, it may require more than that for MPI to
> actually send it.
>
> On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico <mdidomeni...@gmail.com>
> wrote:
>>
>> Has anyone seen an issue where OpenMPI/Infiniband hangs when sending
>> messages over 2^26 in size?
>>
>> For a reason I have not determined just yet, machines on my cluster
>> (OpenMPI v1.5 and QLogic stack/QDR IB adapters) are failing to send
>> arrays over 2^26 in size via the AllToAll collective. (user code)
>>
>> Further testing seems to indicate that an MPI message over 2^26 fails
>> (tested with IMB-MPI).
>>
>> Running the same test on a different, older IB-connected cluster seems
>> to work, which would seem to indicate a problem with the InfiniBand
>> drivers of some sort rather than OpenMPI (but I'm not sure).
>>
>> Any thoughts, directions, or tests?
>
> --
> David Zhang
> University of California, San Diego
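
To narrow this down outside the application, a minimal Alltoall reproducer along the lines below should hit the same hang once the per-peer message size crosses the 2^26 threshold. This is only a sketch, not the original user code: the datatype, the 4-rank run line, and the btl_base_verbose MCA setting in the comment are assumptions, and since the thread doesn't say whether 2^26 means elements or bytes, the loop simply sweeps byte sizes through that point.

/*
 * Minimal sketch of an Alltoall reproducer, assuming the failure shows up
 * once the per-peer message size crosses 2^26 bytes.  Element counts,
 * datatype, and the run line below are guesses, not the original code.
 *
 * Possible run line (debug flags as described above):
 *   mpirun -np 4 -debug-devel -debug-daemons --mca btl_base_verbose 100 ./alltoall_test
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Sweep the per-peer message size up through and past 2^26 bytes. */
    for (size_t bytes = 1UL << 24; bytes <= (1UL << 27); bytes <<= 1) {
        size_t count = bytes / sizeof(int);        /* ints sent to each peer */
        int *sendbuf = malloc(bytes * nprocs);     /* one block per peer     */
        int *recvbuf = malloc(bytes * nprocs);
        if (!sendbuf || !recvbuf) {
            fprintf(stderr, "rank %d: allocation failed at %zu bytes\n", rank, bytes);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        for (size_t i = 0; i < count * nprocs; i++)
            sendbuf[i] = rank;

        if (rank == 0)
            printf("Alltoall with %zu bytes per peer...\n", bytes);

        /* On the affected cluster this is where the hang would be
           expected once 'bytes' exceeds 2^26. */
        MPI_Alltoall(sendbuf, (int)count, MPI_INT,
                     recvbuf, (int)count, MPI_INT, MPI_COMM_WORLD);

        if (rank == 0)
            printf("  ... completed\n");

        free(sendbuf);
        free(recvbuf);
    }

    MPI_Finalize();
    return 0;
}

If that sweep hangs at the same size under IMB-MPI and under this sketch, that would point at the transport layer (PSM/InfiniBand stack) rather than anything in the application's use of the collective.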