On Sun, Aug 22, 2010 at 9:57 PM, Randolph Pullen <randolph_pul...@yahoo.com.au> wrote:
> Its a long shot but could it be related to the total data volume ?
> ie 524288 * 80 = 41943040 bytes active in the cluster
>
> Can you exceed this 41943040 data volume with a smaller message repeated
> more often or a larger one less often?
>

Not so far, so your diagnosis could be right. The failures have been at the
following data volumes:

41.9E6
4.1E6
8.2E6

Unfortunately, I'm not sure I can change the repeat rate with the OFED/MPI
tests. Can I do that? Didn't see a suitable flag.

In any case, assuming it is related to the total data volume what could be
causing such a failure?

--
Rahul
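
For reference, the arithmetic in the quoted question can be sketched as below. This is only an illustration: the helper names are made up, and it assumes the 80 in the quoted message is a plain multiplier on the message size (the thread does not say whether it is a repeat count or a process count). It lists message-size/count pairs that would reach the same 41943040-byte total, which is the kind of combination the question asks about.

```python
# Hypothetical helpers, not part of any OFED or MPI test suite.

def total_volume(message_bytes, count):
    """Total bytes for `count` messages of `message_bytes` each."""
    return message_bytes * count


def equivalent_counts(target_bytes, message_sizes):
    """Return (size, count) pairs whose product equals target_bytes exactly."""
    pairs = []
    for size in message_sizes:
        if target_bytes % size == 0:
            pairs.append((size, target_bytes // size))
    return pairs


if __name__ == "__main__":
    # Figures taken from the quoted message: 524288 * 80.
    target = total_volume(524288, 80)
    print("target volume:", target)

    # Candidate message sizes (assumed, powers of two around 524288).
    sizes = [131072, 262144, 524288, 1048576, 2097152]
    for size, count in equivalent_counts(target, sizes):
        print(size, "bytes x", count)
```

Whether any given benchmark lets you set the count independently of the message size is a separate question that this sketch does not answer.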