On Sun, Aug 22, 2010 at 9:57 PM, Randolph Pullen <
randolph_pul...@yahoo.com.au> wrote:

> It's a long shot, but could it be related to the total data volume?
> i.e. 524288 * 80 = 41943040 bytes active in the cluster
>
> Can you exceed this 41943040 data volume with a smaller message repeated
> more often or a larger one less often?
>

Not so far, so your diagnosis could be right. The failures have been at the
following data volumes:

41.9E6 bytes
4.1E6 bytes
8.2E6 bytes

Unfortunately, I'm not sure I can change the repeat rate with the OFED/MPI
tests. Can I do that? I didn't see a suitable flag.
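
For reference, here is a rough Python sketch of the message-size/count
combinations that would keep the total volume at the 41943040 bytes from the
arithmetic quoted above. It assumes the factor of 80 is a message or repeat
count (I'm not certain that's what it is) and only tries power-of-two message
sizes; the function name and the 1024-byte lower bound are made up for
illustration and have nothing to do with the OFED/MPI test tools.

```python
def equal_volume_combos(total_bytes, min_size=1024):
    """Return (message_size, count) pairs whose product equals total_bytes.

    Only power-of-two message sizes starting at min_size are considered
    (an assumption made for this sketch).
    """
    combos = []
    size = min_size
    while size <= total_bytes:
        # Keep only sizes that divide the total evenly.
        if total_bytes % size == 0:
            combos.append((size, total_bytes // size))
        size *= 2
    return combos


# Total volume from the quoted arithmetic: 524288 * 80 bytes.
TOTAL = 524288 * 80

combos = equal_volume_combos(TOTAL)
for size, count in combos:
    print(f"{size:>8} bytes x {count:>6} = {size * count} bytes")
```

Each line it prints is one way to reach the same total with a smaller message
repeated more often or a larger one repeated less often.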

In any case, assuming it is related to the total data volume, what could be
causing such a failure?

-- 
Rahul
