Hi Brock
We have a user whose code keeps failing at a similar point in the
code.  The errors (below) would make me think it's a fabric problem,
but ibcheckerrors is not returning any issues.  He is using
openmpi-1.2.0 with OFED on RHEL4.

  Strangely enough, I hit this exact problem about half an hour ago...
what compilers is he using for the code / OpenMPI?  I haven't narrowed
down the cause yet because the system I'm on is a tad, uh, disheveled,
but it'd be good to find any commonality.  I'm using PGI-7.1-2
(pgf77/pgf90) with OpenMPI-1.2.4. The system also happens to be RHEL 4
(Update 3).

We are also running PGI compilers, version 6.2. We have Cisco (Topspin) IB hardware and are using stock OFED 1.1 with Red Hat.

Is this the same as what you are using?


  .. Also, the code I'm running is CCSM, and it gave an error message
about being unable to read a file correctly right before my
synchronization.  This code has worked on other systems in the past
(non-IB, non-IBRIX), but something as basic as a file write shouldn't be
adversely affected by such things, hence I'm going to try backing the
compiler down to a 'known-good' one first, since perhaps that's my
problem.  I don't suppose you saw any messages of that sort?   I did
already try setting the retry count parameter up to 20 (from 7), but
that didn't fix it.

  Cheers,
  - Brian


Brian Dobbins
Yale University HPC

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


