On Nov 21, 2006, at 1:27 PM, Brock Palen wrote:
I had sent a message two weeks ago about this problem and talked with
Jeff at SC06 about how it might not be an OMPI problem. But it now
appears, working with Myricom, that it is a problem in both
lam-7.1.2 and openmpi-1.1.2/1.1.1. Basically, the results from an HPL
run are wrong, and the problem also causes a large number of packets
to be dropped by the fabric.
This problem does not happen when using MPICH-GM; the number of
dropped packets does not go up. There is a ticket open with Myricom
on this. They are a member of the group working on OMPI, but I sent
this out just to bring the list up to date.
If you have any questions, feel free to ask me. The details are in
the archive.
Brock Palen
Hi all,
I am working on this ticket at Myricom.
I am using Linux nodes since we do not have two OS X 10.3 machines
available. Each node has 1 GB of RAM and two Myrinet PCI-X cards: a
single-port D card and a dual-port E card. I have disabled the E
card. I am using GM-2.0.26 and Open MPI 1.2b1.
I am running HPCC, which includes HPL as well as other benchmarks.
Using Brock's HPL.dat values in my hpccinf.txt, I do not see any
failed HPL runs. I do see some runs hang and require a reboot (the
machine becomes unresponsive), but the hang may happen in the HPL
portion of the run or in another benchmark.
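For anyone trying to reproduce the setup: hpccinf.txt follows the
standard HPL.dat input layout, so the relevant problem-size and
process-grid lines look roughly like the excerpt below. These values
are only illustrative placeholders, not Brock's actual inputs (those
are in the archive).

    1            # of problems sizes (N)
    10000        Ns
    1            # of NBs
    128          NBs
    0            PMAP process mapping (0=Row-,1=Column-major)
    1            # of process grids (P x Q)
    1            Ps
    2            Qs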
My last few runs all completed successfully without hanging. The job
I am currently running just hung one node (it responds to ping, but I
cannot ssh into it or use any terminals connected to it).
There are no messages in dmesg, and vmstat showed that the node was
not swapping before it hung.
Any ideas where I should look next?
Scott