Hi,
I just noticed that my previous mail bounced,
but it doesn't matter. Please ignore it if
you got it anyway - I re-read the thread and
there is a much simpler way to do it.
If you want to check whether LID L is reachable
through HCA H from port P, you can run this command:
smpquery --Ca H --Port P NodeInfo L
Example:
smpquery --Ca mlx4_0 --Port 2 NodeInfo 4
If you don't get a response, or you get info for a
device different from the one you would expect,
then the two ports are not part of the same
subnet, and APM is expected to fail.
Otherwise - it's probably a bug.
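If you want to script the check above, here is a minimal Python sketch.
It only wraps the smpquery command shown above. The helper names
(build_smpquery_cmd, parse_fields, classify, check_lid), the status
strings, and the optional expected_guid argument are my own inventions for
illustration. The parser assumes that NodeInfo prints "Name:....value"
lines including a "Guid" field; please verify that against the output on
your own system before relying on it.

```python
import subprocess


def build_smpquery_cmd(ca, port, lid):
    """Build the argv for: smpquery --Ca <ca> --Port <port> NodeInfo <lid>."""
    return ["smpquery", "--Ca", str(ca), "--Port", str(port),
            "NodeInfo", str(lid)]


def parse_fields(output):
    """Parse 'Name:.....value' lines into a dict (assumed output layout)."""
    fields = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, '#' header lines, and anything without a colon.
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, _, rest = line.partition(":")
        fields[name.strip()] = rest.lstrip(".").strip()
    return fields


def classify(returncode, output, expected_guid=None):
    """Map a finished smpquery run to one of three hypothetical statuses."""
    if returncode != 0:
        return "no-response"
    fields = parse_fields(output)
    if not fields:
        return "no-response"
    # 'Guid' as a field name is an assumption about the NodeInfo output.
    if (expected_guid is not None
            and fields.get("Guid", "").lower() != expected_guid.lower()):
        return "unexpected-device"
    return "reachable"


def check_lid(ca, port, lid, expected_guid=None, timeout=10):
    """Run smpquery and classify the result; a timeout counts as no response."""
    try:
        result = subprocess.run(build_smpquery_cmd(ca, port, lid),
                                capture_output=True, text=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        return "no-response"
    return classify(result.returncode, result.stdout, expected_guid)
```

For the example above you would call check_lid("mlx4_0", 2, 4), optionally
passing the node GUID you expect to find behind LID 4.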
-- YK
On 08-Mar-12 5:44 PM, Shamis, Pavel wrote:
Jeremy,
Finally I had a chance to look at log file.
Initially all QPs are created on port 1, and at the same time the alternative
path is loaded (port 2, lids 4 and 2). I guess at some point you switched off
port 1. The APM event is reported because the alternative path is active now,
and for some reason an IB message is dropped.
You may ignore the APM warning. Since the alternative path is active now,
OMPI tries to pre-load the next good path in case of a future failure on
port 2. Since port 3 does not exist, it reports the warning.
My educated guess is that for some reason there is no direct connection path
between lid-2 and lid-4. To prove it we have to look at the OpenSM routing
information.
On the mail list we have a representative from Mellanox that should be able to
help us extract the routing information.
Evgeny,
Can you please help ?
Regards,
Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Feb 29, 2012, at 5:38 PM, Jeremy wrote:
Hi Pasha,
On Wed, Feb 29, 2012 at 11:02 AM, Shamis, Pavel<sham...@ornl.gov> wrote:
I would like to see the whole file.
Is 28 MB the size after compression?
I think Gmail supports up to 25 MB.
You could try creating a gzip file and then slicing it with the "split" command.
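For anyone who would rather do the gzip-and-slice step from a script, here
is a rough Python sketch of the same idea. It is not the "split" command
itself. The function names, the ".partNNN" suffix, and the 20 MiB default
chunk size are assumptions made for illustration; pick whatever chunk size
fits your mail provider's limit.

```python
import gzip
import shutil
from pathlib import Path


def gzip_and_split(src, chunk_bytes=20 * 1024 * 1024):
    """Compress src to src.gz, then cut the .gz into numbered parts."""
    src = Path(src)
    gz_path = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(gz_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)

    parts = []
    with open(gz_path, "rb") as fin:
        index = 0
        while True:
            chunk = fin.read(chunk_bytes)
            if not chunk:
                break
            # Hypothetical naming scheme: debug.txt.gz.part000, part001, ...
            part = gz_path.with_name(f"{gz_path.name}.part{index:03d}")
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts


def join_parts(parts, dest):
    """Concatenate the parts, in the given order, back into one .gz file."""
    with open(dest, "wb") as fout:
        for part in parts:
            with open(part, "rb") as fin:
                shutil.copyfileobj(fin, fout)
```

The receiver passes the parts to join_parts in the same order and then
decompresses the result as usual.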
See attached. I unplugged the cable from Port 1 at about line 151311.
The APM error message then appears at about line 178905.
Thanks,
-Jeremy
<debug.txt.bz2>
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users