Yevgeny,

The ibstat results:

CA 'mthca0'
        CA type: MT25208 (MT23108 compat mode)
        Number of ports: 2
        Firmware version: 4.7.600
        Hardware version: a0
        Node GUID: 0x0005ad00000c21e0
        System image GUID: 0x0005ad000100d050
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 10
                Base lid: 4
                LMC: 0
                SM lid: 2
                Capability mask: 0x02510a68
                Port GUID: 0x0005ad00000c21e1
                Link layer: IB
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510a68
                Port GUID: 0x0005ad00000c21e2
                Link layer: IB
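One note on expected bandwidth: the MT25208 is an InfiniHost III family HCA, and "Rate: 10" in the ibstat output means a 10 Gb/s SDR 4x InfiniBand link (not 10 Gb Ethernet). SDR links use 8b/10b encoding, so only 8 of every 10 line bits carry data. A back-of-envelope check in shell arithmetic (the 8/10 factor is the standard 8b/10b overhead):

```shell
# "Rate: 10" on an MT25208 (InfiniHost III) port = SDR 4x, 10 Gb/s signalling.
# SDR uses 8b/10b encoding: 8 data bits per 10 line bits.
signal_gbps=10
data_gbps=$(( signal_gbps * 8 / 10 ))      # usable data rate: 8 Gb/s
data_mbps=$(( data_gbps * 1000 / 8 ))      # ~1000 MB/s theoretical ceiling
echo "${data_gbps} Gb/s data rate, ~${data_mbps} MB/s"
```

That ~1000 MB/s ceiling lines up with the ~958 MB/sec openib PingPong peak reported later in the thread, so the link itself looks healthy.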
And more interestingly, ib_write_bw:

                    RDMA_Write BW Test
 Number of qps   : 1
 Connection type : RC
 TX depth        : 300
 CQ Moderation   : 50
 Link type       : IB
 Mtu             : 2048
 Inline data is used up to 0 bytes message
 local address:  LID 0x04 QPN 0x1c0407 PSN 0x48ad9e RKey 0xd86a0051 VAddr 0x002ae362870000
 remote address: LID 0x03 QPN 0x2e0407 PSN 0xf57209 RKey 0x8d98003b VAddr 0x002b533d366000
------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
Conflicting CPU frequency values detected: 1600.000000 != 3301.000000
 65536      5000           0.00               0.00
------------------------------------------------------------------

What does "Conflicting CPU frequency values" mean?

Examining /proc/cpuinfo, however, shows:

processor       : 0
cpu MHz         : 3301.000
processor       : 1
cpu MHz         : 3301.000
processor       : 2
cpu MHz         : 1600.000
processor       : 3
cpu MHz         : 1600.000

Which seems odd to me...

________________________________
From: Yevgeny Kliteynik <klit...@dev.mellanox.co.il>
To: Randolph Pullen <randolph_pul...@yahoo.com.au>; OpenMPI Users <us...@open-mpi.org>
Sent: Thursday, 6 September 2012 6:03 PM
Subject: Re: [OMPI users] Infiniband performance Problem and stalling

On 9/3/2012 4:14 AM, Randolph Pullen wrote:
> No RoCE, just native IB with TCP over the top.

Sorry, I'm confused - still not clear what a "Melanox III HCA 10G card" is.
Could you run "ibstat" and post the results?

What is the expected BW on your cards?
Could you run "ib_write_bw" between two machines?

Also, please see below.

> No, I haven't used 1.6 - I was trying to stick with the standards on the
> mellanox disk.
> Is there a known problem with 1.4.3?
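On the "Conflicting CPU frequency values" question above: the perftest tools appear to calibrate their TSC-based timing against the "cpu MHz" field of /proc/cpuinfo, and when cpufreq scaling has left cores at different frequencies (exactly as in the excerpt), that calibration is ambiguous - hence the warning and, likely, the 0.00 bandwidth columns. A minimal check, shown here against a copy of the excerpt (on a live system you would read /proc/cpuinfo directly):

```shell
# Count distinct "cpu MHz" values; more than one means frequency
# scaling is active, which confuses ib_write_bw's timing.
cat > cpuinfo.sample <<'EOF'
processor       : 0
cpu MHz         : 3301.000
processor       : 1
cpu MHz         : 3301.000
processor       : 2
cpu MHz         : 1600.000
processor       : 3
cpu MHz         : 1600.000
EOF
awk -F': ' '/^cpu MHz/ { print $2 }' cpuinfo.sample | sort -u | wc -l   # -> 2 distinct values
```

If more than one value shows up, pinning all cores to the "performance" governor before re-running ib_write_bw usually clears the warning - e.g. via cpufreq-set/cpupower, or by writing "performance" into /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor (standard Linux cpufreq interfaces; which tool is available varies by distro, and CentOS 5 ships cpufrequtils).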
> ________________________________
> *From:* Yevgeny Kliteynik <klit...@dev.mellanox.co.il>
> *To:* Randolph Pullen <randolph_pul...@yahoo.com.au>; Open MPI Users
> <us...@open-mpi.org>
> *Sent:* Sunday, 2 September 2012 10:54 PM
> *Subject:* Re: [OMPI users] Infiniband performance Problem and stalling
>
> Randolph,
>
> Some clarification on the setup:
>
> "Melanox III HCA 10G cards" - are those ConnectX 3 cards configured to
> Ethernet?
> That is, when you're using openib BTL, you mean RoCE, right?
>
> Also, have you had a chance to try some newer OMPI release?
> Any 1.6.x would do.
>
> -- YK
>
> On 8/31/2012 10:53 AM, Randolph Pullen wrote:
> > (reposted with consolidated information)
> >
> > I have a test rig comprising 2 i7 systems with 8GB RAM and Melanox III HCA 10G cards,
> > running Centos 5.7, kernel 2.6.18-274,
> > Open MPI 1.4.3,
> > MLNX_OFED_LINUX-1.5.3-1.0.0.2 (OFED-1.5.3-1.0.0.2),
> > on a Cisco 24-port switch.
> >
> > Normal performance is:
> > $ mpirun --mca btl openib,self -n 2 -hostfile mpi.hosts PingPong
> > results in:
> > Max rate = 958.388867 MB/sec  Min latency = 4.529953 usec
> > and:
> > $ mpirun --mca btl tcp,self -n 2 -hostfile mpi.hosts PingPong
> > Max rate = 653.547293 MB/sec  Min latency = 19.550323 usec
> >
> > NetPipeMPI results show a max of 7.4 Gb/s at 8388605 bytes, which seems fine.
> > log_num_mtt = 20 and log_mtts_per_seg = 2.
> >
> > My application exchanges about a gig of data between the processes, with 2
> > sender and 2 consumer processes on each node and 1 additional controller
> > process on the starting node.
> > The program splits the data into 64K blocks and uses non-blocking sends
> > and receives, with busy/sleep loops to monitor progress until completion.
> > Each process owns a single buffer for these 64K blocks.
> >
> > My problem is that I see better performance under IPoIB than on native IB
> > (RDMA_CM).
> > My understanding is that IPoIB is limited to about 1G/s, so I am at a loss
> > to know why it is faster.
> >
> > These 2 configurations are equivalent (about 8-10 seconds per cycle):
> > mpirun --mca btl_openib_flags 2 --mca mpi_leave_pinned 1 --mca btl tcp,self -H vh2,vh1 -np 9 --bycore prog
> > mpirun --mca btl_openib_flags 3 --mca mpi_leave_pinned 1 --mca btl tcp,self -H vh2,vh1 -np 9 --bycore prog

When you say "--mca btl tcp,self", it means that the openib btl is not enabled.
Hence "--mca btl_openib_flags" is irrelevant.
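To make that remark concrete, here is a sketch of the first command with the openib btl actually enabled, so that btl_openib_flags takes effect (hosts, flags, and program name are copied from the commands above; "sm" is Open MPI's shared-memory btl for intra-node traffic; this is an untested illustration, not a recommended tuning):

```shell
# openib is now in the btl list, so --mca btl_openib_flags applies;
# sm carries intra-node traffic, self handles send-to-self.
mpirun --mca btl openib,sm,self --mca btl_openib_flags 2 \
       --mca mpi_leave_pinned 1 -H vh2,vh1 -np 9 --bycore prog
```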
> > And this one produces similar run times but seems to degrade with repeated
> > cycles:
> > mpirun --mca btl_openib_eager_limit 64 --mca mpi_leave_pinned 1 --mca btl openib,self -H vh2,vh1 -np 9 --bycore prog

You're running 9 ranks on two machines, but you're using IB for intra-node
communication. Is that intentional? If not, you can add the "sm" btl and have
performance improved.

-- YK

> > Other btl_openib_flags settings result in much lower performance.
> >
> > Changing the first of the above configs to use openib results in a 21-second
> > run time at best. Sometimes it takes up to 5 minutes.
> > In all cases, openib runs in twice the time it takes TCP, except if I push
> > the small-message max to 64K and force short messages. Then the openib
> > times are the same as TCP and no faster.
> >
> > With openib:
> > - Repeated cycles during a single run seem to slow down with each cycle
> >   (usually by about 10 seconds).
> > - On occasion it seems to stall indefinitely, waiting on a single receive.
> >
> > I'm still at a loss as to why. I can't find any errors logged during the runs.
> >
> > Any ideas appreciated.
> > Thanks in advance,
> > Randolph
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org <mailto:us...@open-mpi.org>
> http://www.open-mpi.org/mailman/listinfo.cgi/users