Yevgeny,

The ibstat results:

CA 'mthca0'
        CA type: MT25208 (MT23108 compat mode)
        Number of ports: 2
        Firmware version: 4.7.600
        Hardware version: a0
        Node GUID: 0x0005ad00000c21e0
        System image GUID: 0x0005ad000100d050
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 10
                Base lid: 4
                LMC: 0
                SM lid: 2
                Capability mask: 0x02510a68
                Port GUID: 0x0005ad00000c21e1
                Link layer: IB
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510a68
                Port GUID: 0x0005ad00000c21e2
                Link layer: IB
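
(A note on the expected bandwidth for a port reporting "Rate: 10": the sketch below assumes that this figure is the signalling rate in Gb/s of a 4x SDR link and that the link uses 8b/10b encoding, so only 8 of every 10 bits carry data. The function name is made up for illustration, and the bus the card sits on may cap the achievable figure lower.)

```python
# Sketch: theoretical payload bandwidth of an InfiniBand link that uses
# 8b/10b encoding. Assumption: "Rate: 10" in the ibstat output is the
# signalling rate in Gb/s (4x SDR). ib_payload_mb_per_sec is a
# hypothetical helper, not part of any InfiniBand tool.

def ib_payload_mb_per_sec(signal_rate_gbps, encoding_efficiency=8 / 10):
    """Return usable data bandwidth in MB/s (1 MB = 10**6 bytes)."""
    data_bits_per_sec = signal_rate_gbps * 1e9 * encoding_efficiency
    return data_bits_per_sec / 8 / 1e6


if __name__ == "__main__":
    peak = ib_payload_mb_per_sec(10)
    print(f"theoretical payload peak: {peak:.0f} MB/s")
```

Under these assumptions the ceiling is about 1000 MB/s, so the 958 MB/sec PingPong figure over openib quoted further down would already be close to what the link can deliver.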

And more interestingly, ib_write_bw:

                    RDMA_Write BW Test
 Number of qps   : 1
 Connection type : RC
 TX depth        : 300
 CQ Moderation   : 50
 Link type       : IB
 Mtu             : 2048
 Inline data is used up to 0 bytes message
 local address:  LID 0x04 QPN 0x1c0407 PSN 0x48ad9e RKey 0xd86a0051 VAddr 0x002ae362870000
 remote address: LID 0x03 QPN 0x2e0407 PSN 0xf57209 RKey 0x8d98003b VAddr 0x002b533d366000
------------------------------------------------------------------
 #bytes   #iterations   BW peak[MB/sec]   BW average[MB/sec]
Conflicting CPU frequency values detected: 1600.000000 != 3301.000000
 65536    5000          0.00              0.00
------------------------------------------------------------------

What does "Conflicting CPU frequency values" mean?

Examining the /proc/cpuinfo file, however, shows:

processor : 0
cpu MHz   : 3301.000
processor : 1
cpu MHz   : 3301.000
processor : 2
cpu MHz   : 1600.000
processor : 3
cpu MHz   : 1600.000

Which seems odd to me...

________________________________
From: Yevgeny Kliteynik <klit...@dev.mellanox.co.il>
To: Randolph Pullen <randolph_pul...@yahoo.com.au>; OpenMPI Users <us...@open-mpi.org>
Sent: Thursday, 6 September 2012 6:03 PM
Subject: Re: [OMPI users] Infiniband performance Problem and stalling

On 9/3/2012 4:14 AM, Randolph Pullen wrote:
> No RoCE, just native IB with TCP over the top.

Sorry, I'm confused - still not clear what is "Melanox III HCA 10G card".
Could you run "ibstat" and post the results?

What is the expected BW on your cards?
Could you run "ib_write_bw" between two machines?

Also, please see below.

> No, I haven't used 1.6. I was trying to stick with the standards on the
> mellanox disk.
> Is there a known problem with
>
> *From:* Yevgeny Kliteynik <klit...@dev.mellanox.co.il>
> *To:* Randolph Pullen <randolph_pul...@yahoo.com.au>; Open MPI Users
> <us...@open-mpi.org>
> *Sent:* Sunday, 2 September 2012 10:54 PM
> *Subject:* Re: [OMPI users] Infiniband performance Problem and stalling
>
> Randolph,
>
> Some clarification on the setup:
>
> "Melanox III HCA 10G cards" - are those ConnectX 3 cards configured to
> Ethernet?
> That is, when you're using openib BTL, you mean RoCE, right?
>
> Also, have you had a chance to try some newer OMPI release?
> Any 1.6.x would do.
>
> -- YK
>
> On 8/31/2012 10:53 AM, Randolph Pullen wrote:
> > (reposted with consolidated information)
> > I have a test rig comprising 2 i7 systems 8GB RAM with Melanox III HCA 10G
> > cards
> > running Centos 5.7 Kernel 2.6.18-274
> > Open MPI 1.4.3
> > MLNX_OFED_LINUX-1.5.3-1.0.0.2 (OFED-1.5.3-1.0.0.2):
> > On a Cisco 24 pt switch
> > Normal performance is:
> > $ mpirun --mca btl openib,self -n 2 -hostfile mpi.hosts PingPong
> > results in:
> > Max rate = 958.388867 MB/sec Min latency = 4.529953 usec
> > and:
> > $ mpirun --mca btl tcp,self -n 2 -hostfile mpi.hosts PingPong
> > Max rate = 653.547293 MB/sec Min latency = 19.550323 usec
> > NetPipeMPI results show a max of 7.4 Gb/s at 8388605 bytes which seems
> > fine.
> > log_num_mtt =20 and log_mtts_per_seg params =2
> > My application exchanges about a gig of data between the processes with 2
> > sender and 2 consumer processes on each node with 1 additional controller
> > process on the starting node.
> > The program splits the data into 64K blocks and uses non blocking sends
> > and receives with busy/sleep loops to monitor progress until completion.
> > Each process owns a single buffer for these 64K blocks.
> > My problem is I see better performance under IPoIB than I do on native IB
> > (RDMA_CM).
> > My understanding is that IPoIB is limited to about 1G/s so I am at a loss
> > to know why it is faster.
> > These 2 configurations are equivalent (about 8-10 seconds per cycle)
> > mpirun --mca btl_openib_flags 2 --mca mpi_leave_pinned 1 --mca btl tcp,self -H vh2,vh1 -np 9 --bycore prog
> > mpirun --mca btl_openib_flags 3 --mca mpi_leave_pinned 1 --mca btl tcp,self -H vh2,vh1 -np 9 --bycore prog

When you say "--mca btl tcp,self", it means that openib btl is not enabled.
Hence "--mca btl_openib_flags" is irrelevant.

> > And this one produces similar run times but seems to degrade with repeated
> > cycles:
> > mpirun --mca btl_openib_eager_limit 64 --mca mpi_leave_pinned 1 --mca btl openib,self -H vh2,vh1 -np 9 --bycore prog

You're running 9 ranks on two machines, but you're using IB for intra-node
communication. Is it intentional? If not, you can add "sm" btl and have
performance improved.

-- YK

> > Other btl_openib_flags settings result in much lower performance.
> > Changing the first of the above configs to use openIB results in a 21
> > second run time at best. Sometimes it takes up to 5 minutes.
> > In all cases, OpenIB runs in twice the time it takes TCP, except if I push
> > the small message max to 64K and force short messages. Then the openib times
> > are the same as TCP and no faster.
> > With openib:
> > - Repeated cycles during a single run seem to slow down with each cycle
> >   (usually by about 10 seconds).
> > - On occasions it seems to stall indefinitely, waiting on a single receive.
> > I'm still at a loss as to why. I can’t find any errors logged during the
> > runs.
> > Any ideas appreciated.
> > Thanks in advance,
> > Randolph
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org <mailto:us...@open-mpi.org>
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
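
(On the "Conflicting CPU frequency values" question: the warning most likely means that the benchmark read the "cpu MHz" entries in /proc/cpuinfo, which it needs in order to turn cycle counts into seconds, and found that the cores disagree. That is usually a sign of CPU frequency scaling leaving some cores clocked down, and it would plausibly explain the 0.00 bandwidth figures. The Python sketch below is not the tool's own code. It only illustrates that kind of check, and the function names and the 1% tolerance are assumptions chosen for the example.)

```python
# Sketch: detect differing "cpu MHz" values in /proc/cpuinfo-style text.
# parse_cpu_mhz and conflicting_frequencies are hypothetical helpers;
# the 1% tolerance is an arbitrary choice for this example.

def parse_cpu_mhz(cpuinfo_text):
    """Collect every 'cpu MHz' value found in the given text."""
    values = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            values.append(float(value))
    return values


def conflicting_frequencies(values, tolerance=0.01):
    """True if the slowest and fastest cores differ by more than tolerance."""
    if not values:
        return False
    lowest, highest = min(values), max(values)
    return (highest - lowest) / highest > tolerance


if __name__ == "__main__":
    # Reads the local file; this only works on Linux.
    with open("/proc/cpuinfo") as handle:
        mhz = parse_cpu_mhz(handle.read())
    print("cpu MHz values:", mhz)
    print("conflict:", conflicting_frequencies(mhz))
```

With the values from the post (3301.000 on processors 0 and 1, 1600.000 on processors 2 and 3) the check reports a conflict, so it may be worth ruling out frequency scaling on both nodes before drawing conclusions from the ib_write_bw numbers.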