Hi,
1. Mellanox has newer firmware for those HCAs:
   http://www.mellanox.com/content/pages.php?pg=firmware_table_IH3Lx
   I am not sure it will help, but newer firmware usually includes some bug fixes.
2. Try disabling leave_pinned for the run. It is on by default in 1.3.3.
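
As a rough sketch of the second suggestion, leave_pinned can be switched off for a single run through the mpi_leave_pinned MCA parameter, either with "--mca mpi_leave_pinned 0" on the mpirun command line or with the matching OMPI_MCA_ environment variable. The application name and process count below are placeholders, not taken from the original report.

```shell
# Hypothetical application binary and process count; substitute your own.
APP=./my_mpi_app
NP=4

# Setting OMPI_MCA_<name> in the environment has the same effect as passing
# "--mca <name> <value>" on the mpirun command line.
export OMPI_MCA_mpi_leave_pinned=0

if command -v mpirun >/dev/null 2>&1 && [ -x "$APP" ]; then
    # Launch with leave_pinned disabled through the exported variable.
    mpirun -np "$NP" "$APP"
else
    # Nothing to launch here; just show the equivalent command line.
    echo "would run: mpirun --mca mpi_leave_pinned 0 -np $NP $APP"
fi
```

If the hang goes away with this setting, that points toward the registration cache path in the openib btl rather than the application's Send/Recv logic.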
Lenny.

On Thu, Aug 13, 2009 at 5:12 AM, Allen Barnett <al...@transpireinc.com> wrote:

> Hi:
> I recently tried to build my MPI application against OpenMPI 1.3.3. It
> worked fine with OMPI 1.2.9, but with OMPI 1.3.3, it hangs part way
> through. It does a fair amount of comm, but eventually it stops in a
> Send/Recv point-to-point exchange. If I turn off the openib btl, it runs
> to completion. Also, I built 1.3.3 with memchecker (which is very nice;
> thanks to everyone who worked on that!) and it runs to completion, even
> with openib active.
>
> Our cluster consists of dual dual-core opteron boxes with Mellanox
> MT25204 (InfiniHost III Lx) HCAs and a Mellanox MT47396 Infiniscale-III
> switch. We're running RHEL 4.8 which appears to include OFED 1.4. I've
> built everything using GCC 4.3.2. Here is the output from ibv_devinfo.
> "ompi_info --all" is attached.
> $ ibv_devinfo
> hca_id: mthca0
>        fw_ver:                         1.1.0
>        node_guid:                      0002:c902:0024:3284
>        sys_image_guid:                 0002:c902:0024:3287
>        vendor_id:                      0x02c9
>        vendor_part_id:                 25204
>        hw_ver:                         0xA0
>        board_id:                       MT_03B0140002
>        phys_port_cnt:                  1
>                port:   1
>                        state:                  active (4)
>                        max_mtu:                2048 (4)
>                        active_mtu:             2048 (4)
>                        sm_lid:                 1
>                        port_lid:               1
>                        port_lmc:               0x00
>
> I'd appreciate any tips for debugging this.
> Thanks,
> Allen
>
> --
> Allen Barnett
> Transpire, Inc
> E-Mail: al...@transpireinc.com
> Skype:  allenbarnett
> Ph:     518-887-2930
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>