Re: [OMPI users] Open MPI 1.5.4 on windows g95 / mpif90 support

2011-12-15 Thread Shiqing Fan

Hi,

The Fortran 90 bindings are not yet available on Windows.

Regards,
Shiqing


On 2011-12-14 12:56 PM, Joao Amaral wrote:

Hi all,

I am trying to get a working mpif90 on my laptop PC (Windows 7, 64-bit), 
so that I can develop/test Fortran 90 MPI code before running 
it on a cluster.


I have tried the 1.5.4 installer on Windows, Cygwin, installed Ubuntu, 
tried Cygwin again, and am now back to the Open MPI 1.5.4 Windows build.


Is it possible to use my existing g95 installation on Windows so that 
I can compile Fortran 90 MPI code?


These are the top lines from the output of the "ompi_info" command.

                 Package: Open MPI hpcfan@VISCLUSTER26 Distribution
                Open MPI: 1.5.4
   Open MPI SVN revision: r25060
   Open MPI release date: Aug 18, 2011
                Open RTE: 1.5.4
   Open RTE SVN revision: r25060
   Open RTE release date: Aug 18, 2011
                    OPAL: 1.5.4
       OPAL SVN revision: r25060
       OPAL release date: Aug 18, 2011
            Ident string: 1.5.4
                  Prefix: C:\Program Files (x86)\OpenMPI_v1.5.4-x64
 Configured architecture: x86 Windows-6.1
          Configure host: VISCLUSTER26
           Configured by: hpcfan
           Configured on: 10:44 AM 08/19/2011
          Configure host: VISCLUSTER26
                Built by: hpcfan
                Built on: 10:44 AM 08/19/2011
              Built host: VISCLUSTER26
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (caps)
      Fortran90 bindings: no
 Fortran90 bindings size: na
              C compiler: cl
     C compiler absolute: D:/MSDev10/VC/bin/amd64/cl.exe
  C compiler family name: MICROSOFT
      C compiler version: 1600
            C++ compiler: cl
   C++ compiler absolute: D:/MSDev10/VC/bin/amd64/cl.exe
      Fortran77 compiler: ifort
  Fortran77 compiler abs: C:/Program Files (x86)/Intel/ComposerXE-2011/bin/amd64/ifort.exe
      Fortran90 compiler: none
  Fortran90 compiler abs: none
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: no
          C++ exceptions: no
          Thread support: no
           Sparse Groups: no
  Internal debug support: no
  MPI interface warnings: no
     MPI parameter check: never

(...)

Thanks for your help.






--
---
Shiqing Fan
High Performance Computing Center Stuttgart (HLRS)
Tel: ++49(0)711-685-87234  Nobelstrasse 19
Fax: ++49(0)711-685-65832  70569 Stuttgart
http://www.hlrs.de/organization/people/shiqing-fan/
email: f...@hlrs.de



Re: [OMPI users] MPI_BCAST and fortran subarrays

2011-12-15 Thread Patrick Begou

Thanks to all for your converging points of view on my problem.
Portability is also an important consideration for this code, so there is 
only one solution: using user-defined data types.
In my mind these were more for C or C++ code, which lacks the Fortran 
subarray behavior, but I was wrong.


The problem is a little more complicated because the real code is a 3D 
application, but I don't think it will be hard to implement this strategy.


Now I'm convinced that user-defined data types are an important MPI feature 
in Fortran as well.
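
For reference, a minimal sketch of this strategy (not the actual
application code; the array shape and interior-block extents below are
invented for illustration): a user-defined datatype built with
MPI_TYPE_CREATE_SUBARRAY describes the interior block of a 3D array, and a
single element of that type is then broadcast.

    program bcast_subarray
      use mpi
      implicit none
      integer, parameter :: nx = 8, ny = 8, nz = 8
      double precision   :: a(nx, ny, nz)
      integer :: sizes(3), subsizes(3), starts(3)
      integer :: blocktype, myrank, ierr

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)

      sizes    = (/ nx, ny, nz /)         ! shape of the full array
      subsizes = (/ nx-2, ny-2, nz-2 /)   ! interior block, no ghost layer
      starts   = (/ 1, 1, 1 /)            ! 0-based start indices (MPI convention)

      call MPI_TYPE_CREATE_SUBARRAY(3, sizes, subsizes, starts,              &
                                    MPI_ORDER_FORTRAN, MPI_DOUBLE_PRECISION, &
                                    blocktype, ierr)
      call MPI_TYPE_COMMIT(blocktype, ierr)

      if (myrank == 0) a = 1.0d0
      ! one element of "blocktype" covers the whole strided interior block
      call MPI_BCAST(a, 1, blocktype, 0, MPI_COMM_WORLD, ierr)

      call MPI_TYPE_FREE(blocktype, ierr)
      call MPI_FINALIZE(ierr)
    end program bcast_subarray

The same committed datatype can be reused in point-to-point calls, so only
the type construction depends on the array layout.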


Patrick

--
===
|  Equipe M.O.S.T. | http://most.hmg.inpg.fr  |
|  Patrick BEGOU   |      |
|  LEGI| mailto:patrick.be...@hmg.inpg.fr |
|  BP 53 X | Tel 04 76 82 51 35   |
|  38041 GRENOBLE CEDEX| Fax 04 76 82 52 71   |
===



Re: [OMPI users] Open MPI 1.5.4 on windows g95 / mpif90 support

2011-12-15 Thread Joao Amaral

OK, thanks for the reply!

Joao

On 15-12-2011 07:40, Shiqing Fan wrote:

Hi,

The Fortran 90 bindings are not yet available on Windows.

Regards,
Shiqing








Re: [OMPI users] Error launching w/ 1.5.3 on IB mthca nodes

2011-12-15 Thread TERRY DONTJE
IIRC, RNRs are usually due to the receiving side not having a segment 
registered and ready to receive data on a QP.  The btl does go through a 
big dance and does its own flow control to make sure this doesn't happen.


So when this happens, are both the sending and receiving nodes using 
mthcas to communicate?


By any chance, is it a particular node (or pair of nodes) that this seems 
to happen with?


--td


> Open MPI InfiniBand gurus and/or Mellanox: could I please get some
> assistance with this? Any suggestions on tunables or debugging
> parameters to try?
>
> Thank you very much.
>
> On Mon, Dec 12, 2011, at 10:42 AM, V. Ram wrote:
> > Hello,
> >
> > We are running a cluster that has a good number of older nodes with
> > Mellanox IB HCAs that have the "mthca" device name ("ib_mthca" kernel
> > module).
> >
> > These adapters are all at firmware level 4.8.917 .
> >
> > The Open MPI in use is 1.5.3 , kernel 2.6.39 , x86-64. Jobs are
> > launched/managed using Slurm 2.2.7. The IB software and drivers
> > correspond to OFED 1.5.3.2 , and I've verified that the kernel modules
> > in use are all from this OFED version.
> >
> > On nodes with the mthca hardware *only*, we get frequent, but
> > intermittent job startup failures, with messages like:
> >
> > /
> >
> > [[19373,1],54][btl_openib_component.c:3320:handle_wc] from compute-c3-07
> > to: compute-c3-01 error polling LP CQ with status RECEIVER NOT READY
> > RETRY EXCEEDED ERROR status
> > number 13 for wr_id 2a25c200 opcode 128 vendor error 135 qp_idx 0
> >
> > --
> > The OpenFabrics "receiver not ready" retry count on a per-peer
> > connection between two MPI processes has been exceeded. In general,
> > this should not happen because Open MPI uses flow control on per-peer
> > connections to ensure that receivers are always ready when data is
> > sent.
> >
> > [further standard error text snipped...]
> >
> > Below is some information about the host that raised the error and the
> > peer to which it was connected:
> >
> > Local host: compute-c3-07
> > Local device: mthca0
> > Peer host: compute-c3-01
> >
> > You may need to consult with your system administrator to get this
> > problem fixed.
> > --
> >
> > /
> >
> > During these job runs, I have monitored the InfiniBand performance
> > counters on the endpoints and switch. No telltale counters for any of
> > these ports change during these failed job initiations.
> >
> > ibdiagnet works fine and properly enumerates the fabric and related
> > performance counters, both from the affected nodes, as well as other
> > nodes attached to the IB switch. The IB connectivity itself seems fine
> > from these nodes.
> >
> > Other nodes with different HCAs use the same InfiniBand fabric
> > continuously without any issue, so I don't think it's the fabric/switch.
> >
> > I'm at a loss for what to do next to try and find the root cause of the
> > issue. I suspect something perhaps having to do with the mthca
> > support/drivers, but how can I track this down further?
> >
> > Thank you,
> >
> > V. Ram.




[OMPI users] "almost there" in getting MPI to run

2011-12-15 Thread Joao Amaral

Hi all,

After trying Cygwin and the Windows build of Open MPI, I've now focused 
on using Linux for my mpif90 code testing/development on my laptop.


I've managed to install Open MPI, and it works, sort of.

Strangely(?), on both my laptop and the cluster, the number of threads 
reported by the call


call MPI_Comm_size ( MPI_COMM_WORLD, p, error )

comes out as only one active thread, even though my laptop is a quad-core 
(it should be 8 threads). The same happens running on the cluster, where 
each "blade" has 8 cores.


What am I missing here? Is there more configuration to be done? 
Actually, can I manually set the number of working threads?
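
For reference, MPI_Comm_size returns the number of MPI processes in the
communicator, which is determined by how the job is launched rather than
by the machine's core count. A minimal sketch (hypothetical file name,
illustrative only, not the real code):

    ! size_check.f90 -- minimal illustrative sketch.
    ! The value returned by MPI_COMM_SIZE is the number of processes the
    ! launcher started; running the binary directly without mpirun starts
    ! a single process.
    program size_check
      use mpi
      implicit none
      integer :: p, my_rank, error

      call MPI_INIT(error)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, p, error)
      call MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, error)
      if (my_rank == 0) print *, 'communicator size =', p
      call MPI_FINALIZE(error)
    end program size_check

Compiled with mpif90 and started as "mpirun -np 8 ./size_check" this
prints 8; started as plain "./size_check" it prints 1.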


Thanks for any help. I hope I'm "almost there".

Joao


Re: [OMPI users] "almost there" in getting MPI to run

2011-12-15 Thread Ralph Castain
What was your cmd line when you ran the job?

On Dec 15, 2011, at 7:09 AM, Joao Amaral wrote:

> Hi all,
> 
> After trying Cygwin and the Windows build of Open MPI, I've now focused on 
> using Linux for my mpif90 code testing/development on my laptop.
> 
> I've managed to install Open MPI, and it works, sort of.
> 
> Strangely(?), on both my laptop and the cluster, the number of threads 
> reported by the call
> 
> call MPI_Comm_size ( MPI_COMM_WORLD, p, error )
> 
> comes out as only one active thread, even though my laptop is a quad-core 
> (it should be 8 threads). The same happens running on the cluster, where 
> each "blade" has 8 cores.
> 
> What am I missing here? Is there more configuration to be done? Actually, can 
> I manually set the number of working threads?
> 
> Thanks for any help. I hope I'm "almost there".
> 
> Joao




Re: [OMPI users] Error launching w/ 1.5.3 on IB mthca nodes

2011-12-15 Thread V. Ram
Hi Terry,

Thanks so much for the response.  My replies are in-line below.

On Thu, Dec 15, 2011, at 07:00 AM, TERRY DONTJE wrote:
> IIRC, RNRs are usually due to the receiving side not having a segment 
> registered and ready to receive data on a QP.  The btl does go through a 
> big dance and does its own flow control to make sure this doesn't happen.
> 
> So when this happens, are both the sending and receiving nodes using 
> mthcas to communicate?

Yes.  For the newer nodes using onboard mlx4, this issue doesn't arise. 
The mlx4-based nodes are using the same core switch as the mthca nodes.

> By any chance, is it a particular node (or pair of nodes) that this seems 
> to happen with?

No.  I've got 40 nodes total with this hardware configuration, and the
problem has been seen on most/all nodes at one time or another.  It
doesn't seem, based on the limited number of observable parameters I'm
aware of, to be dependent on the number of nodes involved.

It is an intermittent problem, but when it happens, it happens at job
launch, and it does occur most of the time.

Thanks,

V. Ram


-- 
http://www.fastmail.fm - One of many happy users:
  http://www.fastmail.fm/docs/quotes.html



Re: [OMPI users] Error launching w/ 1.5.3 on IB mthca nodes

2011-12-15 Thread Jeff Squyres
Very strange.  I have a lot of older mthca-based HCAs in my Cisco MPI test 
cluster, and I don't see these kinds of problems.

Mellanox -- any ideas?


On Dec 15, 2011, at 7:24 PM, V. Ram wrote:

> Hi Terry,
> 
> Thanks so much for the response.  My replies are in-line below.
> 
> On Thu, Dec 15, 2011, at 07:00 AM, TERRY DONTJE wrote:
>> IIRC, RNRs are usually due to the receiving side not having a segment 
>> registered and ready to receive data on a QP.  The btl does go through a 
>> big dance and does its own flow control to make sure this doesn't happen.
>> 
>> So when this happens, are both the sending and receiving nodes using 
>> mthcas to communicate?
> 
> Yes.  For the newer nodes using onboard mlx4, this issue doesn't arise. 
> The mlx4-based nodes are using the same core switch as the mthca nodes.
> 
>> By any chance, is it a particular node (or pair of nodes) that this seems 
>> to happen with?
> 
> No.  I've got 40 nodes total with this hardware configuration, and the
> problem has been seen on most/all nodes at one time or another.  It
> doesn't seem, based on the limited number of observable parameters I'm
> aware of, to be dependent on the number of nodes involved.
> 
> It is an intermittent problem, but when it happens, it happens at job
> launch, and it does occur most of the time.
> 
> Thanks,
> 
> V. Ram
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/