Dear All
I need your help to solve a cluster-related issue that causes mpirun to
malfunction. I get the following warning for some of the nodes, and then a
route failure message appears, causing mpirun to fail.
*WARNING: There is at least one OpenFabrics device found but there are no
active ports dete
l (Pasha) Shamis
> ---
> Computer Science Research Group
> Computer Science and Math Division
> Oak Ridge National Laboratory
>
>
>
>
>
>
> On Jul 21, 2014, at 3:17 AM, Syed Ahsan Ali <ahsansha...@gmail.com> wrote:
>
> Dear All
>
> I need your help
are down on
> 01-01.
>
> You may disable support for infiniband by adding --mca btl ^openib.
>
> Best,
> Pavel (Pasha) Shamis
> ---
> Computer Science Research Group
> Computer Science and Math Division
> Oak Ridge National Laboratory
>
>
>
>
>
id/lid (sort of equivalent of mac address in ethernet world).
> As you can guess these two can not be identical for two different machines
> (unless you moved the card around).
>
> Best,
> Pasha
>
> On Jul 21, 2014, at 11:26 PM, Syed Ahsan Ali <mailto:ahsansha...@gmail.com
shooting-infiniband-connection-issues-using-ofed-tools
>
> You might also try running only over port 1 with the mca parameter:
>
> -mca btl_openib_if_include mlx4_0:1
>
> Hope this helps.
>
> Josh
>
>
> On Tue, Jul 22, 2014 at 12:10 AM, Syed Ahsan Ali
> wro
I want to compile openmpi with both the Intel and GNU compilers. How can
I install both at the same time and then specify which one to use during
job submission?
Regards
Ahsan
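
One common way to keep two builds side by side is to give each its own --prefix and to pick one at job time through PATH and LD_LIBRARY_PATH. The sketch below is a dry run that only prints the commands. The /opt/openmpi-* prefixes and the function names are hypothetical, and the compiler variables are assumptions that may need adjusting for a given Open MPI version.

```shell
#!/bin/sh
# Dry-run sketch: print the configure line for one compiler family.
# Install prefixes are hypothetical; adjust to the local layout.
configure_cmd() {
    case "$1" in
        intel) echo "./configure CC=icc CXX=icpc FC=ifort --prefix=/opt/openmpi-intel" ;;
        gnu)   echo "./configure CC=gcc CXX=g++ FC=gfortran --prefix=/opt/openmpi-gnu" ;;
        *)     echo "unknown compiler family: $1" >&2; return 1 ;;
    esac
}

# Print the environment lines a job script would use to select one build.
select_env() {
    prefix="/opt/openmpi-$1"
    echo "export PATH=$prefix/bin:\$PATH"
    echo "export LD_LIBRARY_PATH=$prefix/lib:\$LD_LIBRARY_PATH"
}

configure_cmd intel
configure_cmd gnu
select_env gnu
```

Placing the two export lines at the top of a job script selects that build's mpirun and libraries for the job only, so both installations can coexist.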
Issue resolved.
On Wed, Aug 6, 2014 at 2:48 PM, Syed Ahsan Ali
wrote:
> I get the following error while compiling
>
>
> *** Fortran compiler
> checking whether we are using the GNU Fortran compiler... yes
> checking whether /opt/gcc-4.9.1/bin/gfortran accepts -g... yes
> con
I have problems in compilation of openmpi-1.8.1 on Linux machine. Kindly
see the logs attached.
configure.bz2
Description: BZip2 compressed data
ubscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/08/25150.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/
Dear All
I need your advice. While trying to run an mpirun job across nodes I get the
following error. It seems that the two nodes, i.e. compute-01-01 and
compute-01-06, are not able to communicate with each other, although the
nodes can see each other with ping.
[pmdtest@pmd ERA_CLM45]$ mpirun -np 16 -hostfile hostli
see all help / error messages
On Wed, Nov 12, 2014 at 7:32 PM, Jeff Squyres (jsquyres)
wrote:
> Do you have firewalling enabled on either server?
>
> See this FAQ item:
>
> http://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems
>
>
>
> On
~]$
On Thu, Nov 13, 2014 at 12:03 PM, Syed Ahsan Ali wrote:
> Hi Jeff
>
> No firewall is enabled. Running the diagnostics, I found that the
> non-communicating MPI job runs, while ring_c remains stuck. There
> are of course warnings for OpenFabrics, but in my case I am running
>
> could you try
>
> $ mpirun --mca btl ^openib --host compute-01-01,compute-01-06 ring_c
>
>
> can you also try to run mpirun from a compute node instead of the head
> node ?
>
> Cheers,
>
> Gilles
>
> On 2014/11/13 16:07, Syed Ahsan Ali wrote:
>> Here is what I
r via this address ?
> /* e.g. from compute-01-01 can you ping the 192.168.108.* ip address of
> compute-01-06 ? */
>
> could you also run
>
> mpirun --mca btl ^openib --host compute-01-01,compute-01-06 --mca
> btl_tcp_if_include 10.0.0.0/8 ring_c
>
> and see whether it help
option is for you to use
> --mca btl_tcp_if_exclude ib0
>
> On 2014/11/13 16:43, Syed Ahsan Ali wrote:
>> You are right, it is running on the 10.0.0.0 interface
>> [pmdtest@pmd ~]$ mpirun --mca btl ^openib --host compute-01-01,compute-01-06 --mca
>> btl_tcp_if_include 10.0.0.0/8 ring_c
>
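
The transport options quoted in this thread can be collected in a small dry-run helper. This is only a sketch: the function names are hypothetical and nothing is executed, the command lines are just printed. The second helper adds "lo" to the exclude list on the assumption that setting btl_tcp_if_exclude replaces its default value, which normally excludes the loopback interface.

```shell
#!/bin/sh
# Dry-run sketch: build (but do not run) an mpirun command line that
# disables openib and restricts TCP to one subnet, as suggested in the thread.
tcp_only_cmd() {
    hosts="$1"    # comma-separated host list
    subnet="$2"   # CIDR subnet to use for TCP traffic
    prog="$3"     # program to launch
    echo "mpirun --mca btl ^openib --host $hosts --mca btl_tcp_if_include $subnet $prog"
}

# Variant that excludes interfaces instead. "lo" is listed explicitly
# because setting this parameter is assumed to replace its default value.
tcp_exclude_cmd() {
    hosts="$1"
    exclude="$2"  # comma-separated interfaces to skip
    prog="$3"
    echo "mpirun --mca btl ^openib --host $hosts --mca btl_tcp_if_exclude lo,$exclude $prog"
}

tcp_only_cmd compute-01-01,compute-01-06 10.0.0.0/8 ring_c
tcp_exclude_cmd compute-01-01,compute-01-06 ib0 ring_c
```

Only one of btl_tcp_if_include and btl_tcp_if_exclude should be given on a single command line.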
Ok ok I can disable that as well.
Thank you guys. :)
On Thu, Nov 13, 2014 at 12:50 PM, Syed Ahsan Ali wrote:
> Now it looks through the loopback address
>
> [pmdtest@pmd ~]$ mpirun --host compute-01-01,compute-01-06 --mca
> btl_tcp_if_exclude ib0 ring_c
> Process 0 sending 10 t
> can you run on both compute nodes ?
> netstat -nr
>
>
> On 2014/11/13 16:50, Syed Ahsan Ali wrote:
>> Now it looks through the loopback address
>>
>> [pmdtest@pmd ~]$ mpirun --host compute-01-01,compute-01-06 --mca
>> btl_tcp_if_exclude ib0 ring_c
>> Process 0
10.0.0.1 0.0.0.0 UG 0 0 0 eth0
>> [pmdtest@compute-01-06 ~]$
>>
>>
>> On Thu, Nov 13, 2014 at 12:56 PM, Gilles Gouaillardet
>> wrote:
>>> This is really weird ?
>>>
>>> is the loopback interface up and running on
I am trying to run an openmpi application on my cluster, but mpirun
fails. Even a simple hostname command gives this error:
[pmdtest@hpc bin]$ mpirun --host compute-0-0 hostname
--
Sorry! You were supposed to get help about:
op
tion is that the Rocks documentation was obscure
> about this, not making clear the difference between
> /export/apps and /share/apps.
>
> Issuing the Rocks commands:
> "tentakel 'ls -d /export/apps'"
> "tentakel 'ls -d /share/apps'"
> ma
[pmdtest@hpc bin]$
Ahsan
On Fri, Feb 27, 2015 at 10:17 PM, Gus Correa wrote:
> Hi Syed Ahsan Ali
>
> To avoid any leftovers and further confusion,
> I suggest that you delete completely the old installation directory.
> Then start fresh from the configure step with the pre
Oh sorry. That is related to the application. I need to recompile the
application too, I guess.
On Fri, Feb 27, 2015 at 10:44 PM, Syed Ahsan Ali wrote:
> Dear Gus
>
> Thanks once again for the suggestion. Yes, I did that before installing
> to the new path. I am now getting an error about some library
.
configure: error: in `/home/precis/opemmpi/openmpi-1.4.2':
configure: error: C compiler cannot create executables
See `config.log' for more details.
--
Kind Regards
Syed Ahsan Ali Bokhari
Dear All,
I am running an application with mpirun but it gives the following error; it is
not picking up the hostlist. Other applications run well with the
hostlist, but this one just gives the following error with
[pmdtest@pmd02 d00_dayfiles]$ tail -f *_hrm
mpirun -np /home/MET/hrm/bin/hrm
-
tem with same configuration.
On Tue, Feb 28, 2012 at 10:12 AM, PukkiMonkey wrote:
> No of processes missing after -np
> Should be something like:
> mpirun -np 256 ./exec
>
>
>
> Sent from my iPhone
>
> On Feb 27, 2012, at 8:47 PM, Syed Ahsan Ali wrote:
>
>
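
A launch script can guard against an empty or non-numeric process count before it ever calls mpirun. The sketch below is a dry run with hypothetical function names; it prints the command instead of executing it.

```shell
#!/bin/sh
# Return success only if the argument is a positive integer.
is_positive_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
    esac
    [ "$1" -gt 0 ]
}

# Print the mpirun command, or fail loudly if the count is unusable.
build_mpirun() {
    nproc="$1"
    prog="$2"
    if ! is_positive_int "$nproc"; then
        echo "error: -np needs a positive integer, got '$nproc'" >&2
        return 1
    fi
    echo "mpirun -np $nproc $prog"
}

build_mpirun 256 ./exec
build_mpirun "" ./exec || echo "refused to build command"
```

An unset variable such as an empty ${NPROC} then produces a clear message instead of a command line where the program path is taken as the argument of -np.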
> argument for -np missing or argument not being numeric?
> >
> > Probably - I'm sure that the atol is returning zero, which should cause
> an error output. I'll check.
> >
> >
> >>
> >> -- Reuti
> >>
> >>
> >>>
>
ooks fine, can you add --mca btl_openib_verbose 1 to the mpirun
> argument list, and see what it says?
>
>
>
> On Tue, Feb 28, 2012 at 10:15 PM, Syed Ahsan Ali wrote:
>
>> After creating new hostlist and making the scripts again it is working
>> now and picking up the ho
Sorry Jeff, I couldn't get your point.
On Wed, Feb 29, 2012 at 4:27 PM, Jeffrey Squyres wrote:
> On Feb 29, 2012, at 2:17 AM, Syed Ahsan Ali wrote:
>
> > [pmdtest@pmd02 d00_dayfiles]$ echo ${MPIRUN} -np ${NPROC} -hostfile
> $i{ABSDIR}/hostlist -mca btl sm,openib,self --mca
outfile ,
> you were piping the output to the outfile instead of stdout.
>
> Sent from my iPhone
>
> On Feb 29, 2012, at 8:44 PM, Syed Ahsan Ali wrote:
>
> Sorry Jeff, I couldn't get your point.
>
> On Wed, Feb 29, 2012 at 4:27 PM, Jeffrey Squyres wrote:
>
Dear All,
I am having a problem running an application on a Dell cluster. The model
starts well but shows no further progress; it just gets stuck. I have checked
the systems and there is no apparent hardware error. Other Open MPI
applications are running well on the same cluster. I have tried running
sourceforge.net/
>
> Scalable Grid Engine Support Program
> http://www.scalablelogic.com/
>
>
> On Tue, Apr 24, 2012 at 12:49 AM, Syed Ahsan Ali
> wrote:
> > Dear All,
> >
> > I am having a problem running an application on a Dell cluster. The
> mode
ocesses is the job using? Are you oversubscribing your
> processors?
> What version of Open MPI are you using?
> Have you tested all network connections?
> It might help us to know the size of cluster you are running and what
> type of network?
>
> --td
>
> On 4/24/2012
Dear All
I am getting the following error while compiling an application. It seems
to be related to netcdf and mpif90. Although I have compiled
netcdf with the mpif90 option, I don't know why this error is happening. Any hint would
be highly appreciated.
/home/pmdtest/cosmo/source/cosmo_110525_4.18/
cally, you can take the mpif90 command that is being used to
> generate these errors and add "--showme" to the end of it, and you'll see
> what underlying compiler command is being executed under the covers. That
> might help you understand exactly what is going on.
>
gs are compiled with a different naming notation (check names in the
> lib really contain the expected number of final underscores).
>
> I compiled cosmo 4.22 with openmpi and netcdf not long ago without any
> problems.
>
> Best,
> - Dima.
>
> 2012/6/26 Syed Ahsan Ali
>
>
h the
> underlying Fortran compiler.
>
>
> --
> Tim Prince
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
--
Syed Ahsan Ali Bokhari
Electronic Engineer
Dear All
I am having a problem running an application on the cluster. The
application was working fine, but now this error has arisen. We used to run
the application the same way with user pmdtest and there was no error. I
don't know which permission it is asking for. Please help!
[pmdtest@pmd02
.org [mailto:users-boun...@open-mpi.org] *On
> Behalf Of *Syed Ahsan Ali
> *Sent:* 01 August 2012 08:45
> *To:* Open MPI Users
> *Subject:* [OMPI users] Permission denied, please try again.
>
>
>
> Dear All
>
>
> I am having problem while
Yes, all the compute nodes are NFS mounted with the master node, so
everything is the same; all other nodes are accessible over ssh without a password.
On Thu, Aug 2, 2012 at 1:09 PM, John Hearns wrote:
> On 02/08/2012, Syed Ahsan Ali wrote:
> > Yes the issue has been diagnosed. I can ssh them
Am 02.08.2012 um 17:57 schrieb Syed Ahsan Ali:
>
> > Yes, all the compute nodes are NFS mounted with the master node, so
> everything is the same; all other nodes are accessible over ssh without a password.
>
> Are you using a queuing system?
>
> SSH could be setup to work from the mas
Dear All
I have a Dell cluster running Platform Cluster Manager (PCM); the compute
nodes are NFS mounted with the master node. Storage (SAN) is mounted on the
installer node only. The problem is that I am running a programme which
uses data that resides on the storage, so as far as running the prog
re the cluster head node to export the SAN volume by NFS
>
//www.open-mpi.org/mailman/listinfo.cgi/users
>
--
Syed Ahsan Ali Bokhari
Electronic Engineer (EE)
Research & Development Division
Pakistan Meteorological Department H-8/4, Islamabad.
Phone # off +92518358714
Cell # +923155145014
Dear All
I have an application which is run using openmpi and uses infiniband flags.
The application is a forecast model simulation. A frequent problem is
that the Infiniband mezzanine cards of the servers become faulty (I don't know
why it happens so frequently), the model simulation becom
> What type of infiniband card do you have?
> What drivers are you using?
>
I receive the following error while running an application.
Does this indicate a hardware issue?
[compute-01-01.private.dns.zone][[60090,1],10][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed: Connection timed out (110)
[compute-01-01.private.dns.zone][[60090,1],13][btl
Dear John
I found this output of ibstatus on some nodes (most probably the cause of
the problem)
[root@compute-01-08 ~]# ibstatus
Fatal error: device '*': sys files not found
(/sys/class/infiniband/*/ports)
Does this show any hardware or software issue?
Thanks
On Wed, Nov 28, 2012 at 3:17 PM, Jo
=12,pgrp=3876,timeout=300,minproto=5,maxproto=5,indirect 0 0
Thanks and Regards
On Wed, Dec 19, 2012 at 8:38 PM, Yann Droneaud wrote:
> Le mercredi 19 décembre 2012 à 12:12 +0500, Syed Ahsan Ali a écrit :
> > Dear John
> >
> > I found this output of ibstatus on some nod
I am getting the following error while building openmpi
*** Fortran 90/95 compiler
checking whether we are using the GNU Fortran compiler... yes
checking whether gfortran accepts -g... yes
checking if Fortran 77 compiler works... no
**
d here:
>
> http://www.open-mpi.org/community/help/
>
>
> On Feb 1, 2013, at 5:58 AM, Syed Ahsan Ali wrote:
>
> >
> > I am getting the following error while building openmpi
> >
> > *** Fortran 90/95 compiler
> > checking whether we are using the GNU
re:28765: error: Could not run a simple Fortran 77 program.
> Aborting.
> -
>
> Perhaps you need to set your LD_LIBRARY_PATH to point to where libgfortran
> is located?
>
> In short: when you can run gfortran manually to compile/run trivial
> fortran programs, then configure will suc
I have been running this program successfully before but some copy
operation from /usr/ directory has caused this error.
The program runs fine on the cores of the same machine.
libhdf5.so.7 is also present.
[pmdtest@pmd HadGEM]$ mpirun -np 32 -hostfile hostlist rca.x
rca.x: error while loading sh
Dear John
Thanks for the reply. I'll need your help to solve this problem. I
am not an expert in HPC, so this will be a learning experience for me as well. Let me add that
the cluster is based on Platform Cluster Manager (PCM) by IBM Computing.
The compute nodes are NFS mounted with the installer node. There
Dear John
Looking into output of ldd for master and compute nodes solved my problem.
Thanks for such a simple solution. :)
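
The ldd comparison mentioned above can be reduced to a filter that lists only the libraries the loader could not find. This is a minimal sketch: the function name is hypothetical and the input here is canned text rather than real ldd output from the cluster.

```shell
#!/bin/sh
# Read ldd-style output on stdin and print the names of libraries
# that the loader could not resolve ("not found").
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example with canned text; on a real node pipe in the output of ldd.
printf '%s\n' \
    'libhdf5.so.7 => not found' \
    'libc.so.6 => /lib64/libc.so.6 (0x00007f0000000000)' |
    missing_libs
```

Running the same filter on the master node and on each compute node (for example through ssh) shows which nodes are missing a shared library that the executable needs.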
On Thu, Feb 7, 2013 at 9:37 PM, Syed Ahsan Ali wrote:
> Dear John
> Thanks for the reply. I'll need help of you people to solve this problem.
> I am not exp
I have a very basic question. If we want to run an mpirun job on two systems
which are not part of a cluster, how can we make that possible? Can a
host be specified to mpirun which is not a compute node, but rather a
standalone system?
Thanks
Ahsan
? just as the compute nodes are nfs mounted
with the installer node.
Ahsan
On Fri, Mar 22, 2013 at 3:33 PM, Reuti wrote:
> Am 22.03.2013 um 10:14 schrieb Syed Ahsan Ali:
>
> > I have a very basic question. If we want to run mpirun job on two
> systems which are not part of clust
ode.
>
>
> Ahsan
>
>
> On Fri, Mar 22, 2013 at 3:33 PM, Reuti wrote:
>
>> Am 22.03.2013 um 10:14 schrieb Syed Ahsan Ali:
>>
>> > I have a very basic question. If we want to run mpirun job on two
>> systems which are not part of cluster, then h
It may be because the other system is running upgraded version of linux
which is not having infiniband drivers. Any solution?
On Tue, Mar 26, 2013 at 12:42 PM, Syed Ahsan Ali wrote:
> Tried this but mpirun exits with this error
>
> mpirun -np 40 /home/MET/hrm/bin/hrm
> librdmacm: c
Dear All
I am trying to compile openmpi-1.6.5 on fc16.x86_64 with icc and ifort
but getting the subject error. config.out and make.out are attached.
Following command was used for configure
./configure CC=icc CXX=icpc FC=ifort F77=ifort F90=ifort
--prefix=/home/openmpi_gfortran -enable-mpi-f90
Please find attached again.
On Tue, Sep 17, 2013 at 11:35 AM, Jeff Squyres (jsquyres)
wrote:
> On Sep 16, 2013, at 9:00 AM, Syed Ahsan Ali wrote:
>
>> I am trying to compile openmpi-1.6.5 on fc16.x86_64 with icc and ifort
>> but getting the subject error. config.out and ma
I am trying to compile openmpi-1.6.5 on fc16.x86_64 with icc and ifort
but getting the subject error. config.out and make.out are attached.
Following command was used for configure
./configure CC=icc CXX=icpc FC=ifort F77=ifort F90=ifort
--prefix=/home/openmpi_gfortran -enable-mpi-f90 --enable-mpi
Output of make V=1 is attached. Again the same error. If the Intel compiler is
using C++ headers from gfortran, then how can we avoid this?
On Fri, Sep 20, 2013 at 11:07 AM, Bert Wesarg
wrote:
> Hi,
>
> On Fri, Sep 20, 2013 at 4:49 AM, Syed Ahsan Ali wrote:
>> I am trying to compile o
essing* that this is a problem with your local icpc installation.
>
> Can you compile / run other C++ codes that use the STL with icpc?
>
>
> On Sep 20, 2013, at 6:59 AM, Syed Ahsan Ali wrote:
>
>> Output of make V=1 is attached. Again same error. If intel compiler is
>
>
>
> On Sep 22, 2013, at 10:40 AM, Syed Ahsan Ali wrote:
>
>> Its ok Jeff.
>> I am not sure about other C++ codes and STL with icpc because it never
>> happened and I don't know anything about STL.(pardon my less
>> knowledge). What do you suggest in this
Dear Jeff
Thank you for explaining. Please find attached test logs which explain
the error.
Regards
On Fri, Sep 27, 2013 at 6:12 PM, Jeff Squyres (jsquyres)
wrote:
> On Sep 27, 2013, at 6:53 AM, Syed Ahsan Ali wrote:
>
>> Thank you very much Jeff. It worked now.
>
> Good.
>
Dear All
I am getting infiniband errors while running mpirun applications on the
cluster. I get these errors even when I don't include infiniband usage
flags in the mpirun command. Please guide.
mpirun -np 72 -hostfile hostlist ../bin/regcmMPI regcm.in
case TCP.
> However, the TCP transport is unable to create a socket to the remote host.
> The most likely cause is a firewall, so you might want to check that and
> turn it off.
>
>
> On Jan 19, 2014, at 4:19 AM, Syed Ahsan Ali wrote:
>
> Dear All
>
> I am getting infi
>
> Also, if you use host names in your hostfile, I guess they need to be able
> to
> resolve the names into IP addresses.
> Check if your /etc/hosts file, DNS server, or whatever you
> use for name resolution, is correct and consistent across the cluster.
>
> On Jan 19, 2014, at 10:
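
The name-resolution check suggested above can be scripted per node. The following is a sketch under two assumptions: the hostfile uses the usual "hostname slots=N" layout, and getent is available (glibc-based Linux). Both function names are hypothetical.

```shell
#!/bin/sh
# Print the host names listed in an Open MPI style hostfile,
# skipping blank lines and "#" comments and dropping "slots=N" fields.
hostfile_names() {
    awk '{ sub(/#.*/, "") } NF { print $1 }' "$1"
}

# Report every host name that the local resolver cannot look up.
# Assumes getent is available (glibc-based Linux systems).
unresolved_hosts() {
    for h in $(hostfile_names "$1"); do
        getent hosts "$h" >/dev/null 2>&1 || echo "$h"
    done
}
```

Running "unresolved_hosts hostlist" on every node should print nothing when /etc/hosts or DNS is consistent across the cluster; any name it prints cannot be resolved from that node.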
I am getting this error during installation of an application.
Apparently the error seems to be complaining about openmpi being
compiled with different version of gnu fortran but I am sure that it
was compiled with gcc-4.9.2. The same is also being used for
application compilation.
I am using open
is likely mpifort simply run gfortran, and your PATH does not point to
> gfortran 4.9.2
>
> Cheers,
>
> Gilles
>
>
> On 7/28/2015 1:47 PM, Syed Ahsan Ali wrote:
>>
>> I am getting this error during installation of an application.
>> Apparently the error se