Re: [OMPI users] error: unknown type name 'ompi_jobid_t'

2013-06-26 Thread Ralph Castain
Sorry about that - it has been fixed for the upcoming 1.7.2 release, which should
be out in the immediate future. For now, you can grab a 1.7.2 pre-release tarball
from the web site.



On Tue, Jun 25, 2013 at 8:25 PM, Jeff Hammond wrote:

> I observe this error with the OpenMPI 1.7.1 "feature":
>
> Making all in mca/common/ofacm
> make[2]: Entering directory
>
> `/gpfs/mira-home/jhammond/MPI/openmpi-1.7.1/build-gcc/ompi/mca/common/ofacm'
>   CC   common_ofacm_xoob.lo
> ../../../../../ompi/mca/common/ofacm/common_ofacm_xoob.c:158:91:
> error: unknown type name 'ompi_jobid_t'
>  static int xoob_ib_address_init(ofacm_ib_address_t *ib_addr, uint16_t
> lid, uint64_t s_id, ompi_jobid_t ep_jobid)
>
> ^
> ../../../../../ompi/mca/common/ofacm/common_ofacm_xoob.c: In function
> 'xoob_ib_address_add_new':
> ../../../../../ompi/mca/common/ofacm/common_ofacm_xoob.c:189:5:
> warning: implicit declaration of function 'xoob_ib_address_init'
> [-Wimplicit-function-declaration]
>  ret = xoob_ib_address_init(ib_addr, lid, s_id, ep_jobid);
>  ^
> make[2]: *** [common_ofacm_xoob.lo] Error 1
> make[2]: Leaving directory
>
> `/gpfs/mira-home/jhammond/MPI/openmpi-1.7.1/build-gcc/ompi/mca/common/ofacm'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory
> `/gpfs/mira-home/jhammond/MPI/openmpi-1.7.1/build-gcc/ompi'
> make: *** [all-recursive] Error 1
>
> I invoked configure like this:
>
> ../configure CC=gcc CXX=g++ FC=gfortran F77=gfortran
> --prefix=/home/jhammond/MPI/openmpi-1.7.1/install-gcc --with-verbs
> --enable-mpi-thread-multiple --enable-static --enable-shared
>
> My config.log is attached with bzip2 compression; if you do not
> trust binary attachments, please go to Dropbox and blindly download
> the uncompressed text file.
>
> https://www.dropbox.com/l/ZxZoE6FNROZuBY7I7wdsgc
>
> Any suggestions?  I asked the Google and it had not heard of this
> particular error message before.
>
> Thanks,
>
> Jeff
>
> PS Please do not tell Pavan I was here :-)
> PPS I recognize the Streisand effect is now in play and that someone
> will deliberately disobey the previous request because I made it.
>
> --
> Jeff Hammond
> jeff.scie...@gmail.com
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] openmpi 1.6.3 fails to identify local host if its IP is 127.0.1.1

2013-06-26 Thread Riccardo Murri
Hello,

On 26 June 2013 03:11, Ralph Castain  wrote:
> I've been reviewing the code, and I think I'm getting a handle on
> the issue.
>
> Just to be clear - your hostname resolves to the 127 address? And you are on
> a Linux (not one of the BSD flavors out there)?

Yes (but resolves to 127.0.1.1 -- not the usual 127.0.0.1), and yes
(Rocks 5.3 ~= CentOS 5.3).


> If the answer to both is "yes", then the problem is that we ignore loopback
> devices if anything else is present. When we check to see if the hostname we
> were given is the local node, we resolve the name to the address and then
> check our list of interfaces. The loopback device is ignored and therefore
> not on the list. So if you resolve to the 127 address, we will decide this
> is a different node than the one we are on.
>
> I can modify that logic, but want to ensure this accurately captures the
> problem. I'll also have to discuss the change with the other developers to
> ensure we don't shoot ourselves in the foot if we make it.

Ok, thanks -- I'll keep an eye on your replies.

Thanks,
Riccardo


Re: [OMPI users] Application hangs on mpi_waitall

2013-06-26 Thread George Bosilca
Ed,

I'm not sure, but there might be a case where the BTL is getting overwhelmed by
the non-blocking operations while trying to set up the connections. There is a
simple test for this: add an MPI_Alltoall with a reasonable size (100k) before
you start posting the non-blocking receives, and let's see if this solves your
issue.
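
[Editor's note: a minimal sketch of that warm-up in C, for illustration only.
The communicator, the char payload, and the per-peer size are assumptions, not
taken from Ed's code.]

#include <mpi.h>
#include <stdlib.h>

/* Sketch: establish all point-to-point connections up front with a
 * sizeable MPI_Alltoall, before the real non-blocking receives are
 * posted.  The ~100k-per-peer payload follows the suggestion above;
 * everything else is hypothetical. */
static void warm_up_connections(MPI_Comm comm)
{
    int nprocs;
    MPI_Comm_size(comm, &nprocs);

    const int count = 100 * 1024;   /* ~100k chars exchanged with each peer */
    char *sendbuf = calloc((size_t)count * nprocs, 1);
    char *recvbuf = calloc((size_t)count * nprocs, 1);

    /* Every rank exchanges a block with every other rank, so the BTL
     * sets up its connections here rather than under the load of
     * hundreds of outstanding requests. */
    MPI_Alltoall(sendbuf, count, MPI_CHAR, recvbuf, count, MPI_CHAR, comm);

    free(sendbuf);
    free(recvbuf);
}

Calling warm_up_connections(MPI_COMM_WORLD) once, just before the loop that
posts the receives, should show whether connection setup under load is the
culprit.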

  George.


On Jun 26, 2013, at 04:02, eblo...@1scom.net wrote:

> An update: I recoded the mpi_waitall as a loop over the requests with
> mpi_test and a 30 second timeout.  The timeout happens unpredictably,
> sometimes after 10 minutes of run time, other times after 15 minutes, for
> the exact same case.
> 
> After 30 seconds, I print out the status of all outstanding receive
> requests.  The message tags that are outstanding have definitely been
> sent, so I am wondering why they are not being received.
> 
> As I said before, everybody posts non-blocking standard receives, then
> non-blocking standard sends, then calls mpi_waitall. Each process is
> typically waiting on 200 to 300 requests. Is deadlock possible with this
> implementation approach under some unusual set of conditions?
> 
> Thanks again,
> 
> Ed
> 
>> I'm running OpenMPI 1.6.4 and seeing a problem where mpi_waitall never
>> returns.  The case runs fine with MVAPICH.  The logic associated with the
>> communications has been extensively debugged in the past; we don't think
>> it has errors.   Each process posts non-blocking receives, non-blocking
>> sends, and then does waitall on all the outstanding requests.
>> 
>> The work is broken down into 960 chunks. If I run with 960 processes (60
>> nodes of 16 cores each), things seem to work.  If I use 160 processes
>> (each process handling 6 chunks of work), then each process is handling 6
>> times as much communication, and that is the case that hangs with OpenMPI
>> 1.6.4; again, seems to work with MVAPICH.  Is there an obvious place to
>> start, diagnostically?  We're using the openib btl.
>> 
>> Thanks,
>> 
>> Ed
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
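
[Editor's note: a minimal C sketch of the test-loop-with-timeout diagnostic Ed
describes above. The 30-second cutoff, the function name, and the reporting
format are assumptions.]

#include <mpi.h>
#include <stdio.h>

/* Sketch: wait for all requests with MPI_Testall, but give up after a
 * timeout and report which requests are still outstanding.  Returns 1 on
 * completion, 0 on timeout. */
static int waitall_with_timeout(int nreq, MPI_Request reqs[], double timeout_sec)
{
    double start = MPI_Wtime();
    int done = 0;

    while (!done) {
        MPI_Testall(nreq, reqs, &done, MPI_STATUSES_IGNORE);
        if (!done && MPI_Wtime() - start > timeout_sec) {
            /* Timed out: report every request that has not completed. */
            for (int i = 0; i < nreq; i++) {
                int flag = 0;
                MPI_Test(&reqs[i], &flag, MPI_STATUS_IGNORE);
                if (!flag)
                    fprintf(stderr, "request %d still outstanding after %.0fs\n",
                            i, timeout_sec);
            }
            return 0;
        }
    }
    return 1;
}

Printing the source rank and tag each stalled request was posted with (kept
from the original MPI_Irecv arguments) makes it easier to correlate the stalled
receives with the sender side.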




Re: [OMPI users] openmpi 1.6.3 fails to identify local host if its IP is 127.0.1.1

2013-06-26 Thread Ralph Castain
The root cause of the problem is that you are assigning your host name to
the loopback device. This is rather unusual, but not forbidden. Normally,
people would name that interface something like "localhost" since it cannot
be used to communicate off-node.

Doing it the way you have could cause problems for you, as programs that do
a name lookup in order to communicate will get the loopback address when they
might have expected something else. Still, we should handle this case.

I'll see what we can do.
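
[Editor's note: a rough C sketch, for illustration only and not Open MPI's
actual code, of an is-this-host-local check that skips loopback interfaces.
It shows why a hostname resolving to 127.0.1.1 gets classified as a remote
node.]

#include <ifaddrs.h>
#include <net/if.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Sketch: decide whether a hostname refers to the local node by resolving
 * it and comparing the result against the node's non-loopback IPv4
 * interfaces.  A host whose name resolves to 127.0.1.1 is never matched,
 * because the loopback interface is skipped. */
static int hostname_is_local(const char *hostname)
{
    struct addrinfo hints = { .ai_family = AF_INET }, *res = NULL;
    if (getaddrinfo(hostname, NULL, &hints, &res) != 0)
        return 0;
    struct in_addr target = ((struct sockaddr_in *)res->ai_addr)->sin_addr;
    freeaddrinfo(res);

    struct ifaddrs *ifap = NULL, *ifa;
    int local = 0;
    if (getifaddrs(&ifap) != 0)
        return 0;
    for (ifa = ifap; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        if (ifa->ifa_flags & IFF_LOOPBACK)   /* loopback devices are ignored */
            continue;
        struct in_addr addr = ((struct sockaddr_in *)ifa->ifa_addr)->sin_addr;
        if (addr.s_addr == target.s_addr) {
            local = 1;
            break;
        }
    }
    freeifaddrs(ifap);
    return local;   /* 0 for a host that resolves to 127.0.1.1 */
}

One possible accommodation for Riccardo's setup would be to treat any address
in 127.0.0.0/8 as referring to the local node, rather than rejecting it because
the loopback interface is not on the interface list.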



On Wed, Jun 26, 2013 at 2:26 AM, Riccardo Murri wrote:

> Hello,
>
> On 26 June 2013 03:11, Ralph Castain  wrote:
> > I've been reviewing the code, and I think I'm getting a handle on
> > the issue.
> >
> > Just to be clear - your hostname resolves to the 127 address? And you
> are on
> > a Linux (not one of the BSD flavors out there)?
>
> Yes (but resolves to 127.0.1.1 -- not the usual 127.0.0.1), and yes
> (Rocks 5.3 ~= CentOS 5.3).
>
>
> > If the answer to both is "yes", then the problem is that we ignore
> loopback
> > devices if anything else is present. When we check to see if the
> hostname we
> > were given is the local node, we resolve the name to the address and then
> > check our list of interfaces. The loopback device is ignored and
> therefore
> > not on the list. So if you resolve to the 127 address, we will decide
> this
> > is a different node than the one we are on.
> >
> > I can modify that logic, but want to ensure this accurately captures the
> > problem. I'll also have to discuss the change with the other developers
> to
> > ensure we don't shoot ourselves in the foot if we make it.
>
> Ok, thanks -- I'll keep an eye on your replies.
>
> Thanks,
> Riccardo
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] mpif90 error with different openmpi editions

2013-06-26 Thread xu

No, I didn't mix environment variables; I run the two editions separately. I
searched online, and one possibility is the use of different mpif90 and mpicc, but I
checked and in my case they all use gcc 4.3.4.





 From: Gus Correa 
To: Open MPI Users  
Sent: Tuesday, June 18, 2013 8:44 AM
Subject: Re: [OMPI users] mpif90 error with different openmpi editions


On 06/18/2013 12:28 AM, xu wrote:
> my code gets this error under openmpi 1.6.4
> mpif90 -O2 -m64 -fbounds-check -ffree-line-length-0 -c -o 2dem_mpi.o
> 2dem_mpi.f90 Fatal Error: Reading module mpi at line 110 column 30:
> Expected string
> If I use mpif90: Open MPI 1.3.3
> It compiles OK. What is the problem here?

Make sure you are not mixing environment variables (PATH and 
LD_LIBRARY_PATH) of the two OMPI versions you installed.


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

Re: [OMPI users] mpif90 error with different openmpi editions

2013-06-26 Thread Gus Correa

You say:

> one possiblity is use different mpif90 and mpicc, but I
> checked in my case they all use gcc 4.3.4

Do you really mean gcc for both,
or is it gfortran for mpif90 perhaps?

What is the output of:

mpif90 --showme
and
mpicc --showme

for each OMPI version?

Maybe other list subscribers can help, but I'd suggest that
besides the information above, you send in your
configure command line for each OMPI version.
It is hard to guess what the problem is from the tidbits
of information that you sent.

I hope this helps,
Gus Correa


On 06/26/2013 04:22 PM, xu wrote:


No, I didn't mix environment variables; I run the two editions separately. I
searched online, and one possibility is the use of different mpif90 and mpicc, but I
checked and in my case they all use gcc 4.3.4.



From: Gus Correa
To: Open MPI Users
Sent: Tuesday, June 18, 2013 8:44 AM
Subject: Re: [OMPI users] mpif90 error with different openmpi editions

On 06/18/2013 12:28 AM, xu wrote:
> my code gets this error under openmpi 1.6.4
> mpif90 -O2 -m64 -fbounds-check -ffree-line-length-0 -c -o 2dem_mpi.o
> 2dem_mpi.f90 Fatal Error: Reading module mpi at line 110 column 30:
> Expected string
> If I use mpif90: Open MPI 1.3.3
> It compiles OK. What is the problem here?

Make sure you are not mixing environment variables (PATH and
LD_LIBRARY_PATH) of the two OMPI versions you installed.


___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users