Re: [OMPI users] unresolvable R_X86_64_64 relocation against symbol `mpi_fortran_*

2011-10-03 Thread Dmitry N. Mikushin
Hi,

Here's a repro case, the same one as mentioned here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=608901

marcusmae@loveland:~/Programming/mpitest$ cat mpitest.f90
program main
include 'mpif.h'
integer ierr
call mpi_init(ierr)
end

marcusmae@loveland:~/Programming/mpitest$ mpif90 -g mpitest.f90
/usr/bin/ld: /tmp/cc3NLduM.o(.debug_info+0x542): unresolvable
R_X86_64_64 relocation against symbol `mpi_fortran_argv_null_'
/usr/bin/ld: /tmp/cc3NLduM.o(.debug_info+0x55c): unresolvable
R_X86_64_64 relocation against symbol `mpi_fortran_argv_null_'
/usr/bin/ld: /tmp/cc3NLduM.o(.debug_info+0x5d2): unresolvable
R_X86_64_64 relocation against symbol `mpi_fortran_errcodes_ignore_'
/usr/bin/ld: /tmp/cc3NLduM.o(.debug_info+0x5ec): unresolvable
R_X86_64_64 relocation against symbol `mpi_fortran_errcodes_ignore_'

Remove "-g", and the error will be gone.

marcusmae@loveland:~/Programming/mpitest$ mpif90 --showme -g mpitest.f90
gfortran -g mpitest.f90 -I/opt/openmpi_gcc-1.5.4/include -pthread
-I/opt/openmpi_gcc-1.5.4/lib -L/opt/openmpi_gcc-1.5.4/lib -lmpi_f90
-lmpi_f77 -lmpi -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl

marcusmae@loveland:~/Programming/mpitest$ mpif90 -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gfortran
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.6.1/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro
4.6.1-9ubuntu3'
--with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr
--program-suffix=-4.6 --enable-shared --enable-linker-build-id
--with-system-zlib --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin
--enable-objc-gc --disable-werror --with-arch-32=i686
--with-tune=generic --enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3)

2011/9/28 Dmitry N. Mikushin :
> Hi,
>
> Interestingly, the errors are gone after I removed "-g" from the app
> compile options.
>
> I tested again on a fresh Ubuntu 11.10 install: both 1.4.3 and 1.5.4
> build fine, but linking the application fails with the same error.
> I also tried hard to find any 32-bit object or library and failed.
> They are all 64-bit.
>
> - D.
>
> 2011/9/24 Jeff Squyres :
>> Check the output from when you ran Open MPI's configure and "make all" -- 
>> did it decide to build the F77 interface?
>>
>> Also check that gcc and gfortran output .o files of the same bitness / type.
>>
>>
>> On Sep 24, 2011, at 8:07 AM, Dmitry N. Mikushin wrote:
>>
>>> Compile and link - yes, but it turns out there was some unnoticed
>>> compilation error because
>>>
>>> ./hellompi: error while loading shared libraries: libmpi_f77.so.1:
>>> cannot open shared object file: No such file or directory
>>>
>>> and this library does not exist.
>>>
>>> Hm.
>>>
>>> 2011/9/24 Jeff Squyres :
>>>> Can you compile / link simple OMPI applications without this problem?
>>>>
>>>> On Sep 24, 2011, at 7:54 AM, Dmitry N. Mikushin wrote:
>>>>
>>>>> Hi Jeff,
>>>>>
>>>>> Today I've verified this application on Fedora 15 x86_64, where
>>>>> I usually build OpenMPI from source using the same method.
>>>>> Result: no link errors there! So, the issue is likely Ubuntu-specific.
>>>>>
>>>>> The target application is compiled and linked with mpif90 pointing to
>>>>> the /opt/openmpi_gcc-1.5.4/bin/mpif90 I built.
>>>>>
>>>>> Regarding architectures, everything in the target folders and the
>>>>> OpenMPI installation is
>>>>> ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically
>>>>> linked, not stripped
>>>>>
>>>>> - D.
>>>>>
>>>>> 2011/9/24 Jeff Squyres :
>>>>>> How does the target application compile / link itself?
>>>>>>
>>>>>> Try running "file" on the Open MPI libraries and/or your target
>>>>>> application .o files to see what their bitness is, etc.
>>>>>>
>>>>>>
>>>>>> On Sep 22, 2011, at 3:15 PM, Dmitry N. Mikushin wrote:
>>>>>>
>>>>>>> Hi Jeff,
>>>>>>>
>>>>>>> You're right, because I also tried 1.4.3, and it's the same issue
>>>>>>> there. But what could be wrong? I'm using the simplest form -
>>>>>>> ../configure --prefix=/opt/openmpi_gcc-1.4.3/ and the only installed
>>>>>>> compilers are the system-default gcc and gfortran 4.6.1. The distro is
>>>>>>> Ubuntu 11.10. No MPI is installed from packages, and there are no -m32
>>>>>>> options around. What else could be the source?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> - D.
>>>>>>>
>>>>>>> 2011/9/22 Jeff Squyres :
>>>>>>>> This usually means that you're mixing compiler/linker flags somehow
>>>>>>>> (e.g., built something with 32 bit, built something else with 64 bit,
>>>>>>>> try to link them together).
>>>>>>>>
>>>>>>>> Can you verify that everything was built with all the same 32/64?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sep 22, 2011, at 1:21 PM, Dmi
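
The bitness check suggested in this thread (running "file" on the Open MPI libraries and the application's .o files) can also be scripted. The following is only a sketch: the helper name elf_class is made up for illustration, and it reads nothing but the ELF identification bytes (byte 4, EI_CLASS, is 1 for 32-bit and 2 for 64-bit). Static .a archives are not ELF files themselves, so it reports None for them.

```python
import sys

ELF_MAGIC = b"\x7fELF"
# EI_CLASS values from the ELF identification bytes.
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class(path):
    """Return "32-bit", "64-bit", "unknown", or None if path is not an ELF file.

    Hypothetical helper for illustration; it only inspects the first 5 bytes.
    """
    with open(path, "rb") as f:
        header = f.read(5)
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(header[4], "unknown")


if __name__ == "__main__":
    # Example: python elfclass.py /opt/openmpi_gcc-1.5.4/lib/*.so mpitest.o
    for path in sys.argv[1:]:
        print(path, elf_class(path))
```

If every library and object file reports the same class, as was the case here, a 32/64-bit mix can be ruled out and the cause lies elsewhere.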

Re: [OMPI users] [SOLVED] unresolvable R_X86_64_64 relocation against symbol `mpi_fortran_*

2011-10-03 Thread Dmitry N. Mikushin
Ok, here's the solution: remove the --as-needed option from the compiler's
internal linker invocation command line. Steps to do this:

1) Dump the compiler specs: $ gcc -dumpspecs > specs
2) Open the specs file for editing and remove --as-needed from the line

*link:
%{!r:--build-id} --no-add-needed --as-needed %{!static:--eh-frame-hdr}
%{!m32:-m elf_x86_64} %{m32:-m elf_i386} --hash-style=gnu
%{shared:-shared}   %{!shared: %{!static:
%{rdynamic:-export-dynamic}   %{m32:-dynamic-linker
%{muclibc:/lib/ld-uClibc.so.0;:%{mbionic:/system/bin/linker;:/lib/ld-linux.so.2}}}
  %{!m32:-dynamic-linker
%{muclibc:/lib/ld64-uClibc.so.0;:%{mbionic:/system/bin/linker64;:/lib64/ld-linux-x86-64.so.2
%{static:-static}}

resulting in

*link:
%{!r:--build-id} --no-add-needed %{!static:--eh-frame-hdr} %{!m32:-m
elf_x86_64} %{m32:-m elf_i386} --hash-style=gnu   %{shared:-shared}
%{!shared: %{!static:   %{rdynamic:-export-dynamic}
%{m32:-dynamic-linker
%{muclibc:/lib/ld-uClibc.so.0;:%{mbionic:/system/bin/linker;:/lib/ld-linux.so.2}}}
  %{!m32:-dynamic-linker
%{muclibc:/lib/ld64-uClibc.so.0;:%{mbionic:/system/bin/linker64;:/lib64/ld-linux-x86-64.so.2
%{static:-static}}

3) Save the specs file into the compiler's folder,
/usr/lib/gcc/<target>/<version>/. For example, in the case of Ubuntu 11.10
with gcc 4.6.1 it's /usr/lib/gcc/x86_64-linux-gnu/4.6.1/

With this change there are no more unresolvable relocations!
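
The edit in step 2 can also be done programmatically. This is a rough sketch, not a tested recipe for any particular distro: the function name strip_as_needed is made up, and it assumes the specs text has the layout shown above (a "*link:" header line, the spec on the following line or lines, then a blank line). It touches only the *link: spec and returns the new text; writing it into the compiler's folder is still a manual step.

```python
import re


def strip_as_needed(specs_text):
    """Return specs_text with --as-needed removed from the *link: spec only.

    Hypothetical helper; assumes a blank line terminates each spec.
    """
    out = []
    in_link = False
    for line in specs_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped == "*link:":
            in_link = True
        elif in_link and stripped == "":
            in_link = False  # a blank line ends the *link: spec
        elif in_link:
            # Remove the standalone token plus the spaces that follow it.
            line = re.sub(r"(?<!\S)--as-needed(?!\S)[ \t]*", "", line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Shortened sample; real input would be the output of `gcc -dumpspecs`.
    sample = (
        "*link:\n"
        "%{!r:--build-id} --no-add-needed --as-needed %{!static:--eh-frame-hdr}\n"
        "\n"
    )
    print(strip_as_needed(sample), end="")
```

Other specs that happen to mention --as-needed are left alone, so the change stays as narrow as the manual edit.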

- D.

Re: [OMPI users] Proper way to stop MPI process

2011-10-03 Thread Jeff Squyres
You might want to double check this -- mpirun shouldn't be waiting on you 
hitting return.  Check to make sure you don't just have line-buffered output in 
python, or somesuch.  Or better yet, check from python that the PID has 
actually disappeared and don't rely on stdout, or something like that.
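
A minimal sketch of that check, assuming the Python script itself launched mpirun with subprocess.Popen. The helper names pid_exists and stop_and_confirm are made up for illustration, and the sleeping child in the demo is only a stand-in for mpirun. It sends SIGTERM, waits for the child to be reaped, and then probes the PID with signal 0 instead of trusting stdout.

```python
import os
import signal
import subprocess
import sys


def pid_exists(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_and_confirm(proc, timeout=10.0):
    """Send SIGTERM to a Popen child and confirm that it is really gone."""
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)  # also reaps the child
    except subprocess.TimeoutExpired:
        return False
    return not pid_exists(proc.pid)


if __name__ == "__main__":
    # Stand-in for something like ["mpirun", "-np", "2", "./a.out"].
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("terminated:", stop_and_confirm(child))
```

The wait() call matters when the script is the parent: a child that has exited but has not been reaped still answers to signal 0, so probing the PID alone could wrongly suggest that it is still running.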


On Oct 2, 2011, at 8:35 AM, Xin Tong wrote:

> I am using 1.4.3. I send the SIGTERM from a Python script. Then I wait, but the
> processes do not terminate until I keep pressing Enter on the keyboard.
> 
> Thanks 
> 
> 
> Xin 
> 
> 
> On Fri, Sep 30, 2011 at 10:10 PM, Ralph Castain wrote:
> Sigterm should work - what version are you using?
> Ralph
>
> Sent from my iPad
>
> On Sep 28, 2011, at 1:40 PM, Xin Tong wrote:
>
> > I am wondering what the proper way is to stop an mpirun process and the child
> > processes it created. I tried sending SIGTERM, but it does not respond to it.
> > What kind of signal should I be sending to it?
> >
> >
> > Thanks
> >
> >
> > Xin
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Segfault on any MPI communication on head node

2011-10-03 Thread Phillip Vassenkov

I went into the directory that I used to install 1.4.3, did the following:
make clean
./configure --enable-debug
make -j8 all install

and it hangs with this message when I try to run my code (I commented out all the
host name stuff, so it's just MPI code now)


[hostname:16574] [[17705,0],0] ORTE_ERROR_LOG: Buffer type (described vs 
non-described) mismatch - operation not allowed in file 
base/odls_base_default_fns.c at line 2600


I'm googling for more info but does anyone have any ideas?

On 9/28/11 8:30 PM, Jeff Squyres wrote:

Use --enable-debug on your configure line.  This will add in some debugging 
code to OMPI, and it'll compile everything with -g so that you can get stack 
traces.

Beware that the extra debugging junk makes OMPI slightly slower; don't do any 
benchmarking with this install, etc.


On Sep 28, 2011, at 6:27 PM, Phillip Vassenkov wrote:


I tried 1.4.4rc4, same problem. Where do I get a debugging version?

On 9/28/11 8:32 AM, Jeff Squyres wrote:

Agreed that the original program had the char*[20]/char[20] bug, but his segv 
is occurring before trying to use that array.  So it's a bug - but he just 
hadn't hit it yet.  :-)

I'd still like to see a debugging version so that we can get a real stack 
trace, and/or try the latest 1.4.4 RC (posted yesterday).


On Sep 27, 2011, at 3:08 PM, German Hoecht wrote:


char* name[20]; yields 20 (undefined) pointers to char, guess you mean
char name[20];

So Brent's suggestion should work as well(?)

To be safe I would also add:
gethostname(name,maxlen);
name[19] = '\0';
printf("Hello, world.  I am %d of %d and host %s \n", rank, ...

Cheers

On 09/27/2011 07:40 PM, Phillip Vassenkov wrote:

Thanks, but my main concern is the segfault :P I changed it and, as I
expected, it still segfaults.

On 9/27/11 9:48 AM, Henderson, Brent wrote:

Here is another possibly non-helpful suggestion.  :)  Change:

  char* name[20];
  int maxlen = 20;

To:

  char name[256];
  int maxlen = 256;

gethostname() is supposed to properly truncate the hostname it returns
if the actual name is longer than the length provided, but since you
have at least one that is longer than 20 characters, I'm curious.

Brent


-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
On Behalf Of Jeff Squyres
Sent: Tuesday, September 27, 2011 6:29 AM
To: Open MPI Users
Subject: Re: [OMPI users] Segfault on any MPI communication on head node

Hmm.  It's not immediately clear to me what's going wrong here.

I hate to ask, but could you install a debugging version of Open MPI
and capture a proper stack trace of the segv?

Also, could you try the 1.4.4 rc and see if that magically fixes the
problem? (I'm about to post a new 1.4.4 rc later this morning, but
either the current one or the one from later today would be a good
datapoint)


On Sep 26, 2011, at 5:09 PM, Phillip Vassenkov wrote:


Yep, Fedora Core 14 and OpenMPI 1.4.3

On 9/24/11 7:02 AM, Jeff Squyres wrote:

Are you running the same OS version and Open MPI version between the
head node and regular nodes?

On Sep 23, 2011, at 5:27 PM, Vassenkov, Phillip wrote:


Hey all,
I've been racking my brains over this for several days and was
hoping someone could enlighten me. I'll describe only the relevant
parts of the network/computer systems. There is one head node and a
multitude of regular nodes. The regular nodes are all identical to
each other. If I run an MPI program from one of the regular nodes
to any other regular nodes, everything works. If I include the head
node in the hosts file, I get segfaults, which I'll paste below
along with sample code. The machines are all networked via
InfiniBand and Ethernet. The issue only arises when MPI
communication occurs. By this I mean that MPI_Init might succeed, but
the segfault always occurs on MPI_Barrier or MPI_Send/MPI_Recv. I found
a workaround: disabling the openib BTL and forcing TCP
communication over the InfiniBand interface, ib0 (if I don't force it,
it'll go over Ethernet). This command works when the head node is
included in the hosts file:
mpirun --hostfile hostfile --mca btl ^openib --mca
btl_tcp_if_include ib0  -np 2 ./b.out

Sample Code:
#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[])
{
    int rank, nprocs;
    char* name[20];
    int maxlen = 20;
    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&rank);
    MPI_Barrier(MPI_COMM_WORLD);
    gethostname(name,maxlen);
    printf("Hello, world.  I am %d of %d and host %s \n", rank, nprocs, name);
    fflush(stdout);
    MPI_Finalize();
    return 0;
}

Segfault:
[pastec:19917] *** Process received signal ***
[pastec:19917] Signal: Segmentation fault (11)
[pastec:19917] Signal code: Address not mapped (1)
[pastec:19917] Failing at address: 0x8
[pastec:19917] [ 0] /lib64/libpthread.so.0() [0x34a880eeb0]
[pastec:19917] [ 1] /usr/lib64/libmthca-rdmav2.so(+0x36aa)
[0x7eff6430b6aa]
[pastec:19917] [ 2]
/usr/lib64/openmpi/lib/openm

Re: [OMPI users] [SOLVED] unresolvable R_X86_64_64 relocation against symbol `mpi_fortran_*

2011-10-03 Thread Jeff Squyres
Wow -- painful!  Glad you figured it out; thanks for posting it back here to 
make it google-able.


Re: [OMPI users] Segfault on any MPI communication on head node

2011-10-03 Thread Ralph Castain
That means you have mismatched installations around - one configured as debug, 
and one not. They have to match.
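
One way to start looking for a second installation is to list every mpirun that the shell could pick up. This is only a sketch under assumptions: the helper name find_all_on_path is made up, it checks PATH on the local node only, and it says nothing about which libraries get loaded at run time or what is installed on the other nodes.

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable called `name` found on PATH, in search order.

    Hypothetical helper; entries that resolve to the same real file are
    reported once.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    seen = set()
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        real = os.path.realpath(candidate)
        if real in seen:
            continue
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            seen.add(real)
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for hit in find_all_on_path("mpirun"):
        print(hit)
```

More than one hit suggests that an older build is still installed next to the freshly configured debug build.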

Sent from my iPad

On Oct 3, 2011, at 2:44 PM, Phillip Vassenkov wrote:

> I went into the directory that I used to install 1.4.3, did the following:
> make clean
> ./configure --enable-debug
> make -j8 all install
> 
> and it hangs with this message when I try to run my code (I commented out all the host
> name stuff, so it's just MPI code now)
> 
> [hostname:16574] [[17705,0],0] ORTE_ERROR_LOG: Buffer type (described vs 
> non-described) mismatch - operation not allowed in file 
> base/odls_base_default_fns.c at line 2600
> 
> I'm googling for more info but does anyone have any ideas?