On Thu, 2010-04-22 at 08:09 +1000, Nev wrote:
> On Tue, 2010-04-20 at 20:22 -0400, Jeff Squyres wrote:
> > On Apr 20, 2010, at 6:16 PM, Nev wrote:
> >
> > > Hi Jeff,
> > > I did the install to the "same place". I always use /opt/openmpi, the
> > > procedure I use when building is
> > > configure --prefix=/opt/openmpi ...
> > > rm -r /opt/openmpi/*
> > > make clean
> > > make all
> > > make install
> > > is this sufficient to un-install previous version, or is more required.
> >
> > Yes, that should be sufficient. Is that what you did this time?
> >
> > If so, is there any way you can provide a small code example of the problem
> > you're seeing?
> >
> OK, I will attempt to reduce to minimal code set, but will not be able
> to do so until the week end.
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
Hi Jeff,

Hopefully I have included sufficient information for you to identify what I am doing incorrectly.

I have created a minimal set of code, which was built, linked and run against version 1.2.7 (shared), version 1.4.1 (shared) and version 1.4.1 (static). However, I have not been able to reproduce the same error message as reported earlier.

v1.4.1 static WORKS with no error or warning messages.
v1.4.1 shared FAILS with the message "mpirun noticed that process rank 0 with PID 31115 on node dingo3 exited on signal 11 (Segmentation fault)".
v1.2.7 shared WORKS, but with the message "[dingo3:31123] mca: base: component_find: unable to open osc pt2pt: file not found (ignored)".

I have also run the above 3 configurations with actual communication between the processes, and that works except for 1.4.1 shared. 1.4.1 shared always fails in the call to MPI_Init(...).

To run the test I used:

/opt/openmpi/bin/mpirun -np 2 -mca btl tcp,self \
    -x LD_LIBRARY_PATH=/opt/openmpi/lib:/work/lib \
    -x PATH=/opt/openmpi/bin:/work/bin:/usr/bin \
    -host dingo3 a3exec

Setting LD_LIBRARY_PATH and PATH explicitly is not my normal habit, but I did it here to minimise any external dependencies.

This test machine is a newly installed (i.e. very clean) Ubuntu 9.10 64-bit desktop with the server kernel. It is a dual-socket, 8-core, hyperthreaded Intel box. It has the following installed:
a. openssh + freenx
b. KVM
c. build-essential
d. 32-bit libraries
e. bridge-utils
f. uml-utilities

Open MPI was built with

./configure --prefix=/opt/openmpi CFLAGS=-m32 CXXFLAGS=-m32

plus --enable-static --disable-shared for the static builds.

I have also tested on 32-bit Ubuntu 9.10 and 8.04 (not clean) with the same results.
Minimal files:

init.c, built as "liba1lib.so" using mpicc

#include <mpi.h>
#include <stdio.h>

static int mpiRank = -1;
static int mpiSize = -1;

int connect(int * const pArgc, char * * pArgv[])
{
    printf("ENTER connect *pArgc=%d, *pArgv[0]=%s\n", *pArgc, (*pArgv)[0]);
    fflush(0);
    MPI_Init(pArgc, pArgv);
    // <<<<NEVER>>>> get to here for version 1.4.1 shared build
    printf("DONE MPI_init\n");
    fflush(0);
    MPI_Comm_rank(MPI_COMM_WORLD, &mpiRank);
    MPI_Comm_size(MPI_COMM_WORLD, &mpiSize);
    printf("MPI_rank = %d, MPI_size = %d\n", mpiRank, mpiSize);
    fflush(0);
    MPI_Finalize();
    printf("%d EXITING connect\n", mpiRank);
    fflush(0);
    return 0;
}

load.cpp, built as "liba2lib.so" using g++

#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

extern "C" {
    // Must match the signature of connect() in init.c, which returns int.
    typedef int (*tConnect)(int * pArgc, char * * pArgv[]);
    void load(int * pArgc, char * * pArgv[]);
}

void load(int * pArgc, char * * pArgv[])
{
    printf("ENTER load\n");
    fflush(0);
    dlerror();  // clear any stale error state
    char const * const libName = "liba1lib.so";
    void * const result = dlopen(libName, RTLD_LAZY | RTLD_LOCAL);
    if (result == 0) {
        fprintf(stderr, "Failed to load library %s error = %s\n", libName, dlerror());
        fflush(0);
        exit(1);
    }
    char const * const symbolName = "connect";
    void * const symbol = dlsym(result, symbolName);
    if (symbol == 0) {
        fprintf(stderr, "Failed to load symbol %s from %s error = %s\n",
                symbolName, libName, dlerror());
        fflush(0);
        exit(1);
    }
    ((tConnect)symbol)(pArgc, pArgv);
    printf("DONE load\n");
    fflush(0);
}

main.cpp, built as "a3exec" using g++

#include <stdio.h>

extern "C" void load(int * pArgc, char * * pArgv[]);

int main(int argc, char * argv[])
{
    printf("ENTER main\n");
    load(&argc, &argv);
    printf("EXIT main\n");
    return 0;
}

Thanks
Nev
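The loader in load.cpp opens liba1lib.so with RTLD_LAZY | RTLD_LOCAL. For reference, per the dlopen(3) documentation (not something verified against Open MPI here): with RTLD_LOCAL, symbols defined in the opened library are not made available to resolve references in shared objects loaded afterwards, while RTLD_GLOBAL makes them available. The sketch below is a hypothetical helper, load_symbol, that repeats the dlopen/dlsym/dlerror pattern from load.cpp but takes the dlopen mode as a parameter, so the two flags can be compared in the same harness. It assumes a glibc Linux system; on glibc older than 2.34 it must be linked with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not part of the original test code).
   Opens libName with the given dlopen() mode, e.g.
   RTLD_LAZY | RTLD_LOCAL or RTLD_LAZY | RTLD_GLOBAL, and looks up
   symbolName. On success returns the symbol address and stores the
   library handle in *pHandle so the caller can dlclose() it.
   On failure prints the dlerror() text and returns NULL. */
static void *load_symbol(const char *libName, const char *symbolName,
                         int mode, void **pHandle)
{
    dlerror();                               /* clear stale error state */
    void *handle = dlopen(libName, mode);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", libName, dlerror());
        return NULL;
    }

    dlerror();                               /* clear again before dlsym */
    void *symbol = dlsym(handle, symbolName);
    /* A NULL symbol is not necessarily an error, so check dlerror()
       rather than the returned pointer. */
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym(%s) failed: %s\n", symbolName, err);
        dlclose(handle);
        return NULL;
    }

    *pHandle = handle;
    return symbol;
}
```

For example, load_symbol("libm.so.6", "cos", RTLD_LAZY | RTLD_LOCAL, &handle) returns the address of cos(), which can be cast to double (*)(double) and called; trying RTLD_GLOBAL instead is a one-argument change. Whether the dlopen mode is relevant to the 1.4.1 shared failure is an assumption, not something this sketch establishes.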