On Thu, 2010-04-22 at 08:09 +1000, Nev wrote:
> On Tue, 2010-04-20 at 20:22 -0400, Jeff Squyres wrote:
> > On Apr 20, 2010, at 6:16 PM, Nev wrote:
> > 
> > > Hi Jeff,
> > > I did the install to the "same place". I always use /opt/openmpi, the
> > > procedure I use when building is
> > > configure --prefix=/opt/openmpi ...
> > > rm -r /opt/openmpi/*
> > > make clean
> > > make all
> > > make install
> > > Is this sufficient to uninstall the previous version, or is more required?
> > 
> > Yes, that should be sufficient.  Is that what you did this time?  
> > 
> > If so, is there any way you can provide a small code example of the problem 
> > you're seeing?
> > 
> OK, I will attempt to reduce this to a minimal code set, but will not be
> able to do so until the weekend.
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

Hi Jeff,
Hopefully I have included sufficient information for you to identify what
I am doing incorrectly.

I have created a minimal set of code, which was built, linked and run
against version 1.2.7 shared, version 1.4.1 shared and version 1.4.1
static.

However, I have not been able to reproduce the error message I reported
earlier.

v1.4.1 static WORKS with no error or warning messages.
v1.4.1 shared FAILS with message
   "mpirun noticed that process rank 0 with PID 31115 on node dingo3
exited on signal 11 (Segmentation fault)".
v1.2.7 shared WORKS but with message:
   "[dingo3:31123] mca: base: component_find: unable to open osc pt2pt:
file not found (ignored)"

I have also run the above 3 configurations with actual comms between the
processes, and that works for all except 1.4.1 shared.

1.4.1 shared always fails in the call MPI_Init(...)

The command I used to run it was:
/opt/openmpi/bin/mpirun -np 2 -mca btl tcp,self \
-x LD_LIBRARY_PATH=/opt/openmpi/lib:/work/lib \
-x PATH=/opt/openmpi/bin:/work/bin:/usr/bin \
-host dingo3 a3exec
Setting LD_LIBRARY_PATH and PATH is not my normal habit, but I did so here
to minimise any external dependencies.

This test machine is a newly installed (i.e. very clean) Ubuntu 9.10 64-bit
desktop with the server kernel. It is a dual-socket, 8-core, hyperthreaded
Intel box. It has the following installed:
a. openssh + freenx
b. KVM
c. build-essential
d. 32 bit libraries
e. bridge-utils
f. uml-utilities

Open MPI was built with
./configure --prefix=/opt/openmpi CFLAGS=-m32 CXXFLAGS=-m32
plus --enable-static --disable-shared for static builds

I have also tested on 32-bit Ubuntu 9.10 and 8.04 (not clean) with the
same results.

Minimal files
init.c built as "liba1lib.so" using mpicc

#include <mpi.h>
#include <stdio.h>

static int mpiRank = -1;
static int mpiSize = -1;

int connect(int * const pArgc, char * * pArgv[])
{
        printf("ENTER connect *pArgc=%d, *pArgv[0]=%s\n", *pArgc, (*pArgv)[0]);
        fflush(0);
        MPI_Init(pArgc, pArgv);
        // <<<<NEVER>>>> get to here for version 1.4.1 shared build
        printf("DONE MPI_init\n"); fflush(0); 

        MPI_Comm_rank(MPI_COMM_WORLD, &mpiRank);
        MPI_Comm_size(MPI_COMM_WORLD, &mpiSize);
        printf("MPI_rank = %d, MPI_size = %d\n", mpiRank, mpiSize); fflush(0);

        MPI_Finalize();
        printf("%d EXITING connect\n", mpiRank); fflush(0);
        return 0;
}


load.cpp built as "liba2lib.so" using g++

extern "C" {
  #include <stdio.h>
  #include <stdlib.h>
  #include <dlfcn.h>
  typedef int (*tConnect)(int * pArgc, char * * pArgv[]);
  void load(int * pArgc, char * * pArgv[]);
}

void load(int * pArgc, char * * pArgv[])
{
  printf("ENTER load\n"); fflush(0);

  dlerror();
  char const * const libName = "liba1lib.so";
  void * const result = dlopen(libName, RTLD_LAZY | RTLD_LOCAL);
  if (result == 0)
  {
    fprintf(stderr, "Failed to load library %s error = %s\n", 
                    libName, dlerror()); fflush(0);
    exit(1);
  }
  char const * const symbolName = "connect";
  void * symbol = dlsym(result, symbolName);
  if (symbol == 0)
  {
    fprintf(stderr, "Failed to load symbol %s from %s error = %s\n",
        symbolName, libName, dlerror()); fflush(0);
    exit(1);
  }
  ((tConnect)symbol)(pArgc, pArgv);
  printf("DONE load\n"); fflush(0);
  return;
}

main.cpp built as "a3exec" using g++

extern "C" {
  #include <stdio.h>
  void load(int * pArgc, char * * pArgv[]);
}

int main(int argc, char * argv[])
{
  printf("ENTER main\n");
  load(&argc, &argv);
  printf("EXIT main\n");
  return 0;
}
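
One experiment that may help narrow this down, offered as an assumption
rather than a confirmed cause: the shared build of Open MPI loads its own
components as plugins at run time, so the visibility given to liba1lib.so
(and the MPI libraries it pulls in) by the dlopen call in load.cpp could
matter. The sketch below uses two hypothetical helpers, scope_flags and
open_with_scope, which are not part of the test case above, to switch
between RTLD_LOCAL and RTLD_GLOBAL so both can be compared.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not in the original test case): build the dlopen
 * flags for a given symbol-visibility setting. */
int scope_flags(int share_symbols)
{
    return RTLD_LAZY | (share_symbols ? RTLD_GLOBAL : RTLD_LOCAL);
}

/* Hypothetical helper: open a library with the chosen visibility and
 * report dlerror() on failure. A NULL name yields a handle for the
 * main program itself. */
void *open_with_scope(const char *name, int share_symbols)
{
    void *handle;

    dlerror(); /* clear any stale error state */
    handle = dlopen(name, scope_flags(share_symbols));
    if (handle == NULL) {
        fprintf(stderr, "Failed to load %s: %s\n",
                name ? name : "(main program)", dlerror());
    }
    return handle;
}
```

In load.cpp this would correspond to calling
open_with_scope("liba1lib.so", 1) in place of the existing dlopen call,
i.e. RTLD_LAZY | RTLD_GLOBAL instead of RTLD_LAZY | RTLD_LOCAL. Whether
this changes the 1.4.1 shared behaviour has not been tested here.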

Thanks Nev



