Folks,

For our code, we have a communication layer that abstracts the code that does the actual transfer of data. We call these "transports", and we link them in as shared libraries. We have created an MPI transport that compiles and links against OpenMPI 2.0.1 using the compiler wrappers.

When I compile OpenMPI with the --disable-dlopen option (thus cramming all of OpenMPI's plugins into the MPI library directly), things work great with our transport shared library. But when I build the same transport shared library against a "normal" OpenMPI (without --disable-dlopen), things fail.
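For reference, our application loads each transport at runtime with a plain dlopen() call, essentially like the minimal sketch below (the names and error handling are illustrative, not our actual code). Note that dlopen() without RTLD_GLOBAL defaults to RTLD_LOCAL, so libmpi.so and its dependencies, which the transport pulls in, end up in a local symbol scope rather than the global one:

    #include <dlfcn.h>
    #include <stdio.h>

    /* Minimal sketch of our transport loader (names illustrative).
     * dlopen() without RTLD_GLOBAL defaults to RTLD_LOCAL, so libmpi.so
     * and its dependencies, pulled in by the transport, are loaded into
     * a local symbol scope rather than the global one. */
    void *load_transport(const char *path)
    {
        void *handle = dlopen(path, RTLD_NOW);   /* RTLD_LOCAL implied */
        if (handle == NULL)
            fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
        return handle;
    }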
Upon launch, it appears that OpenMPI is unable to find the appropriate plugins:

[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_patcher_overwrite: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_shmem_mmap: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_shmem_posix: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_show_help (ignored)
[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_shmem_sysv: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------

If I skip our shared libraries and instead write a standard MPI-based "hello, world" program that links directly against the same OpenMPI (still without --disable-dlopen), everything is again fine. It seems that having the double dlopen is causing problems for OpenMPI finding its own shared libraries.

Note: I do have LD_LIBRARY_PATH pointing to …"openmpi-2.0.1/lib", as well as OPAL_PREFIX pointing to …"openmpi-2.0.1".
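One experiment I plan to try, based on that double-dlopen theory, is forcing libmpi's symbols into the global scope before the transport is loaded, so that OpenMPI's later dlopen() of its own MCA plugins can resolve symbols like opal_show_help. A sketch (untested; the .so.20 soname is my guess for a 2.0.1 install):

    #include <dlfcn.h>

    /* Untested sketch: promote libmpi (and, through its dependencies,
     * libopen-pal, where opal_show_help lives) into the global symbol
     * scope before any transport is dlopen()ed. */
    void promote_mpi_symbols(void)
    {
        if (dlopen("libmpi.so.20", RTLD_NOW | RTLD_GLOBAL) == NULL)
            dlopen("libmpi.so", RTLD_NOW | RTLD_GLOBAL);  /* dev symlink fallback */
    }

If that makes the plugin errors go away, it would at least confirm the symbol-scoping theory.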
Any thoughts about how I can try to tease out what's going wrong here?

-Sean

--
Sean Ahern
Computational Engineering International
919-363-0883