[OMPI users] Hang in mca_btl_vader_component_progress ()

2017-01-12 Thread Joshua Wall
Hello users,

I'm by no means an MPI expert, but I have successfully been using my
own compiled version of OMPI 1.10.2 for some time without issue. Lately,
however, I'm seeing a strange problem: when I try to run on more than 3 or
4 nodes, I get a hang during setup. My code (the Fortran MHD code FLASH,
version 4.2.2) is attempting to call MPI_COMM_SPLIT:


   !! first make a communicator for group of processors
   !! that have the whole computational grid
   !! The grid is duplicated on all communicators
   countInComm=dr_globalNumProcs/dr_meshCopyCount

   if((countInComm*dr_meshCopyCount) /= dr_globalNumProcs)&
        call Driver_abortFlash("when duplicating mesh, numProcs should be a multiple of meshCopyCount")

   color = dr_globalMe/countInComm
   key = mod(dr_globalMe,countInComm)
   call MPI_Comm_split(dr_globalComm,color,key,dr_meshComm,error)
   call MPI_Comm_split(dr_globalComm,key,color,dr_meshAcrossComm,error)

   call MPI_COMM_RANK(dr_meshComm,dr_meshMe, error)
   call MPI_COMM_SIZE(dr_meshComm, dr_meshNumProcs,error)

   call MPI_COMM_RANK(dr_meshAcrossComm,dr_meshAcrossMe, error)
   call MPI_COMM_SIZE(dr_meshAcrossComm, dr_meshAcrossNumProcs,error)


and it is hanging in the split call. Attaching GDB to the process on the
local node, I find the following (CentOS is far behind on updating GDB, so
unfortunately there aren't many symbols):


(gdb) bt full
#0  0x2aaab150facd in mca_btl_vader_component_progress () from
/home/draco/jwall/local_openmpi/lib/openmpi/mca_btl_vader.so
No symbol table info available.
#1  0x2d348e6a in opal_progress () from
/home/draco/jwall/local_openmpi/lib/libopen-pal.so.13
 _mm_free_fn = 0
 event_debug_map_PRIMES = {53, 97, 193, 389, 769, 1543, 3079,
6151, 12289, 24593, 49157, 98317, 196613, 393241, 786433, 1572869,
3145739, 6291469, 12582917, 25165843, 50331653,
   100663319, 201326611, 402653189, 805306457, 1610612741}
 _event_debug_map_lock = 0x3e9cc70
 _mm_realloc_fn = 0
 event_debug_mode_too_late = 1
 global_debug_map = {hth_table = 0x0, hth_table_length = 0,
hth_n_entries = 0, hth_load_limit = 0, hth_prime_idx = -1}
 warn_once = 0
 use_monotonic = 1
 eventops = {0x2d5ee860, 0x2d5ee8c0, 0x2d5ee900, 0x0}
 _mm_malloc_fn = 0
 event_global_current_base_ = 0x0
 opal_libevent2021__event_debug_mode_on = 0
#2  0x2b635305 in ompi_request_default_wait_all () from
/home/draco/jwall/local_openmpi/lib/libmpi.so.12
No symbol table info available.
#3  0x2aaab1f60417 in ompi_coll_tuned_sendrecv_nonzero_actual ()
from /home/draco/jwall/local_openmpi/lib/openmpi/mca_coll_tuned.so
No symbol table info available.
#4  0x2aaab1f68074 in ompi_coll_tuned_allgather_intra_bruck () from
/home/draco/jwall/local_openmpi/lib/openmpi/mca_coll_tuned.so
No symbol table info available.
#5  0x2b621e4d in ompi_comm_split () from
/home/draco/jwall/local_openmpi/lib/libmpi.so.12
No symbol table info available.
#6  0x2b64f16d in PMPI_Comm_split () from
/home/draco/jwall/local_openmpi/lib/libmpi.so.12
No symbol table info available.
#7  0x2b3db70f in pmpi_comm_split__ () from
/home/draco/jwall/local_openmpi/lib/libmpi_mpifh.so.12
No symbol table info available.
#8  0x005e43ed in driver_setupparallelenv_ ()
No symbol table info available.
#9  0x005e1187 in driver_initflash_ ()
No symbol table info available.
#10 0x004451ce in __flash_run_MOD_initialize_code ()
No symbol table info available.
#11 0x004124e9 in handle_call.1908 ()
No symbol table info available.
#12 0x00422791 in run_loop_mpi.1914 ()
No symbol table info available.
#13 0x004169db in MAIN__ ()
No symbol table info available.
#14 0x00423c6f in main ()
No symbol table info available.
#15 0x2c507d1d in __libc_start_main () from /lib64/libc.so.6
No symbol table info available.
#16 0x00405509 in _start ()
No symbol table info available.

Anyone have any ideas what the issue might be?
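
In case it helps, here is a minimal standalone sketch of the same split
pattern outside of FLASH (the program and variable names are my own,
meshCopyCount = 2 is just a test value, and I use MPI_COMM_WORLD where
FLASH uses dr_globalComm), which I can try on the same node counts:

program split_repro
   use mpi
   implicit none
   integer :: ierr, globalMe, globalNumProcs
   integer :: meshCopyCount, countInComm, color, key
   integer :: meshComm, meshAcrossComm
   integer :: meshMe, meshNumProcs, meshAcrossMe, meshAcrossNumProcs

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, globalMe, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, globalNumProcs, ierr)

   !! same pattern as the FLASH snippet above: meshCopyCount copies of the
   !! mesh, each spanning globalNumProcs/meshCopyCount ranks
   meshCopyCount = 2                  ! test value only, not from FLASH
   countInComm   = globalNumProcs/meshCopyCount

   if (countInComm*meshCopyCount /= globalNumProcs) then
      if (globalMe == 0) print *, 'numProcs must be a multiple of meshCopyCount'
      call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
   end if

   color = globalMe/countInComm       ! which mesh copy this rank belongs to
   key   = mod(globalMe, countInComm) ! rank within that mesh copy

   call MPI_Comm_split(MPI_COMM_WORLD, color, key, meshComm, ierr)
   call MPI_Comm_split(MPI_COMM_WORLD, key, color, meshAcrossComm, ierr)

   call MPI_Comm_rank(meshComm, meshMe, ierr)
   call MPI_Comm_size(meshComm, meshNumProcs, ierr)
   call MPI_Comm_rank(meshAcrossComm, meshAcrossMe, ierr)
   call MPI_Comm_size(meshAcrossComm, meshAcrossNumProcs, ierr)

   print *, 'global rank', globalMe, ': meshMe =', meshMe, &
            ', meshAcrossMe =', meshAcrossMe

   call MPI_Comm_free(meshComm, ierr)
   call MPI_Comm_free(meshAcrossComm, ierr)
   call MPI_Finalize(ierr)
end program split_repro

I'd compile it with the same mpifort and run it with the same mpirun options
and hostfile as the FLASH job, to see whether a bare MPI_Comm_split also
hangs at that node count.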

Thanks so much.

Joshua Wall
Ph. D. Candidate
Physics Department
Drexel University
-- 
Joshua Wall
Doctoral Candidate
Department of Physics
Drexel University
3141 Chestnut Street
Philadelphia, PA 19104

[OMPI users] Failed Flash run on Pleiades with OpenMPI 1.10.2

2016-03-10 Thread Joshua Wall
-mca mpi_warn_on_fork 0 --mca mpi_cuda_support 0 --mca btl
self,sm,openib --mca oob_tcp_if_include ib0 -hostfile local_host.txt
./flash4

# --mca oob_tcp_if_include ib0   # suggested on an OpenMPI forum for running on Pleiades

# It is good practice to write stderr and stdout to a file (e.g., output).
# Otherwise, they will be written to the PBS stderr and stdout in /PBS/spool,
# which has a limited amount of space. When /PBS/spool fills up, any job
# that tries to write to /PBS/spool will die.
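# For example (hypothetical; the actual launcher line with all of the mca
# options is the one shown above), redirecting both streams keeps them in
# the job's working directory instead of /PBS/spool:
#
#   mpiexec <mca options as above> -hostfile local_host.txt ./flash4 > output 2>&1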

# -end of script-

Hopefully this is enough information for someone to find an error in how I
did things. I also have the outputs of the make, make-test and make-install
if anyone would like to see those. :)

Thanks for the help!

Cordially,

Joshua Wall


[OMPI users] Typo in mpi-fort-wrapper-data.txt?

2018-01-30 Thread Joshua Wall
Hello users,

I installed a new OS this week (Xubuntu 17.10, to be exact) and
pulled down the latest OMPI from apt on the machine. While trying to
compile an MPI Fortran program, I noticed the following:

josh@josh-UX490UA:/usr/share/openmpi$ mpifort --showme
gfortran -I/usr/lib/x86_64-linux-gnu/openmpi/include -pthread
-I/usr/lib/x86_64-linux-gnu/openmpi/lib *-L/usr//lib*
-L/usr/lib/x86_64-linux-gnu/openmpi/lib -lmpi_usempif08
-lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi

Noticing the double //,  I checked the file and saw it there also:

josh@josh-UX490UA:/usr/share/openmpi$ sudo vim mpifort-wrapper-data.txt
# There can be multiple blocks of configuration data, chosen by
# compiler flags (using the compiler_args key to chose which block
# should be activated.  This can be useful for multilib builds.  See the
# multilib page at:
#https://github.com/open-mpi/ompi/wiki/compilerwrapper3264
# for more information.

project=Open MPI
project_short=OMPI
version=2.1.1
language=Fortran
compiler_env=FC
compiler_flags_env=FCFLAGS
compiler=gfortran
preprocessor_flags=
compiler_flags=-pthread  -I${libdir}
linker_flags=*-L/usr//lib*
# Note that per https://svn.open-mpi.org/trac/ompi/ticket/3422, we
# intentionally only link in the MPI libraries (ORTE, OPAL, etc. are
# pulled in implicitly) because we intend MPI applications to only use
# the MPI API.
libs=-lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
libs_static=-lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -ldl -lutil -lm
dyn_lib_file=libmpi.so
static_lib_file=libmpi.a
required_file=
includedir=${includedir}
libdir=${libdir}

I'm guessing this is unintentional, but I wanted to check, since it's in the
distro, before I edit it on my end.
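
If it is just the stray slash, I assume the intended value is simply

linker_flags=-L/usr/lib

and that is what I would change it to locally (just my guess at the intent,
of course).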

Thanks,

Josh
-- 
Joshua Wall
Doctoral Candidate
Department of Physics
Drexel University
3141 Chestnut Street
Philadelphia, PA 19104

Re: [OMPI users] Typo in mpi-fort-wrapper-data.txt?

2018-01-31 Thread Joshua Wall
Hello Jeff,

   Just to be clear, I installed it with the following (nothing fancy
here... I have experience installing OMPI from source, but it's not worth
the trouble on this small laptop!):

sudo apt install libopenmpi-dev

Cordially,

Josh
-- 
Joshua Wall
Doctoral Candidate
Department of Physics
Drexel University
3141 Chestnut Street
Philadelphia, PA 19104