On Sep 29, 2014, at 12:05 PM, Amos Anderson <amos.ander...@protabit.com> wrote:

> Hi Dave --
> 
> It looks like my argv[argc] is not NULL (see below), so are we concluding that 
> this problem is boost::python's fault?

Yep - they are violating the C99 requirement that argv[argc] be NULL.
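
For illustration, here is a minimal sketch of how a wrapper could build an argv that ends in a null pointer before handing it to MPI_Init. The names ArgvBuffer and join_until_null are hypothetical and appear in neither Boost nor Open MPI; join_until_null is only a stand-in for any routine that, as opal_argv_join appears to do in the trace below, walks the array until it reaches a null pointer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost or Open MPI): owns copies of the
// argument strings plus a pointer array that always ends in a null pointer,
// so that argv[argc] == NULL holds.
class ArgvBuffer {
public:
    explicit ArgvBuffer(const std::vector<std::string>& args)
        : storage_(args) {
        pointers_.reserve(storage_.size() + 1);
        for (std::string& s : storage_) {
            pointers_.push_back(&s[0]);  // mutable char* into owned storage
        }
        pointers_.push_back(nullptr);    // the terminator the consumer relies on
    }

    // Copying would leave the pointer array aimed at the source's strings.
    ArgvBuffer(const ArgvBuffer&) = delete;
    ArgvBuffer& operator=(const ArgvBuffer&) = delete;

    int argc() const { return static_cast<int>(storage_.size()); }
    char** argv() { return pointers_.data(); }

private:
    std::vector<std::string> storage_;
    std::vector<char*> pointers_;
};

// Hypothetical stand-in for a consumer that trusts the terminator: it keeps
// reading entries until it sees a null pointer. Without the terminator it
// would run past the end of the array, which is undefined behavior.
std::string join_until_null(char** argv, char delimiter) {
    std::string out;
    for (char** p = argv; *p != nullptr; ++p) {
        if (p != argv) {
            out += delimiter;
        }
        out += *p;
    }
    return out;
}
```

With such a buffer, the caller would copy buf.argc() and buf.argv() into local variables and pass those to boost::mpi::environment, which takes both by reference. That call is left out here so the sketch compiles without MPI.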


> 
> Thanks!
> Amos.
> 
> 
> 
> Looking in the boost code, I see this is how MPI_Init is called:
> 
> 
> environment::environment(int& argc, char** &argv, bool abort_on_exception)
>  : i_initialized(false),
>    abort_on_exception(abort_on_exception)
> {
>  if (!initialized()) {
>    BOOST_MPI_CHECK_RESULT(MPI_Init, (&argc, &argv));
>    i_initialized = true;
>  }
> 
>  MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
> }
> 
> 
> 
> 
> Getting some more info from a gdb session (the trace is the same):
> (gdb) up
> #1  0x00002aaaaab2ce4e in ompi_mpi_init (argc=2, argv=0xa39440, requested=0, 
> provided=0x7fffffffb9e8) at runtime/ompi_mpi_init.c:450
> 450           tmp = opal_argv_join(&argv[1], ' ');
> (gdb) up
> #2  0x00002aaaaab63e39 in PMPI_Init (argc=0x7fffffffbadc, 
> argv=0x7fffffffbad0) at pinit.c:84
> 84            err = ompi_mpi_init(*argc, *argv, required, &provided);
> (gdb) print argc
> $1 = (int *) 0x7fffffffbadc
> (gdb) print *argc
> $2 = 2
> (gdb) print *argv
> $3 = (char **) 0xa39440
> (gdb) print argv
> $4 = (char ***) 0x7fffffffbad0
> (gdb) print **argv
> $5 = 0x9d3230 "test/regression/regression-test.py"
> (gdb) up
> #3  0x00002aaab7b965d6 in boost::mpi::environment::environment 
> (this=0xa3a280, argc=@0x7fffffffbadc, argv=@0x7fffffffbad0, 
> abort_on_exception=true)
>    at ../tools/boost/libs/mpi/src/environment.cpp:98
> 98        BOOST_MPI_CHECK_RESULT(MPI_Init, (&argc, &argv));
> (gdb) print argc
> $6 = (int &) @0x7fffffffbadc: 2
> (gdb) print *argc
> Attempt to take contents of a non-pointer value.
> (gdb) print &argc
> $7 = (int *) 0x7fffffffbadc
> (gdb) print argc
> $8 = (int &) @0x7fffffffbadc: 2
> (gdb) print argv
> $9 = (char **&) @0x7fffffffbad0: 0xa39440
> (gdb) print *argv
> $10 = 0x9d3230 "test/regression/regression-test.py"
> (gdb) print argv[0]
> $11 = 0x9d3230 "test/regression/regression-test.py"
> (gdb) print argv[1]
> $12 = 0x9caa40 "test/regression/regression-jobs"
> (gdb) print argv[2]
> $13 = 0x20 <Address 0x20 out of bounds>
> (gdb) 
> 
> 
> 
> 
> On Sep 29, 2014, at 11:48 AM, Dave Goodell (dgoodell) <dgood...@cisco.com> 
> wrote:
> 
>> Looks like boost::mpi and/or your python "mpi" module might be creating a 
>> bogus argv array and passing it to OMPI's MPI_Init routine.  Note that argv 
>> is required by C99 to be terminated with a NULL pointer (that is, 
>> (argv[argc]==NULL) must hold).  See 
>> http://stackoverflow.com/a/3772826/158513.
>> 
>> -Dave
>> 
>> On Sep 29, 2014, at 1:34 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> 
>>> Afraid I cannot replicate a problem with singleton behavior in the 1.8 
>>> series:
>>> 
>>> 11:31:52  /home/common/openmpi/v1.8/orte/test/mpi$ ./hello foo bar
>>> Hello, World, I am 0 of 1 [0 local peers]: get_cpubind: 0 bitmap 0-23
>>> OMPI_MCA_orte_default_hostfile=/home/common/hosts
>>> OMPI_COMMAND=./hello
>>> OMPI_ARGV=foo bar
>>> OMPI_NUM_APP_CTX=1
>>> OMPI_FIRST_RANKS=0
>>> OMPI_APP_CTX_NUM_PROCS=1
>>> OMPI_MCA_orte_ess_num_procs=1
>>> 
>>> You can see that the OMPI_ARGV envar (which is the spot you flagged) is 
>>> correctly being set and there is no segfault. Not sure what your program 
>>> may be doing, though, so I'm not sure I've really tested your scenario.
>>> 
>>> 
>>> On Sep 29, 2014, at 10:55 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>>> Okay, so regression-test.py is calling MPI_Init as a singleton, correct? 
>>>> Just trying to fully understand the scenario
>>>> 
>>>> Singletons are certainly allowed, if that's the scenario
>>>> 
>>>> On Sep 29, 2014, at 10:51 AM, Amos Anderson <amos.ander...@protabit.com> 
>>>> wrote:
>>>> 
>>>>> I'm not calling mpirun in this case because this particular calculation 
>>>>> doesn't use more than one processor. What I'm doing on my command line is 
>>>>> this:
>>>>> 
>>>>> /home/user/myapp/tools/python/bin/python 
>>>>> test/regression/regression-test.py test/regression/regression-jobs
>>>>> 
>>>>> and internally I check for rank/size. This command is executed in the 
>>>>> context of a souped up LD_LIBRARY_PATH. You can see the variable argv in 
>>>>> opal_argv_join is ending up with the last argument on my command line.
>>>>> 
>>>>> I suppose your question implies that mpirun is mandatory for executing 
>>>>> anything compiled with OpenMPI > 1.6?
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sep 29, 2014, at 10:28 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> 
>>>>>> Can you pass us the actual mpirun command line being executed? 
>>>>>> Especially need to see the argv being passed to your application.
>>>>>> 
>>>>>> 
>>>>>> On Sep 27, 2014, at 7:09 PM, Amos Anderson <amos.ander...@protabit.com> 
>>>>>> wrote:
>>>>>> 
>>>>>>> FWIW, I've confirmed that the segfault also happens with OpenMPI 1.7.5. 
>>>>>>> Also, I have some gdb output (from 1.7.5) for your perusal, including a 
>>>>>>> printout of some of the variables' values.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Starting program: /home/user/myapp/tools/python/bin/python 
>>>>>>> test/regression/regression-test.py test/regression/regression-jobs
>>>>>>> [Thread debugging using libthread_db enabled]
>>>>>>> 
>>>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>>>> 0x00002aaaabc8df1e in opal_argv_join (argv=0xa39398, delimiter=32) at 
>>>>>>> argv.c:299
>>>>>>> 299         str_len += strlen(*p) + 1;
>>>>>>> (gdb) where
>>>>>>> #0  0x00002aaaabc8df1e in opal_argv_join (argv=0xa39398, delimiter=32) 
>>>>>>> at argv.c:299
>>>>>>> #1  0x00002aaaaab2ce4e in ompi_mpi_init (argc=2, argv=0xa39390, 
>>>>>>> requested=0, provided=0x7fffffffba98) at runtime/ompi_mpi_init.c:450
>>>>>>> #2  0x00002aaaaab63e39 in PMPI_Init (argc=0x7fffffffbb8c, 
>>>>>>> argv=0x7fffffffbb80) at pinit.c:84
>>>>>>> #3  0x00002aaab7b965d6 in boost::mpi::environment::environment 
>>>>>>> (this=0xa3a1d0, argc=@0x7fffffffbb8c, argv=@0x7fffffffbb80, 
>>>>>>> abort_on_exception=true)
>>>>>>>  at ../tools/boost/libs/mpi/src/environment.cpp:98
>>>>>>> #4  0x00002aaabc7b311d in boost::mpi::python::mpi_init 
>>>>>>> (python_argv=..., abort_on_exception=true) at 
>>>>>>> ../tools/boost/libs/mpi/src/python/py_environment.cpp:60
>>>>>>> #5  0x00002aaabc7b33fb in boost::mpi::python::export_environment () at 
>>>>>>> ../tools/boost/libs/mpi/src/python/py_environment.cpp:94
>>>>>>> #6  0x00002aaabc7d5ab5 in boost::mpi::python::init_module_mpi () at 
>>>>>>> ../tools/boost/libs/mpi/src/python/module.cpp:44
>>>>>>> #7  0x00002aaab792a2f2 in 
>>>>>>> boost::detail::function::void_function_ref_invoker0<void (*)(), 
>>>>>>> void>::invoke (function_obj_ptr=...)
>>>>>>>  at ../tools/boost/boost/function/function_template.hpp:188
>>>>>>> #8  0x00002aaab7929e6b in boost::function0<void>::operator() 
>>>>>>> (this=0x7fffffffc110) at 
>>>>>>> ../tools/boost/boost/function/function_template.hpp:767
>>>>>>> #9  0x00002aaab7928f11 in boost::python::handle_exception_impl (f=...) 
>>>>>>> at ../tools/boost/libs/python/src/errors.cpp:25
>>>>>>> #10 0x00002aaab792a54f in boost::python::handle_exception<void (*)()> 
>>>>>>> (f=0x2aaabc7d5746 <boost::mpi::python::init_module_mpi()>) at 
>>>>>>> ../tools/boost/boost/python/errors.hpp:29
>>>>>>> #11 0x00002aaab792a1d9 in boost::python::detail::(anonymous 
>>>>>>> namespace)::init_module_in_scope (m=0x2aaabc617f68, 
>>>>>>>  init_function=0x2aaabc7d5746 <boost::mpi::python::init_module_mpi()>) 
>>>>>>> at ../tools/boost/libs/python/src/module.cpp:24
>>>>>>> #12 0x00002aaab792a26c in boost::python::detail::init_module 
>>>>>>> (name=0x2aaabc7f7f4d "mpi", init_function=0x2aaabc7d5746 
>>>>>>> <boost::mpi::python::init_module_mpi()>)
>>>>>>>  at ../tools/boost/libs/python/src/module.cpp:59
>>>>>>> #13 0x00002aaabc7d5b2b in boost::mpi::python::initmpi () at 
>>>>>>> ../tools/boost/libs/mpi/src/python/module.cpp:34
>>>>>>> #14 0x00002aaaab27e095 in _PyImport_LoadDynamicModule (name=0xac9435 
>>>>>>> "mpi", pathname=0xb08c60 "/home/user/myapp/lib/mpi.so", fp=0xaca450) at 
>>>>>>> ./Python/importdl.c:53
>>>>>>> #15 0x00002aaaab279fd4 in load_module (name=0xac9435 "mpi", 
>>>>>>> fp=0xaca450, pathname=0xb08c60 "/home/user/myapp/lib/mpi.so", type=3, 
>>>>>>> loader=0x0) at Python/import.c:1915
>>>>>>> #16 0x00002aaaab27c2e8 in import_submodule (mod=0x2aaaab533a20, 
>>>>>>> subname=0xac9435 "mpi", fullname=0xac9435 "mpi") at Python/import.c:2700
>>>>>>> #17 0x00002aaaab27b8fa in load_next (mod=0x2aaab0f075a8, 
>>>>>>> altmod=0x2aaaab533a20, p_name=0x7fffffffc3f8, buf=0xac9430 "util.mpi", 
>>>>>>> p_buflen=0x7fffffffc408)
>>>>>>>  at Python/import.c:2519
>>>>>>> #18 0x00002aaaab27a98d in import_module_level (name=0x0, 
>>>>>>> globals=0xe95a70, locals=0xe95a70, fromlist=0x2aaaab533a20, level=-1) 
>>>>>>> at Python/import.c:2224
>>>>>>> #19 0x00002aaaab27aeda in PyImport_ImportModuleLevel 
>>>>>>> (name=0x2aaab0f00964 "mpi", globals=0xe95a70, locals=0xe95a70, 
>>>>>>> fromlist=0x2aaaab533a20, level=-1) at Python/import.c:2288
>>>>>>> #20 0x00002aaaab2419c4 in builtin___import__ (self=0x0, 
>>>>>>> args=0x2aaabc6211f8, kwds=0x0) at Python/bltinmodule.c:49
>>>>>>> #21 0x00002aaaab1b19c7 in PyCFunction_Call (func=0x2aaaabf85510, 
>>>>>>> arg=0x2aaabc6211f8, kw=0x0) at Objects/methodobject.c:85
>>>>>>> #22 0x00002aaaab14d673 in PyObject_Call (func=0x2aaaabf85510, 
>>>>>>> arg=0x2aaabc6211f8, kw=0x0) at Objects/abstract.c:2529
>>>>>>> #23 0x00002aaaab25ad03 in PyEval_CallObjectWithKeywords 
>>>>>>> (func=0x2aaaabf85510, arg=0x2aaabc6211f8, kw=0x0) at Python/ceval.c:3890
>>>>>>> #24 0x00002aaaab2543e5 in PyEval_EvalFrameEx (f=0xe8aef0, throwflag=0) 
>>>>>>> at Python/ceval.c:2333
>>>>>>> #25 0x00002aaaab258b7e in PyEval_EvalCodeEx (co=0x2aaabc61ce00, 
>>>>>>> globals=0xe95a70, locals=0xe95a70, args=0x0, argcount=0, kws=0x0, 
>>>>>>> kwcount=0, defs=0x0, defcount=0, 
>>>>>>>  closure=0x0) at Python/ceval.c:3253
>>>>>>> #26 0x00002aaaab24b5ce in PyEval_EvalCode (co=0x2aaabc61ce00, 
>>>>>>> globals=0xe95a70, locals=0xe95a70) at Python/ceval.c:667
>>>>>>> #27 0x00002aaaab2779e2 in PyImport_ExecCodeModuleEx (name=0xaa9080 
>>>>>>> "util.myappMPI", co=0x2aaabc61ce00, pathname=0xe7d380 
>>>>>>> "/home/user/myapp/src/util/myappMPI.pyc")
>>>>>>>  at Python/import.c:709
>>>>>>> #28 0x00002aaaab278629 in load_source_module (name=0xaa9080 
>>>>>>> "util.myappMPI", pathname=0xe7d380 
>>>>>>> "/home/user/myapp/src/util/myappMPI.pyc", fp=0x76eb00)
>>>>>>>  at Python/import.c:1099
>>>>>>> #29 0x00002aaaab279fa0 in load_module (name=0xaa9080 "util.myappMPI", 
>>>>>>> fp=0x76eb00, pathname=0x80fe00 "/home/user/myapp/src/util/myappMPI.py", 
>>>>>>> type=1, loader=0x0)
>>>>>>>  at Python/import.c:1906
>>>>>>> #30 0x00002aaaab27c2e8 in import_submodule (mod=0x2aaab0f075a8, 
>>>>>>> subname=0xaa9085 "myappMPI", fullname=0xaa9080 "util.myappMPI") at 
>>>>>>> Python/import.c:2700
>>>>>>> #31 0x00002aaaab27b860 in load_next (mod=0x2aaab0f075a8, 
>>>>>>> altmod=0x2aaaab533a20, p_name=0x7fffffffcd98, buf=0xaa9080 
>>>>>>> "util.myappMPI", p_buflen=0x7fffffffcda8)
>>>>>>>  at Python/import.c:2515
>>>>>>> #32 0x00002aaaab27a98d in import_module_level (name=0x0, 
>>>>>>> globals=0x7a3c70, locals=0x7a3c70, fromlist=0x2aaaaf6f53e0, level=-1) 
>>>>>>> at Python/import.c:2224
>>>>>>> #33 0x00002aaaab27aeda in PyImport_ImportModuleLevel 
>>>>>>> (name=0x2aaaaf6e8854 "myappMPI", globals=0x7a3c70, locals=0x7a3c70, 
>>>>>>> fromlist=0x2aaaaf6f53e0, level=-1)
>>>>>>>  at Python/import.c:2288
>>>>>>> #34 0x00002aaaab2419c4 in builtin___import__ (self=0x0, 
>>>>>>> args=0x2aaab0a9f7d0, kwds=0x0) at Python/bltinmodule.c:49
>>>>>>> #35 0x00002aaaab1b19c7 in PyCFunction_Call (func=0x2aaaabf85510, 
>>>>>>> arg=0x2aaab0a9f7d0, kw=0x0) at Objects/methodobject.c:85
>>>>>>> #36 0x00002aaaab14d673 in PyObject_Call (func=0x2aaaabf85510, 
>>>>>>> arg=0x2aaab0a9f7d0, kw=0x0) at Objects/abstract.c:2529
>>>>>>> #37 0x00002aaaab25ad03 in PyEval_CallObjectWithKeywords 
>>>>>>> (func=0x2aaaabf85510, arg=0x2aaab0a9f7d0, kw=0x0) at Python/ceval.c:3890
>>>>>>> #38 0x00002aaaab2543e5 in PyEval_EvalFrameEx (f=0x79d000, throwflag=0) 
>>>>>>> at Python/ceval.c:2333
>>>>>>> #39 0x00002aaaab258b7e in PyEval_EvalCodeEx (co=0x2aaab0253880, 
>>>>>>> globals=0x7a3c70, locals=0x7a3c70, args=0x0, argcount=0, kws=0x0, 
>>>>>>> kwcount=0, defs=0x0, defcount=0, 
>>>>>>>  closure=0x0) at Python/ceval.c:3253
>>>>>>> #40 0x00002aaaab24b5ce in PyEval_EvalCode (co=0x2aaab0253880, 
>>>>>>> globals=0x7a3c70, locals=0x7a3c70) at Python/ceval.c:667
>>>>>>> #41 0x00002aaaab2779e2 in PyImport_ExecCodeModuleEx (name=0x7153a0 
>>>>>>> "util", co=0x2aaab0253880, pathname=0x7e98a0 
>>>>>>> "/home/user/myapp/src/util/__init__.pyc")
>>>>>>>  at Python/import.c:709
>>>>>>> #42 0x00002aaaab278629 in load_source_module (name=0x7153a0 "util", 
>>>>>>> pathname=0x7e98a0 "/home/user/myapp/src/util/__init__.pyc", 
>>>>>>> fp=0x6fe020) at Python/import.c:1099
>>>>>>> #43 0x00002aaaab279fa0 in load_module (name=0x7153a0 "util", 
>>>>>>> fp=0x6fe020, pathname=0x755e40 "/home/user/myapp/src/util/__init__.py", 
>>>>>>> type=1, loader=0x0)
>>>>>>>  at Python/import.c:1906
>>>>>>> #44 0x00002aaaab2788ef in load_package (name=0x7153a0 "util", 
>>>>>>> pathname=0x703390 "/home/user/myapp/src/util") at Python/import.c:1166
>>>>>>> #45 0x00002aaaab279fea in load_module (name=0x7153a0 "util", fp=0x0, 
>>>>>>> pathname=0x703390 "/home/user/myapp/src/util", type=5, loader=0x0) at 
>>>>>>> Python/import.c:1920
>>>>>>> #46 0x00002aaaab27c2e8 in import_submodule (mod=0x2aaaab533a20, 
>>>>>>> subname=0x7153a0 "util", fullname=0x7153a0 "util") at 
>>>>>>> Python/import.c:2700
>>>>>>> #47 0x00002aaaab27b860 in load_next (mod=0x2aaaab533a20, 
>>>>>>> altmod=0x2aaaab533a20, p_name=0x7fffffffd818, buf=0x7153a0 "util", 
>>>>>>> p_buflen=0x7fffffffd828) at Python/import.c:2515
>>>>>>> #48 0x00002aaaab27a98d in import_module_level (name=0x2aaaac0e0f59 
>>>>>>> "myappSubmission", globals=0x6443c0, locals=0x6443c0, 
>>>>>>> fromlist=0x2aaaac0f1760, level=-1)
>>>>>>>  at Python/import.c:2224
>>>>>>> #49 0x00002aaaab27aeda in PyImport_ImportModuleLevel 
>>>>>>> (name=0x2aaaac0e0f54 "util.myappSubmission", globals=0x6443c0, 
>>>>>>> locals=0x6443c0, fromlist=0x2aaaac0f1760, level=-1)
>>>>>>>  at Python/import.c:2288
>>>>>>> #50 0x00002aaaab2419c4 in builtin___import__ (self=0x0, 
>>>>>>> args=0x2aaaac0dbb00, kwds=0x0) at Python/bltinmodule.c:49
>>>>>>> #51 0x00002aaaab1b19c7 in PyCFunction_Call (func=0x2aaaabf85510, 
>>>>>>> arg=0x2aaaac0dbb00, kw=0x0) at Objects/methodobject.c:85
>>>>>>> #52 0x00002aaaab14d673 in PyObject_Call (func=0x2aaaabf85510, 
>>>>>>> arg=0x2aaaac0dbb00, kw=0x0) at Objects/abstract.c:2529
>>>>>>> #53 0x00002aaaab25ad03 in PyEval_CallObjectWithKeywords 
>>>>>>> (func=0x2aaaabf85510, arg=0x2aaaac0dbb00, kw=0x0) at Python/ceval.c:3890
>>>>>>> #54 0x00002aaaab2543e5 in PyEval_EvalFrameEx (f=0x71a1a0, throwflag=0) 
>>>>>>> at Python/ceval.c:2333
>>>>>>> #55 0x00002aaaab258b7e in PyEval_EvalCodeEx (co=0x2aaaac0dd720, 
>>>>>>> globals=0x6443c0, locals=0x6443c0, args=0x0, argcount=0, kws=0x0, 
>>>>>>> kwcount=0, defs=0x0, defcount=0, 
>>>>>>>  closure=0x0) at Python/ceval.c:3253
>>>>>>> #56 0x00002aaaab24b5ce in PyEval_EvalCode (co=0x2aaaac0dd720, 
>>>>>>> globals=0x6443c0, locals=0x6443c0) at Python/ceval.c:667
>>>>>>> #57 0x00002aaaab28d492 in run_mod (mod=0x720960, 
>>>>>>> filename=0x7fffffffe6bf "test/regression/regression-test.py", 
>>>>>>> globals=0x6443c0, locals=0x6443c0, flags=0x7fffffffe220, 
>>>>>>>  arena=0x6707d0) at Python/pythonrun.c:1370
>>>>>>> #58 0x00002aaaab28d41c in PyRun_FileExFlags (fp=0x6cb7c0, 
>>>>>>> filename=0x7fffffffe6bf "test/regression/regression-test.py", 
>>>>>>> start=257, globals=0x6443c0, locals=0x6443c0, 
>>>>>>>  closeit=1, flags=0x7fffffffe220) at Python/pythonrun.c:1356
>>>>>>> #59 0x00002aaaab28bbfe in PyRun_SimpleFileExFlags (fp=0x6cb7c0, 
>>>>>>> filename=0x7fffffffe6bf "test/regression/regression-test.py", 
>>>>>>> closeit=1, flags=0x7fffffffe220)
>>>>>>>  at Python/pythonrun.c:948
>>>>>>> #60 0x00002aaaab28b1be in PyRun_AnyFileExFlags (fp=0x6cb7c0, 
>>>>>>> filename=0x7fffffffe6bf "test/regression/regression-test.py", 
>>>>>>> closeit=1, flags=0x7fffffffe220)
>>>>>>>  at Python/pythonrun.c:752
>>>>>>> #61 0x00002aaaab2a7497 in Py_Main (argc=3, argv=0x7fffffffe3a8) at 
>>>>>>> Modules/main.c:640
>>>>>>> #62 0x00000000004006f3 in main (argc=3, argv=0x7fffffffe3a8) at 
>>>>>>> ./Modules/python.c:23
>>>>>>> (gdb) p argv
>>>>>>> $1 = (char **) 0xa39398
>>>>>>> (gdb) p *argv
>>>>>>> $2 = 0xa26390 "test/regression/regression-jobs"
>>>>>>> (gdb) p **argv
>>>>>>> $3 = 116 't'
>>>>>>> (gdb) p p
>>>>>>> $4 = (char **) 0xa393a0
>>>>>>> (gdb) p *p
>>>>>>> $5 = 0x20 <Address 0x20 out of bounds>
>>>>>>> (gdb) p str_len
>>>>>>> $6 = 32
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Sep 26, 2014, at 5:19 PM, Amos Anderson <amos.ander...@protabit.com> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hello all --
>>>>>>>> 
>>>>>>>> I'm trying to get a working configuration for my application and I can 
>>>>>>>> get OpenMPI 1.6.5 to work, while OpenMPI 1.8.2 segfaults.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Here's how I compile OpenMPI:
>>>>>>>> 
>>>>>>>> OPENMPI = openmpi-1.8.2
>>>>>>>> FLAGS = --enable-static
>>>>>>>> cd $(OPENMPI) ; ./configure $(FLAGS) --with-tm=/opt/torque-2.5.9/ 
>>>>>>>> --prefix=$(CURDIR)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I'm able to compile openmpi successfully, and I use a bjam instruction 
>>>>>>>> like this to compile my program (which uses boost python boost_1_55_0):
>>>>>>>> using mpi : ../tools/openmpi/bin/mpic++ ;
>>>>>>>> 
>>>>>>>> and I run my program in a Torque pbs script like this:
>>>>>>>> 
>>>>>>>> /bin/rm -rf jobname.nodes
>>>>>>>> for i in `cat ${PBS_NODEFILE} | sort -u`
>>>>>>>> do
>>>>>>>>         echo $i slots \= `grep $i ${PBS_NODEFILE} | wc -l` >> 
>>>>>>>> jobname.nodes
>>>>>>>> done
>>>>>>>> /home/user/myapp/tools/openmpi/bin/mpirun -np 2 -hostfile 
>>>>>>>> jobname.nodes /home/user/myapp/myapp.exe
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> which also compiles just fine. But when I run my program I get the 
>>>>>>>> segfault I printed below. When I switch to:
>>>>>>>> OPENMPI = openmpi-1.6.5
>>>>>>>> 
>>>>>>>> then everything works as expected. (As a side question, do I need both 
>>>>>>>> -hostfile and --with-tm? I asked this question earlier today on this 
>>>>>>>> list). That is, I believe that I'm using the exact same setup in both 
>>>>>>>> cases, and 1.6.5 works while 1.8.2 fails. Any suggestions what I might 
>>>>>>>> be doing wrong?
>>>>>>>> 
>>>>>>>> I suppose if I have a working setup I can give up even if it's with an 
>>>>>>>> older version... but this could be evidence of something I'll have to 
>>>>>>>> confront eventually.
>>>>>>>> 
>>>>>>>> Thanks for any advice!
>>>>>>>> Amos.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> [local:27921] *** Process received signal ***
>>>>>>>> [local:27921] Signal: Segmentation fault (11)
>>>>>>>> [local:27921] Signal code: Address not mapped (1)
>>>>>>>> [local:27921] Failing at address: 0x40
>>>>>>>> [local:27921] [ 0] /lib64/libpthread.so.0[0x322180e4c0]
>>>>>>>> [local:27921] [ 1] /lib64/libc.so.6(strlen+0x30)[0x3220c78d80]
>>>>>>>> [local:27921] [ 2] 
>>>>>>>> /home/user/myapp/tools/openmpi/lib/libopen-pal.so.6(opal_argv_join+0x95)[0x2b87f5c4e175]
>>>>>>>> [local:27921] [ 3] 
>>>>>>>> /home/user/myapp/tools/openmpi/lib/libmpi.so.1(ompi_mpi_init+0x82d)[0x2b87f3c9ec0d]
>>>>>>>> [local:27921] [ 4] 
>>>>>>>> /home/user/myapp/tools/openmpi/lib/libmpi.so.1(MPI_Init+0xf0)[0x2b87f3cbc310]
>>>>>>>> [local:27921] [ 5] 
>>>>>>>> /home/user/myapp/lib/libboost_mpi.so.1.55.0(_ZN5boost3mpi11environmentC1ERiRPPcb+0x36)[0x2b87f3795826]
>>>>>>>> [local:27921] [ 6] 
>>>>>>>> /home/user/myapp/lib/mpi.so(_ZN5boost3mpi6python8mpi_initENS_6python4listEb+0x314)[0x2b87f30bc7b4]
>>>>>>>> [local:27921] [ 7] 
>>>>>>>> /home/user/myapp/lib/mpi.so(_ZN5boost3mpi6python18export_environmentEv+0xcc6)[0x2b87f30bd5f6]
>>>>>>>> [local:27921] [ 8] 
>>>>>>>> /home/user/myapp/lib/mpi.so(_ZN5boost3mpi6python15init_module_mpiEv+0x547)[0x2b87f30d4967]
>>>>>>>> [local:27921] [ 9] 
>>>>>>>> /home/user/myapp/lib/libboost_python.so.1.55.0(_ZN5boost6python21handle_exception_implENS_9function0IvEE+0x530)[0x2b87f3558430]
>>>>>>>> [local:27921] [10] 
>>>>>>>> /home/user/myapp/lib/libboost_python.so.1.55.0(_ZN5boost6python16handle_exceptionIPFvvEEEbT_+0x38)[0x2b87f3559798]
>>>>>>>> [local:27921] [11] 
>>>>>>>> /home/user/myapp/lib/libboost_python.so.1.55.0(_ZN5boost6python6detail11init_moduleEPKcPFvvE+0x63)[0x2b87f3559463]
>>>>>>>> [local:27921] [12] 
>>>>>>>> /home/user/myapp/tools/python/lib/libpython2.7.so.1.0(_PyImport_LoadDynamicModule+0xc2)[0x2b87e8c79282]
>>>>>>>> [local:27921] [13] 
>>>>>>>> /home/user/myapp/tools/python/lib/libpython2.7.so.1.0[0x2b87e8c771a9]
>>>>>>>> [local:27921] [14] 
>>>>>>>> /home/user/myapp/tools/python/lib/libpython2.7.so.1.0[0x2b87e8c776c1]
>>>>>>>> [local:27921] [15] 
>>>>>>>> /home/user/myapp/tools/python/lib/libpython2.7.so.1.0(PyImport_ImportModuleLevel+0x1b7)[0x2b87e8c77977]
>>>>>>>> [local:27921] [16] 
>>>>>>>> /home/user/myapp/tools/python/lib/libpython2.7.so.1.0[0x2b87e8c57bcd]
>>>>>>>> [local:27921] [17] 
>>>>>>>> /home/user/myapp/tools/python/lib/libpython2.7.so.1.0(PyObject_Call+0x68)[0x2b87e8bb7ae8]
>>>>>>>> [local:27921] [18] 
>>>>>>>> /home/user/myapp/tools/python/lib/libpython2.7.so.1.0(PyEval_CallObjectWithKeywords+0x56)[0x2b87e8c58216]
>>>>>>>> [local:27921] [19] 
>>>>>>>> /home/user/myapp/tools/python/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x381c)[0x2b87e8c5c79c]
>>>>>>>> [local:27921] [20] 
>>>>>>>> /home/user/myapp/tools/python/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8c9)[0x2b87e8c60c89]
>>>>>>>> [local:27921] [21] 
>>>>>>>> /home/user/myapp/tools/python/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x2b87e8c60d02]
>>>>>>>> [local:27921] [22] 
>>>>>>>> /home/user/myapp/tools/python/lib/libpython2.7.so.1.0(PyImport_ExecCodeModuleEx+0xc2)[0x2b87e8c74432]
>>>>>>>> [local:27921] [23] 
>>>>>>>> /home/user/myapp/tools/python/lib/libpython2.7.so.1.0[0x2b87e8c769f0]
>>>>>>>> [local:27921] [24] 
>>>>>>>> /home/user/myapp/tools/python/lib/libpython2.7.so.1.0[0x2b87e8c771a9]
>>>>>>>> [local:27921] [25] 
>>>>>>>> /home/user/myapp/tools/python/lib/libpython2.7.so.1.0[0x2b87e8c77642]
>>>>>>>> [local:27921] [26] 
>>>>>>>> /home/user/myapp/tools/python/lib/libpython2.7.so.1.0(PyImport_ImportModuleLevel+0x1b7)[0x2b87e8c77977]
>>>>>>>> [local:27921] [27] 
>>>>>>>> /home/user/myapp/tools/python/lib/libpython2.7.so.1.0[0x2b87e8c57bcd]
>>>>>>>> [local:27921] [28] 
>>>>>>>> /home/user/myapp/tools/python/lib/libpython2.7.so.1.0(PyObject_Call+0x68)[0x2b87e8bb7ae8]
>>>>>>>> [local:27921] [29] 
>>>>>>>> /home/user/myapp/tools/python/lib/libpython2.7.so.1.0(PyEval_CallObjectWithKeywords+0x56)[0x2b87e8c58216]
>>>>>>>> [local:27921] *** End of error message ***
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>> Link to this post: 
>>>>>>> http://www.open-mpi.org/community/lists/users/2014/09/25396.php
