Ralph,
I created a hostfile that just has the names of the hosts while
specifying no slot information whatsoever (e.g. csclprd3-0-0)
and received the following errors:
mpirun -np 132 -report-bindings --prefix /hpc/apps/mpi/openmpi/1.8.6/
--hostfile hostfile-noslots --mca btl_tcp_if_include eth0
Hello, Alina!
I use "OSU MPI Multiple Bandwidth / Message Rate Test v4.4.1".
I downloaded it from the website: http://mvapich.cse.ohio-state.edu/benchmarks/
I have attached "osu_mbw_mr.c" to this letter.
Best regards,
Timur
Thursday, 18 June 2015, 18:23 +03:00 from Alina Sklarevich:
>Hi Timur,
>
Ah crud - my bad for not looking closely enough at your original backtrace. I’m
so used to seeing these issues as having to do with binding when I see that
hwthreads-as-cpus flag :-)
This has nothing to do with binding etc - something appears wrong in the
ompi_free_list code. I’ll have to defer
Lane,
could you please describe your configuration?
how many sockets per node?
how many cores per socket?
how many threads per core?
what is the minimum number of nodes needed to reproduce the issue?
do all the nodes have the same configuration?
if yes, what happens without --hetero-nodes?
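A sketch for collecting the first three answers on each node, assuming Linux with lscpu (from util-linux) available. summarize_topology is a made-up helper name, not part of Open MPI, and the field labels are the ones lscpu prints in the C locale:

```shell
# Reduce `lscpu` output (read from stdin) to the three topology numbers
# asked about above. Assumes the English field labels.
summarize_topology() {
  awk -F: '
    /^Socket\(s\)/          { gsub(/ /, "", $2); s = $2 }
    /^Core\(s\) per socket/ { gsub(/ /, "", $2); c = $2 }
    /^Thread\(s\) per core/ { gsub(/ /, "", $2); t = $2 }
    END {
      printf "sockets=%s cores_per_socket=%s threads_per_core=%s\n", s, c, t
    }
  '
}

# Usage on a node (LC_ALL=C keeps the labels in English):
#   LC_ALL=C lscpu | summarize_topology
```

Running this on every node in the hostfile and comparing the lines would also show whether all the nodes really have the same configuration.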
Gilles
I was fooled too, but that isn’t the issue. The problem is that ompi_free_list
is segfaulting:
> [csclprd3-0-13:30901] *** Process received signal ***
> [csclprd3-0-13:30901] Signal: Bus error (7)
> [csclprd3-0-13:30901] Signal code: Non-existant physical address (2)
> [csclprd3-0-13:3090
Ralph,
I got that, but I cannot read the stack trace (optimized build)
my best bet is to reproduce the issue, and then find how and why
ompi_free_list_t is segfault'ing.
that's why I requested info about the environment
iirc, ompi_free_list_t are different between master and v1.8, so an
incorrect
Good point
William: can you rebuild OMPI with --enable-debug and run this again so we can
see where the code is breaking?
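A rough sketch of such a rebuild, assuming the 1.8.6 source tree is still available. SRC_DIR and the separate debug PREFIX below are placeholders, not the actual paths on this cluster, and rebuild_debug is just an illustrative helper name:

```shell
# Placeholder locations (assumptions); override them in the environment.
SRC_DIR="${SRC_DIR:-$HOME/openmpi-1.8.6}"
PREFIX="${PREFIX:-$HOME/openmpi-1.8.6-debug}"

# Configure, build and install a debug build into its own prefix.
# The body runs in a subshell so the caller's working directory is untouched,
# and `set -e` stops at the first failing step.
rebuild_debug() (
  set -e
  cd "$SRC_DIR"
  ./configure --prefix="$PREFIX" --enable-debug
  make -j4 all
  make install
)

# Usage:
#   rebuild_debug
```

Installing into a separate prefix keeps the optimized build intact. The same mpirun command can then be re-run with --prefix pointing at the debug install.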
Thanks
Ralph
> On Jun 19, 2015, at 6:11 AM, Gilles Gouaillardet wrote:
>
> Ralph,
>
> I got that, but I cannot read the stack trace (optimized build)
> my best bet is