The LAMA mapper hasn't been maintained or tested in some time, and probably should be removed from the tarball at this point. However, you can use the other mappers to obtain pretty much any pattern. What are you trying to do?
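For example, if the intent of the LAMA options below is "two processes per NUMA domain, at most one per core, bound to the NUMA domain", the default mapper in the 1.8 series can get you essentially the same layout. Something along these lines should work (a sketch only -- I haven't run it against your setup, and you may need to tweak the ppr count for your hardware):

    mpirun -np 32 --map-by ppr:2:numa --bind-to numa -report-bindings ./myapplication.exe

The ppr ("processes per resource") modifier gives you the MPPR-style cap, and --bind-to numa corresponds to your "1N" binding.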
On Sun, Aug 9, 2015 at 12:55 PM, Nils Smeds <nsm...@lenovo.com> wrote:

> Hello,
>
> I am trying to experiment with MPI process binding, and the LAMA package
> looks very promising for my needs. However, I am having problems in that
> I get what appear to be internal errors in LAMA.
>
> From what I can tell, my command-line options are accepted gracefully,
> but then when they are about to be executed the application seg-faults.
>
> Has anyone seen similar issues? I have tried Open MPI 1.8.6 and
> Open MPI 1.8.8 with the same behaviour. I _think_ my command-line options
> are valid, and in any case I think it should generate an error, not a
> crash, if they were incorrect.
>
> Command line to be used on host 0 (host_225):
>
> mpirun -np 32 -report-bindings -display-map -mca orte_base_help_aggregate 0 \
>     -x MXM_RDMA_PORTS=mlx5_0:1 -mca btl_openib_if_include mlx5_0:1 \
>     --hostfile /home/lnsmeds/.lsbatch/1439148094.25601.hostfile \
>     --prefix /hpc/base/mellanox/hpcx/exp/icc/ompi-mellanox-v1.8-local -x LD_LIBRARY_PATH \
>     --mca rmaps lama --mca rmaps_base_verbose 10 --mca rmaps_lama_map hcsnNb \
>     --mca rmaps_lama_bind 1N --mca rmaps_lama_ordering s --mca rmaps_lama_mppr 2:N,1:c \
>     -x custom_mxm_tl=rc -x MXM_TLS=self,shm,rc -x MXM_RC_QP_LIMIT=-1 -x custom_fca=off -mca coll_fca_enable 0 \
>     -x custom_hcol=on -mca coll_hcoll_enable 1 -x HCOLL_IB_IF_INCLUDE=mlx5_0:1 -x HCOLL_ENABLE_MCAST_ALL=1 \
>     -x custom_mxm=on -mca pml yalla ./myapplication.exe 2>&1
>
> Sun Aug 9 21:20:56 CEST 2015
> [host_225:09102] mca: base: components_register: registering rmaps components
> [host_225:09102] mca: base: components_register: found loaded component lama
> [host_225:09102] mca:rmaps:lama: Priority 0
> [host_225:09102] mca:rmaps:lama: Map   : hcsnNb
> [host_225:09102] mca:rmaps:lama: Bind  : 1N
> [host_225:09102] mca:rmaps:lama: MPPR  : 2:N,1:c
> [host_225:09102] mca:rmaps:lama: Order : s
> [host_225:09102] mca: base: components_register: component lama register function successful
> [host_225:09102] [[18162,0],0] rmaps:base set policy with NULL
> [host_225:09102] mca: base: components_open: opening rmaps components
> [host_225:09102] mca: base: components_open: found loaded component lama
> [host_225:09102] mca:rmaps:select: checking available component lama
> [host_225:09102] mca:rmaps:select: Querying component [lama]
> [host_225:09102] [[18162,0],0]: Final mapper priorities
> [host_225:09102]     Mapper: lama Priority: 0
> [host_225:09102] mca:rmaps: mapping job [18162,1]
> [host_225:09102] mca:rmaps: creating new map for job [18162,1]
> [host_225:09102] mca:rmaps: nprocs 32
> [host_225:09102] mca:rmaps[139] mapping not given - using bysocket
> [host_225:09102] mca:rmaps:lama: Mapping job [18162,1]
> [host_225:09102] mca:rmaps:lama: Revised Parameters -----
> [host_225:09102] mca:rmaps:lama: Map   : hcsnNb
> [host_225:09102] mca:rmaps:lama: Bind  : 1N
> [host_225:09102] mca:rmaps:lama: MPPR  : 2:N,1:c
> [host_225:09102] mca:rmaps:lama: Order : s
> [host_225:09102] mca:rmaps:lama: ---------------------------------
> [host_225:09102] mca:rmaps:lama: ----- Binding  : [1N]
> [host_225:09102] mca:rmaps:lama: ----- Binding  : 1 x NUMA
> [host_225:09102] mca:rmaps:lama: ---------------------------------
> [host_225:09102] mca:rmaps:lama: ----- Mapping  : [hcsnNb]
> [host_225:09102] mca:rmaps:lama: ----- Mapping  : (0) Hw. Thread (8 vs 0)
> [host_225:09102] mca:rmaps:lama: ----- Mapping  : (1) Core (7 vs 1)
> [host_225:09102] mca:rmaps:lama: ----- Mapping  : (2) Socket (3 vs 2)
> [host_225:09102] mca:rmaps:lama: ----- Mapping  : (3) Machine (0 vs 3)
> [host_225:09102] mca:rmaps:lama: ----- Mapping  : (4) NUMA (2 vs 7)
> [host_225:09102] mca:rmaps:lama: ----- Mapping  : (5) Board (1 vs 8)
> [host_225:09102] mca:rmaps:lama: ---------------------------------
> [host_225:09102] mca:rmaps:lama: ----- MPPR     : [2:N,1:c]
> [host_225:09102] mca:rmaps:lama: ----- MPPR     : 2 at NUMA
> [host_225:09102] mca:rmaps:lama: ----- MPPR     : 1 at Core
> [host_225:09102] mca:rmaps:lama: ---------------------------------
> [host_225:09102] mca:rmaps:lama: ----- Ordering : [s]
> [host_225:09102] mca:rmaps:lama: ----- Ordering : Sequential
> [host_225:09102] mca:rmaps:lama: ---------------------------------
> [host_225:09102] AVAILABLE NODES FOR MAPPING:
> [host_225:09102]     node: host_225 daemon: 0
> [host_225:09102]     node: host_229 daemon: 1
> [host_225:09102]     node: host_140 daemon: 2
> [host_225:09102]     node: host_342 daemon: 3
> [host_225:09102] mca:rmaps:lama: ---------------------------------
> [host_225:09102] mca:rmaps:lama: ----- Building the Max Tree...
> [host_225:09102] mca:rmaps:lama: ---------------------------------
> [host_225:09102] mca:rmaps:lama: ----- Converting Remote Tree: host_225
> [host_225:09102] mca:rmaps:lama: ----- Converting Remote Tree: host_229
> [host_225:09102] mca:rmaps:lama: ----- Converting Remote Tree: host_140
> [host_225:09102] mca:rmaps:lama: ----- Converting Remote Tree: host_342
> [host_225:09102] *** Process received signal ***
> [host_225:09102] Signal: Segmentation fault (11)
> [host_225:09102] Signal code: Address not mapped (1)
> [host_225:09102] Failing at address: 0x60
> [host_225:09102] [ 0] /lib64/libpthread.so.0(+0xf710)[0x2b0ad199f710]
> [host_225:09102] [ 1] /hpc/base/mellanox/hpcx/exp/icc/ompi-mellanox-v1.8-local/lib/openmpi/mca_rmaps_lama.so(+0x7366)[0x2b0ad625b366]
> [host_225:09102] [ 2] /hpc/base/mellanox/hpcx/exp/icc/ompi-mellanox-v1.8-local/lib/openmpi/mca_rmaps_lama.so(+0x73c9)[0x2b0ad625b3c9]
> [host_225:09102] [ 3] /hpc/base/mellanox/hpcx/exp/icc/ompi-mellanox-v1.8-local/lib/openmpi/mca_rmaps_lama.so(rmaps_lama_build_max_tree+0x63f)[0x2b0ad625a7cf]
> [host_225:09102] [ 4] /hpc/base/mellanox/hpcx/exp/icc/ompi-mellanox-v1.8-local/lib/openmpi/mca_rmaps_lama.so(+0x3573)[0x2b0ad6257573]
> [host_225:09102] [ 5] /hpc/base/mellanox/hpcx/exp/icc/ompi-mellanox-v1.8-local/lib/openmpi/mca_rmaps_lama.so(+0x2d7b)[0x2b0ad6256d7b]
> [host_225:09102] [ 6] /hpc/base/mellanox/hpcx/exp/icc/ompi-mellanox-v1.8-local/lib/libopen-rte.so.7(orte_rmaps_base_map_job+0x6ad)[0x2b0ad078db0d]
> [host_225:09102] [ 7] /hpc/base/mellanox/hpcx/exp/icc/ompi-mellanox-v1.8-local/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0xa19)[0x2b0ad0a3d489]
> [host_225:09102] [ 8] mpirun[0x4064e9]
> [host_225:09102] [ 9] mpirun[0x4040f4]
> [host_225:09102] [10] /lib64/libc.so.6(__libc_start_main+0xfd)[0x2b0ad1bccd1d]
> [host_225:09102] [11] mpirun[0x404019]
> [host_225:09102] *** End of error message ***
> Sun Aug 9 21:20:56 CEST 2015
>
> Post-mortem with gdb on the core file gives:
>
> (gdb) info threads
>   2 Thread 0x2b0ad4963700 (LWP 9104)  0x00002b0ad1c8f5e3 in select () from /lib64/libc.so.6
> * 1 Thread 0x2b0ad3734680 (LWP 9102)  opal_pointer_array_get_item (node=0x7987e0, obj=0x7579f0)
>       at ../../../../../../openmpi-1.8.8/opal/class/opal_pointer_array.h:130
> (gdb) where
> #0  opal_pointer_array_get_item (node=0x7987e0, obj=0x7579f0)
>     at ../../../../../../openmpi-1.8.8/opal/class/opal_pointer_array.h:130
> #1  rmaps_lama_annotate_node_for_mppr (node=0x7987e0, obj=0x7579f0)
>     at ../../../../../../openmpi-1.8.8/orte/mca/rmaps/lama/rmaps_lama_max_tree.c:377
> #2  0x00002b0ad625b3c9 in rmaps_lama_annotate_node_for_mppr (node=0x7987e0, obj=0x7579f0)
>     at ../../../../../../openmpi-1.8.8/orte/mca/rmaps/lama/rmaps_lama_max_tree.c:396
> #3  0x00002b0ad625a7cf in rmaps_lama_build_max_tree (jdata=0x7987e0, node_list=0x7579f0, max_tree=0x1, is_homogeneous=0x0)
>     at ../../../../../../openmpi-1.8.8/orte/mca/rmaps/lama/rmaps_lama_max_tree.c:226
> #4  0x00002b0ad6257573 in orte_rmaps_lama_map_core (jdata=0x7987e0)
>     at ../../../../../../openmpi-1.8.8/orte/mca/rmaps/lama/rmaps_lama_module.c:662
> #5  0x00002b0ad6256d7b in orte_rmaps_lama_map (jdata=0x7987e0)
>     at ../../../../../../openmpi-1.8.8/orte/mca/rmaps/lama/rmaps_lama_module.c:308
> #6  0x00002b0ad078db0d in orte_rmaps_base_map_job (fd=7964640, args=31216, cbdata=0x1)
>     at ../../../../../openmpi-1.8.8/orte/mca/rmaps/base/rmaps_base_map_job.c:370
> #7  0x00002b0ad0a3d489 in event_process_active_single_queue (base=0x7987e0, flags=7698928)
>     at ../../../../../../../openmpi-1.8.8/opal/mca/event/libevent2021/libevent/event.c:1367
> #8  event_process_active (base=0x7987e0, flags=7698928)
>     at ../../../../../../../openmpi-1.8.8/opal/mca/event/libevent2021/libevent/event.c:1437
> #9  opal_libevent2021_event_base_loop (base=0x7987e0, flags=7698928)
>     at ../../../../../../../openmpi-1.8.8/opal/mca/event/libevent2021/libevent/event.c:1647
> #10 0x00000000004064e9 in orterun (argc=63, argv=0x7fffa5fc68b8)
>     at ../../../../../openmpi-1.8.8/orte/tools/orterun/orterun.c:1133
> #11 0x00000000004040f4 in main (argc=63, argv=0x7fffa5fc68b8)
>     at ../../../../../openmpi-1.8.8/orte/tools/orterun/main.c:13
>
> (gdb) thread 2
> [Switching to thread 2 (Thread 0x2b0ad4963700 (LWP 9104))]#0  0x00002b0ad1c8f5e3 in select () from /lib64/libc.so.6
> (gdb) where
> #0  0x00002b0ad1c8f5e3 in select () from /lib64/libc.so.6
> #1  0x00002b0ad4555b12 in listen_thread (obj=0xd)
>     at ../../../../../../openmpi-1.8.8/orte/mca/oob/tcp/oob_tcp_listener.c:685
> #2  0x00002b0ad19979d1 in start_thread () from /lib64/libpthread.so.0
> #3  0x00002b0ad1c96b6d in clone () from /lib64/libc.so.6
>
> (gdb) thread 1
> (gdb) list 125,140
> 125     static inline void *opal_pointer_array_get_item(opal_pointer_array_t *table,
> 126                                                     int element_index)
> 127     {
> 128         void *p;
> 129
> 130         if( table->size <= element_index ) {
> 131             return NULL;
> 132         }
> 133         OPAL_THREAD_LOCK(&(table->lock));
> 134         p = table->addr[element_index];
> 135         OPAL_THREAD_UNLOCK(&(table->lock));
> 136         return p;
> 137     }
> 138
> 139
>
> (gdb) up
> #1  rmaps_lama_annotate_node_for_mppr (node=0x7987e0, obj=0x7579f0)
>     at ../../../../../../openmpi-1.8.8/orte/mca/rmaps/lama/rmaps_lama_max_tree.c:377
> 377         mppr_accounting = (rmaps_lama_node_mppr_t*)opal_pointer_array_get_item(hwloc_userdata->node_mppr, node->index);
> (gdb) list 370,390
> 370             }
> 371         }
> 372
> 373
> 374         /*
> 375          * Add node information if it is not already there
> 376          */
> 377         mppr_accounting = (rmaps_lama_node_mppr_t*)opal_pointer_array_get_item(hwloc_userdata->node_mppr, node->index);
> 378         if( NULL == mppr_accounting ) {
> 379             /*
> 380              * Add MPPR accounting for this node associated with this object
> 381              */
> 382             rmaps_lama_convert_hwloc_key_to_lama_key(obj->type, obj->attr->cache.depth, &lama_key);
> 383
> 384             mppr_accounting = (rmaps_lama_node_mppr_t*)malloc(sizeof(rmaps_lama_node_mppr_t));
> 385             mppr_accounting->max = rmaps_lama_get_mppr_for_key(node, lama_key);
> 386             mppr_accounting->cur = 0;
> 387
> 388             opal_pointer_array_set_item(hwloc_userdata->node_mppr, node->index, mppr_accounting);
> 389         }
> 390
>
> (gdb) p hwloc_userdata->node_mppr
> $1 = (opal_pointer_array_t *) 0x8
> (gdb) p *hwloc_userdata->node_mppr
> Cannot access memory at address 0x8
> (gdb) p node->index
> $2 = 0
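FWIW, your gdb session already shows the immediate cause: hwloc_userdata->node_mppr is 0x8, a non-NULL garbage pointer, so the very first field access inside opal_pointer_array_get_item() (table->size) faults. My guess -- and it is only a guess -- is that LAMA's assumption about what hangs off hwloc's obj->userdata pointer no longer holds when the remote trees are converted, so it reads its MPPR table out of some other structure. For illustration only, the kind of per-object setup the annotate step expects to find would look roughly like this (a hypothetical sketch, not LAMA's actual code; rmaps_lama_hwloc_user_t is an assumed name, and the sizes are arbitrary):

    /* Hypothetical sketch (untested): attach LAMA's per-object MPPR
     * accounting to an hwloc object during tree conversion, so the later
     * rmaps_lama_annotate_node_for_mppr() call finds a valid
     * opal_pointer_array_t rather than garbage. */
    rmaps_lama_hwloc_user_t *ud;                    /* struct name assumed */
    ud = (rmaps_lama_hwloc_user_t*)malloc(sizeof(*ud));
    ud->node_mppr = OBJ_NEW(opal_pointer_array_t);  /* OPAL object ctor */
    opal_pointer_array_init(ud->node_mppr, 4, 256, 4);
    obj->userdata = ud;                             /* hwloc per-object hook */

If that pointer is never (or wrongly) installed for the remote topologies, you get exactly the crash you are seeing. Given the state of the component, though, I would not hold my breath for a fix -- hence my suggestion above to use the other mappers.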