Did this issue get resolved?  You might also want to look at our FAQ category for large clusters:

    http://www.open-mpi.org/faq/?category=large-clusters

On Jun 22, 2011, at 9:43 AM, Thorsten Schuett wrote:

> Thanks for the tip. I can't tell yet whether it helped or not. However, with
> your settings I get the following warning:
>
>   WARNING: Open MPI will create a shared memory backing file in a
>   directory that appears to be mounted on a network filesystem.
>
> I repeated the run with my settings and I noticed that on at least one node
> my app didn't come up. I can see an orted daemon on this node, but no other
> process. And this was 30 minutes after the app started.
>
>   orted -mca ess tm -mca orte_ess_jobid 125894656 -mca orte_ess_vpid 63 -mca orte_ess_num_procs 255 --hnp-uri ...
>
> Thorsten
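That warning generally means that the directory Open MPI picked for its session files (which is where the shared memory backing file lives) is on a network-mounted filesystem; it is a shared-memory performance concern rather than a correctness one. If your compute nodes have local scratch space, one way around it is to point the session directory somewhere node-local. A minimal sketch, assuming the nodes have a local /tmp (the path is an assumption; use whatever node-local filesystem you actually have):

    # Put the per-job session directory (and the sm backing file with it)
    # on node-local storage; keep the rest of your usual options as they are.
    mpiexec --mca orte_tmpdir_base /tmp ./a.out

The same thing can be set in a job script through the OMPI_MCA_orte_tmpdir_base environment variable, if that is more convenient.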
> On Wednesday, June 22, 2011, Gilbert Grosdidier wrote:
>> Bonjour Thorsten,
>>
>> I'm not surprised about the cluster type, indeed,
>> but I don't remember ever seeing the specific hang you mention.
>>
>> Anyway, I suspect SGI Altix is a little bit special for Open MPI,
>> and I usually run with the following setup:
>>
>> - you need to create a job-specific tmp area for each job,
>>   like "/scratch/ggg/uuu/run/tmp/pbs.${PBS_JOBID}"
>>
>> - then use something like this:
>>
>>   setenv TMPDIR "/scratch/ggg/uuu/run/tmp/pbs.${PBS_JOBID}"
>>   setenv OMPI_PREFIX_ENV "/scratch/ggg/uuu/run/tmp/pbs.${PBS_JOBID}"
>>   setenv OMPI_MCA_mpi_leave_pinned_pipeline 1
>>
>> - then, for running: many of these -mca options are probably useless for
>>   your app, while others may turn out to be useful. Pick your own mix:
>>
>>   mpiexec -mca coll_tuned_use_dynamic_rules 1 -hostfile $PBS_NODEFILE -mca rmaps seq -mca btl_openib_rdma_pipeline_send_length 65536 -mca btl_openib_rdma_pipeline_frag_size 65536 -mca btl_openib_min_rdma_pipeline_size 65536 -mca btl_self_rdma_pipeline_send_length 262144 -mca btl_self_rdma_pipeline_frag_size 262144 -mca plm_rsh_num_concurrent 4096 -mca mpi_paffinity_alone 1 -mca mpi_leave_pinned_pipeline 1 -mca btl_sm_max_send_size 128 -mca coll_tuned_pre_allocate_memory_comm_size_limit 1048576 -mca btl_openib_cq_size 128 -mca btl_ofud_rd_num 128 -mca mpi_preconnect_mpi 0 -mca mpool_sm_min_size 131072 -mca btl sm,openib,self -mca btl_openib_want_fork_support 0 -mca opal_set_max_sys_limits 1 -mca osc_pt2pt_no_locks 1 -mca osc_rdma_no_locks 1 YOUR_APP
>>
>> (Careful: that mpiexec is all one single command line.)
>>
>> This should be suitable for up to 8k cores.
>>
>> HTH, Best, G.
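A side note on that command line: the same settings can also live in a per-user MCA parameter file, which keeps the mpiexec invocation short and avoids the one-very-long-line problem. A minimal sketch, transcribing just a few of the options above into $HOME/.openmpi/mca-params.conf (whether each one actually helps is app-dependent, so treat this as a starting point rather than a recipe):

    # $HOME/.openmpi/mca-params.conf: read automatically by mpiexec.
    # One "name = value" pair per line, same names as the -mca options above.
    btl = sm,openib,self
    mpi_paffinity_alone = 1
    mpi_leave_pinned_pipeline = 1
    coll_tuned_use_dynamic_rules = 1
    opal_set_max_sys_limits = 1

Anything given on the command line with -mca, or via an OMPI_MCA_* environment variable, still overrides what is in the file.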
>> On Jun 22, 2011, at 09:13, Thorsten Schuett wrote:
>>> Sure. It's an SGI ICE cluster with dual-rail IB. The HCAs are Mellanox
>>> ConnectX IB DDR.
>>>
>>> This is a 2040-core job. I use 255 nodes with one MPI task on each node
>>> and use 8-way OpenMP.
>>>
>>> I don't need -np and -machinefile, because mpiexec picks up this
>>> information from PBS.
>>>
>>> Thorsten
>>>
>>> On Tuesday, June 21, 2011, Gilbert Grosdidier wrote:
>>>> Bonjour Thorsten,
>>>>
>>>> Could you please be a little bit more specific about the cluster itself?
>>>>
>>>> G.
>>>>
>>>> On Jun 21, 2011, at 17:46, Thorsten Schuett wrote:
>>>>> Hi,
>>>>>
>>>>> I am running Open MPI 1.5.3 on an IB cluster and I have problems
>>>>> starting jobs on larger node counts. With small numbers of tasks it
>>>>> usually works, but now the startup has failed three times in a row
>>>>> using 255 nodes. I am using one MPI task per node, and the mpiexec
>>>>> invocation looks as follows:
>>>>>
>>>>>   mpiexec --mca btl self,openib --mca mpi_leave_pinned 0 ./a.out
>>>>>
>>>>> After ten minutes I pulled a stack trace on all nodes and killed the
>>>>> job, because there was no progress. Below you will find the stack
>>>>> trace, generated with gdb's "thread apply all bt". The backtrace looks
>>>>> basically the same on all nodes. It seems to hang in MPI_Init.
>>>>>
>>>>> Any help is appreciated,
>>>>>
>>>>> Thorsten
>>>>>
>>>>> Thread 3 (Thread 46914544122176 (LWP 28979)):
>>>>> #0  0x00002b6ee912d9a2 in select () from /lib64/libc.so.6
>>>>> #1  0x00002b6eeabd928d in service_thread_start (context=<value optimized out>) at btl_openib_fd.c:427
>>>>> #2  0x00002b6ee835e143 in start_thread () from /lib64/libpthread.so.0
>>>>> #3  0x00002b6ee9133b8d in clone () from /lib64/libc.so.6
>>>>> #4  0x0000000000000000 in ?? ()
>>>>>
>>>>> Thread 2 (Thread 46916594338112 (LWP 28980)):
>>>>> #0  0x00002b6ee912b8b6 in poll () from /lib64/libc.so.6
>>>>> #1  0x00002b6eeabd7b8a in btl_openib_async_thread (async=<value optimized out>) at btl_openib_async.c:419
>>>>> #2  0x00002b6ee835e143 in start_thread () from /lib64/libpthread.so.0
>>>>> #3  0x00002b6ee9133b8d in clone () from /lib64/libc.so.6
>>>>> #4  0x0000000000000000 in ?? ()
>>>>>
>>>>> Thread 1 (Thread 47755361533088 (LWP 28978)):
>>>>> #0  0x00002b6ee9133fa8 in epoll_wait () from /lib64/libc.so.6
>>>>> #1  0x00002b6ee87745db in epoll_dispatch (base=0xb79050, arg=0xb558c0, tv=<value optimized out>) at epoll.c:215
>>>>> #2  0x00002b6ee8773309 in opal_event_base_loop (base=0xb79050, flags=<value optimized out>) at event.c:838
>>>>> #3  0x00002b6ee875ee92 in opal_progress () at runtime/opal_progress.c:189
>>>>> #4  0x0000000039f00001 in ?? ()
>>>>> #5  0x00002b6ee87979c9 in std::ios_base::Init::~Init () at ../../.././libstdc++-v3/src/ios_init.cc:123
>>>>> #6  0x00007fffc32c8cc8 in ?? ()
>>>>> #7  0x00002b6ee9d20955 in orte_grpcomm_bad_get_proc_attr (proc=<value optimized out>, attribute_name=0x2b6ee88e5780 " \020322351n+", val=0x2b6ee875ee92, size=0x7fffc32c8cd0) at grpcomm_bad_module.c:500
>>>>> #8  0x00002b6ee86dd511 in ompi_modex_recv_key_value (key=<value optimized out>, source_proc=<value optimized out>, value=0xbb3a00, dtype=14 '\016') at runtime/ompi_module_exchange.c:125
>>>>> #9  0x00002b6ee86d7ea1 in ompi_proc_set_arch () at proc/proc.c:154
>>>>> #10 0x00002b6ee86db1b0 in ompi_mpi_init (argc=15, argv=0x7fffc32c92f8, requested=<value optimized out>, provided=0x7fffc32c917c) at runtime/ompi_mpi_init.c:699
>>>>> #11 0x00007fffc32c8e88 in ?? ()
>>>>> #12 0x00002b6ee77f8348 in ?? ()
>>>>> #13 0x00007fffc32c8e60 in ?? ()
>>>>> #14 0x00007fffc32c8e20 in ?? ()
>>>>> #15 0x0000000009efa994 in ?? ()
>>>>> #16 0x0000000000000000 in ?? ()
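For what it's worth, when a job hangs like this across a couple of hundred nodes, it can help to script the backtrace collection rather than attaching gdb by hand. A rough sketch, assuming the job is still running, the binary is called a.out, and you can ssh to the compute nodes listed in $PBS_NODEFILE (all of which are assumptions; adapt to your site):

    # Collect "thread apply all bt" from the a.out process on every node,
    # one output file per node, all in parallel.
    for node in $(sort -u $PBS_NODEFILE); do
        ssh $node 'gdb -batch -ex "thread apply all bt" -p $(pgrep -u $USER -n a.out)' > bt.$node.txt 2>&1 &
    done
    wait

Comparing the files side by side should show whether every rank is stuck at the same point in MPI_Init, or whether a few nodes (like the one where only the orted came up) never got that far.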
>> --
>> *---------------------------------------------------------------------*
>>  Gilbert Grosdidier                 gilbert.grosdid...@in2p3.fr
>>  LAL / IN2P3 / CNRS                 Phone : +33 1 6446 8909
>>  Faculté des Sciences, Bat. 200     Fax   : +33 1 6446 8546
>>  B.P. 34, F-91898 Orsay Cedex (FRANCE)
>> *---------------------------------------------------------------------*

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/