Re: [OMPI users] Oversubscription performance problem
Sorry for the delays in replying. The central problem is that Open MPI is much more aggressive about its message passing progress than LAM is -- it simply wasn't designed to share the processor well; it polls aggressively as a mechanism to get as high performance as possible.

mpi_yield_when_idle is really only helpful for transports that actively use our event engine, such as the TCP device. Since you're using the LAM sysv RPI, I assume you're using the TCP and shared memory devices in OMPI, right? If you're using InfiniBand, for example, the event engine is not called much because IB has its own progression engine that is unrelated to OMPI's (and therefore we don't invoke OMPI's much).

mpi_yield_when_idle is also only helpful if you're going into the MPI layer often and making message passing progress (i.e., OMPI's event engine is actively being invoked). Is this true for your application?

If mpi_yield_when_idle really doesn't help much, you may consider sprinkling calls to sched_yield() in your code to force the process to yield the processor.

On Apr 4, 2008, at 2:30 AM, Lars Andersson wrote:

Hi,

I'm just in the process of moving our application from LAM/MPI to OpenMPI, mainly because OpenMPI makes it easier for a user to run multiple jobs (MPI universes) simultaneously. This is useful if a user wants to run smaller experiments without disturbing a large experiment running in the background.

I've been evaluating the performance using a simple test, running on a heterogeneous cluster of 2 x dual-core Opteron machines, a couple of dual-core P4 Xeon machines and an 8-core Core2 machine. The main structure of the application is a master rank distributing job packages to the rest of the ranks and collecting the results. We don't use any fancy MPI features but rather see it as an efficient low-level tool for broadcasting and transferring data.

When a single user runs a job (fully subscribed nodes, but not oversubscribed, i.e., one process per CPU core) on an otherwise unloaded cluster, both LAM/MPI and OpenMPI average runtimes of about 1m33s (OpenMPI has a slightly lower average).

When I start the same job simultaneously as two different users (thus oversubscribing the nodes 2x) under LAM/MPI, the two jobs finish in an average time of about 3m, thus scaling very well (we use the -ssi rpi sysv option to mpirun under LAM/MPI to avoid busy waiting).

When running the same second experiment under OpenMPI, the average runtime jumps up to about 3m30s, with runs occasionally taking more than 4 minutes to complete. I do use the "--mca mpi_yield_when_idle 1" option to mpirun, but it doesn't seem to make any difference. I've also tried setting the environment variable OMPI_MCA_mpi_yield_when_idle=1, but still no change. ompi_info says:

ompi_info --param all all | grep yield
MCA mpi: parameter "mpi_yield_when_idle" (current value: "1")

The cluster is used for various tasks, running MPI applications as well as non-MPI applications, so we would like to avoid spending too many cycles on busy waiting. Any ideas on how to tweak OpenMPI to get better performance and more cooperative behavior in this case would be greatly appreciated.

Cheers,

Lars

--
Jeff Squyres
Cisco Systems
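To illustrate the sched_yield() suggestion above: a worker that currently blocks in MPI_Recv (and therefore spins inside Open MPI's progress engine) can instead poll with MPI_Iprobe and yield the core between polls. This is only a minimal sketch; the function, tag, and buffer names are placeholders, not anything from Lars's application.

    #include <sched.h>
    #include <mpi.h>

    /* Sketch: wait for the next job message without monopolizing the core.
     * Yielding between probes lets a second, oversubscribed job make progress. */
    static void recv_next_job(void *buf, int count, MPI_Datatype type,
                              int src, int tag, MPI_Comm comm)
    {
        int arrived = 0;
        MPI_Status status;

        while (!arrived) {
            MPI_Iprobe(src, tag, comm, &arrived, &status);
            if (!arrived)
                sched_yield();   /* give up the processor, as suggested above */
        }
        MPI_Recv(buf, count, type, status.MPI_SOURCE, status.MPI_TAG,
                 comm, MPI_STATUS_IGNORE);
    }

Whether this helps depends on how often the application is inside the MPI layer at all, which is exactly the question Jeff raises above.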
Re: [OMPI users] Problems using Intel MKL with OpenMPI and Pathscale
Do you get the same error if you disable the memory handling in Open MPI? You can configure OMPI with:

--disable-memory-manager

On Apr 9, 2008, at 3:01 PM, Åke Sandgren wrote:

Hi!

I have an annoying problem that I hope someone here has some info on. I'm trying to build a code with OpenMPI + Intel MKL + Pathscale. When using the sequential (non-threaded) MKL everything is OK, but when using the threaded MKL I get a segfault. This doesn't happen when using MVAPICH, so I suspect the memory handling inside OpenMPI.

Versions used are:
OpenMPI 1.2.6
Pathscale 3.2beta
MKL 10.0.2.018

Has anyone seen anything like this?

--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se
Phone: +46 90 7866134  Fax: +46 90 7866126
Mobile: +46 70 7716134  WWW: http://www.hpc2n.umu.se

--
Jeff Squyres
Cisco Systems
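For anyone trying this, the rebuild amounts to re-running configure with that flag and reinstalling; the prefix below is only an illustrative path, not a recommendation:

./configure --prefix=/opt/openmpi-1.2.6 --disable-memory-manager
make all install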
Re: [OMPI users] Troubles with MPI-IO Test and Torque/PVFS
It looks like you're seg faulting when calling some flavor of printf (perhaps vsnprintf?) in the make_error_messages() function. You might want to double-check the read_write_file() function to see exactly what kind of error it is encountering such that it is calling report_errs().

On Apr 10, 2008, at 3:29 PM, Davi Vercillo C. Garcia wrote:

Hi all,

I have a cluster with Torque and PVFS. I'm trying to test my environment with MPI-IO Test, but some segfaults are occurring. Does anyone know what is happening? The error output is below:

Rank 1 Host campogrande03.dcc.ufrj.br WARNING ERROR 1207853304: 1 bad bytes at file offset 0. Expected (null), received (null)
Rank 2 Host campogrande02.dcc.ufrj.br WARNING ERROR 1207853304: 1 bad bytes at file offset 0. Expected (null), received (null)
[campogrande01:10646] *** Process received signal ***
Rank 0 Host campogrande04.dcc.ufrj.br WARNING ERROR 1207853304: 1 bad bytes at file offset 0. Expected (null), received (null)
Rank 0 Host campogrande04.dcc.ufrj.br WARNING ERROR 1207853304: 65537 bad bytes at file offset 0. Expected (null), received (null)
[campogrande04:05192] *** Process received signal ***
[campogrande04:05192] Signal: Segmentation fault (11)
[campogrande04:05192] Signal code: Address not mapped (1)
[campogrande04:05192] Failing at address: 0x1
Rank 1 Host campogrande03.dcc.ufrj.br WARNING ERROR 1207853304: 65537 bad bytes at file offset 0. Expected (null), received (null)
[campogrande03:05377] *** Process received signal ***
[campogrande03:05377] Signal: Segmentation fault (11)
[campogrande03:05377] Signal code: Address not mapped (1)
[campogrande03:05377] Failing at address: 0x1
[campogrande03:05377] [ 0] [0xe440]
[campogrande03:05377] [ 1] /lib/tls/i686/cmov/libc.so.6(vsnprintf+0xb4) [0xb7d5fef4]
[campogrande03:05377] [ 2] mpiIO_test(make_error_messages+0xcf) [0x80502e4]
[campogrande03:05377] [ 3] mpiIO_test(warning_msg+0x8c) [0x8050569]
[campogrande03:05377] [ 4] mpiIO_test(report_errs+0xe2) [0x804d413]
[campogrande03:05377] [ 5] mpiIO_test(read_write_file+0x594) [0x804d9c2]
[campogrande03:05377] [ 6] mpiIO_test(main+0x1d0) [0x804aa14]
[campogrande03:05377] [ 7] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe0) [0xb7d15050]
[campogrande03:05377] [ 8] mpiIO_test [0x804a7e1]
[campogrande03:05377] *** End of error message ***
Rank 2 Host campogrande02.dcc.ufrj.br WARNING ERROR 1207853304: 65537 bad bytes at file offset 0. Expected (null), received (null)
[campogrande02:05187] *** Process received signal ***
[campogrande02:05187] Signal: Segmentation fault (11)
[campogrande02:05187] Signal code: Address not mapped (1)
[campogrande02:05187] Failing at address: 0x1
[campogrande01:10646] Signal: Segmentation fault (11)
[campogrande01:10646] Signal code: Address not mapped (1)
[campogrande01:10646] Failing at address: 0x1a
[campogrande02:05187] [ 0] [0xe440]
[campogrande02:05187] [ 1] /lib/tls/i686/cmov/libc.so.6(vsnprintf+0xb4) [0xb7d5fef4]
[campogrande02:05187] [ 2] mpiIO_test(make_error_messages+0xcf) [0x80502e4]
[campogrande02:05187] [ 3] mpiIO_test(warning_msg+0x8c) [0x8050569]
[campogrande02:05187] [ 4] mpiIO_test(report_errs+0xe2) [0x804d413]
[campogrande02:05187] [ 5] mpiIO_test(read_write_file+0x594) [0x804d9c2]
[campogrande02:05187] [ 6] mpiIO_test(main+0x1d0) [0x804aa14]
[campogrande02:05187] [ 7] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe0) [0xb7d15050]
[campogrande02:05187] [ 8] mpiIO_test [0x804a7e1]
[campogrande02:05187] *** End of error message ***
[campogrande04:05192] [ 0] [0xe440]
[campogrande04:05192] [ 1] /lib/tls/i686/cmov/libc.so.6(vsnprintf+0xb4) [0xb7d5fef4]
[campogrande04:05192] [ 2] mpiIO_test(make_error_messages+0xcf) [0x80502e4]
[campogrande04:05192] [ 3] mpiIO_test(warning_msg+0x8c) [0x8050569]
[campogrande04:05192] [ 4] mpiIO_test(report_errs+0xe2) [0x804d413]
[campogrande04:05192] [ 5] mpiIO_test(read_write_file+0x594) [0x804d9c2]
[campogrande04:05192] [ 6] mpiIO_test(main+0x1d0) [0x804aa14]
[campogrande04:05192] [ 7] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe0) [0xb7d15050]
[campogrande04:05192] [ 8] mpiIO_test [0x804a7e1]
[campogrande04:05192] *** End of error message ***
[campogrande01:10646] [ 0] [0xe440]
[campogrande01:10646] [ 1] /lib/tls/i686/cmov/libc.so.6(vsnprintf+0xb4) [0xb7d5fef4]
[campogrande01:10646] [ 2] mpiIO_test(make_error_messages+0xcf) [0x80502e4]
[campogrande01:10646] [ 3] mpiIO_test(warning_msg+0x8c) [0x8050569]
[campogrande01:10646] [ 4] mpiIO_test(report_errs+0xe2) [0x804d413]
[campogrande01:10646] [ 5] mpiIO_test(read_write_file+0x594) [0x804d9c2]
[campogrande01:10646] [ 6] mpiIO_test(main+0x1d0) [0x804aa14]
[campogrande01:10646] [ 7] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe0) [0xb7d15050]
[campogrande01:10646] [ 8] mpiIO_test [0x804a7e1]
[campogrande01:10646] *** End of error message ***
mpiexec noticed that job rank 0 with PID 5192 on node campogrande04 exited on signal 11 (Segmentation fault).

-- Dav
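For what it's worth, the "Failing at address: 0x1" inside vsnprintf() in these traces is the classic signature of a printf-style function being handed an invalid pointer for a %s conversion. The contrived program below reproduces that failure mode; it is not the MPI-IO Test source, just an illustration of what make_error_messages() may be running into:

    #include <stdarg.h>
    #include <stdio.h>

    /* Mimics a warning_msg()/make_error_messages()-style helper. */
    static void warning_like(const char *fmt, ...)
    {
        char msg[256];
        va_list ap;

        va_start(ap, fmt);
        vsnprintf(msg, sizeof(msg), fmt, ap);  /* dereferences the bad %s argument */
        va_end(ap);
        fprintf(stderr, "%s\n", msg);
    }

    int main(void)
    {
        const char *bogus = (const char *) 0x1;  /* invalid pointer, like the trace */
        warning_like("Expected %s, received %s", bogus, bogus);  /* faults in vsnprintf */
        return 0;
    }

Checking the error path in read_write_file(), as suggested above, should show where such a pointer comes from.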
Re: [OMPI users] Problems using Intel MKL with OpenMPI and Pathscale
On Sun, 2008-04-13 at 08:00 -0400, Jeff Squyres wrote:
> Do you get the same error if you disable the memory handling in Open
> MPI? You can configure OMPI with:
>
> --disable-memory-manager

Ah, I have apparently missed that config flag; I will try it on Monday.

--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se
Phone: +46 90 7866134  Fax: +46 90 7866126
Mobile: +46 70 7716134  WWW: http://www.hpc2n.umu.se
Re: [OMPI users] problems with hostfile when doing MPMD
Hi Jody

Simple answer - the 1.2.x series does not support multiple hostfiles. I believe you will find that documented in the FAQ section.

What you have to do here is have -one- hostfile that includes all the hosts, and then use -host on each app-context to indicate which of those hosts are to be used for that specific app-context. Or you can just use -host for each app-context, with no hostfile specified. If you specify a hostfile and -host, then -all- the nodes listed in your -host arguments must be in the hostfile or we will error out.

This will change in 1.3, where we will support a separate hostfile for each app-context, as well as one for the entire job, in combination with -host args as well. You can see that documented on the Open MPI wiki.

Hope that helps
Ralph

On 4/10/08 4:40 AM, "jody" wrote:

> Hi
> In my network I have some 32-bit machines and some 64-bit machines.
> With --host I successfully call my application:
>
> mpirun -np 3 --host aim-plankton -x DISPLAY ./run_gdb.sh ./MPITest :
>     -np 3 --host aim-fanta4 -x DISPLAY ./run_gdb.sh ./MPITest64
>
> (MPITest64 has the same code as MPITest, but was compiled on the 64-bit machine)
>
> But when I use hostfiles:
>
> mpirun -np 3 --hostfile hosts32 -x DISPLAY ./run_gdb.sh ./MPITest :
>     -np 3 --hostfile hosts64 -x DISPLAY ./run_gdb.sh ./MPITest64
>
> all 6 processes are started on the 64-bit machine aim-fanta4.
>
> hosts32:
>     aim-plankton slots=3
> hosts64:
>     aim-fanta4 slots
>
> Is this a bug or a feature? ;)
>
> Jody
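Applied to jody's case, Ralph's first suggestion would look something like the following; the combined hostfile and the slot count for aim-fanta4 are illustrative guesses, not taken from jody's actual setup:

hosts_all:
    aim-plankton slots=3
    aim-fanta4 slots=3

mpirun --hostfile hosts_all \
    -np 3 --host aim-plankton -x DISPLAY ./run_gdb.sh ./MPITest : \
    -np 3 --host aim-fanta4 -x DISPLAY ./run_gdb.sh ./MPITest64

The -host argument on each app-context then selects which of the hosts in the single hostfile that context runs on.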
Re: [OMPI users] problems with hostfile when doing MPMD
I believe this -should- work, but I can't verify it myself. The most important thing is to be sure you built with --enable-heterogeneous, or else it will definitely fail.

Ralph

On 4/10/08 7:17 AM, "Rolf Vandevaart" wrote:

> On a CentOS Linux box, I see the following:
>
>> grep 113 /usr/include/asm-i386/errno.h
> #define EHOSTUNREACH 113 /* No route to host */
>
> I have also seen folks do this to figure out the errno.
>
>> perl -e 'die$!=113'
> No route to host at -e line 1.
>
> I am not sure why this is happening, but you could also check the Open
> MPI Users Mailing List Archives, where there are other examples of
> people running into this error. A search for "113" had a few hits.
>
> http://www.open-mpi.org/community/lists/users
>
> Also, I assume you would see this problem with or without the
> MPI_Barrier if you add this parameter to your mpirun line:
>
> --mca mpi_preconnect_all 1
>
> The MPI_Barrier is causing the bad behavior because by default
> connections are set up lazily. Therefore, only when the MPI_Barrier
> call is made and we start communicating and establishing connections do
> we start seeing the communication problems.
>
> Rolf
>
> jody wrote:
>> Rolf,
>> I was able to run hostname on the two nodes that way,
>> and also a simplified version of my test program (without a barrier)
>> works. Only MPI_Barrier shows bad behaviour.
>>
>> Do you know what this message means?
>> [aim-plankton][0,1,2][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
>> connect() failed with errno=113
>> Does it give an idea what could be the problem?
>>
>> Jody
>>
>> On Thu, Apr 10, 2008 at 2:20 PM, Rolf Vandevaart wrote:
>>> This worked for me, although I am not sure how extensive our 32/64
>>> interoperability support is. I tested on Solaris using the TCP
>>> interconnect and a 1.2.5 version of Open MPI. Also, we configure with
>>> the --enable-heterogeneous flag, which may make a difference here. Also,
>>> this did not work for me over the sm btl.
>>>
>>> By the way, can you run a simple /bin/hostname across the two nodes?
>>>
>>> burl-ct-v20z-4 61 =>/opt/SUNWhpc/HPC7.1/bin/mpicc -m32 simple.c -o simple.32
>>> burl-ct-v20z-4 62 =>/opt/SUNWhpc/HPC7.1/bin/mpicc -m64 simple.c -o simple.64
>>> burl-ct-v20z-4 63 =>/opt/SUNWhpc/HPC7.1/bin/mpirun -gmca btl_tcp_if_include bge1 -gmca btl sm,self,tcp -host burl-ct-v20z-4 -np 3 simple.32 : -host burl-ct-v20z-5 -np 3 simple.64
>>> [burl-ct-v20z-4]I am #0/6 before the barrier
>>> [burl-ct-v20z-5]I am #3/6 before the barrier
>>> [burl-ct-v20z-5]I am #4/6 before the barrier
>>> [burl-ct-v20z-4]I am #1/6 before the barrier
>>> [burl-ct-v20z-4]I am #2/6 before the barrier
>>> [burl-ct-v20z-5]I am #5/6 before the barrier
>>> [burl-ct-v20z-5]I am #3/6 after the barrier
>>> [burl-ct-v20z-4]I am #1/6 after the barrier
>>> [burl-ct-v20z-5]I am #5/6 after the barrier
>>> [burl-ct-v20z-5]I am #4/6 after the barrier
>>> [burl-ct-v20z-4]I am #2/6 after the barrier
>>> [burl-ct-v20z-4]I am #0/6 after the barrier
>>> burl-ct-v20z-4 64 =>/opt/SUNWhpc/HPC7.1/bin/mpirun -V
>>> mpirun (Open MPI) 1.2.5r16572
>>>
>>> Report bugs to http://www.open-mpi.org/community/help/
>>> burl-ct-v20z-4 65 =>
>>>
>>> jody wrote:

I narrowed it down: the majority of processes get stuck in MPI_Barrier.
My test application looks like this:

#include <stdio.h>
#include <unistd.h>
#include "mpi.h"

int main(int iArgC, char *apArgV[]) {
    int iResult = 0;
    int iRank1;
    int iNum1;
    char sName[256];

    gethostname(sName, 255);

    MPI_Init(&iArgC, &apArgV);
    MPI_Comm_rank(MPI_COMM_WORLD, &iRank1);
    MPI_Comm_size(MPI_COMM_WORLD, &iNum1);
    printf("[%s]I am #%d/%d before the barrier\n", sName, iRank1, iNum1);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("[%s]I am #%d/%d after the barrier\n", sName, iRank1, iNum1);
    MPI_Finalize();

    return iResult;
}

If I make this call:

mpirun -np 3 --debug-daemons --host aim-plankton -x DISPLAY ./run_gdb.sh ./MPITest32 :
    -np 3 --host aim-fanta4 -x DISPLAY ./run_gdb.sh ./MPITest64

(run_gdb.sh is a script which starts gdb in an xterm for each process)

Process 0 (on aim-plankton) passes the barrier and gets stuck in PMPI_Finalize; all other processes get stuck in PMPI_Barrier. Process 1 (on aim-plankton) displays the message

[aim-plankton][0,1,1][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=113

Process 2 (on aim-plankton) displays the same message twice.

Any ideas?

Thanks
Jody

On Thu, Apr 10, 2008 at 1:05 PM, jody wrote:
> Hi
> Using a more realistic application than a simple "Hello, world"
> even the --host version doesn't work correctly
> Called this
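As a complement to the perl one-liner Rolf mentioned, the errno from the btl_tcp_endpoint message can also be decoded from C; this trivial throwaway program is just for looking the value up:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* On the Linux box above, 113 is EHOSTUNREACH ("No route to host"). */
        printf("errno 113: %s\n", strerror(113));
        return 0;
    }

A "No route to host" failure during connect() suggests the TCP BTL is trying to reach the other node over an interface or address it cannot actually route to, which is why restricting the interface (as in Rolf's -gmca btl_tcp_if_include bge1 example above) is often worth trying.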