Re: [OMPI users] Program hangs
Hello,

I also experience a similar problem with the MUMPS solver when I run it on a cluster. After several hours of running, the code does not produce any results, although the top command shows that the program occupies 100% of the CPU.

The difference here, however, is that the same program runs fine on my PC. The differences between my PC and the cluster are:
1) 32-bit (PC) vs. 64-bit (cluster)
2) Intel compiler (PC) vs. Portland compiler (cluster)

Any thoughts on what might cause this?

Thank you,
Vasilis

On Friday 20 November 2009 03:50:17 am Jiaye Li wrote:
> Hello,
>
> I installed openmpi-1.3.3 on my single-node (single-CPU) Intel 64-bit
> quad-core machine. The compiler info is:
>
> ***
> intel-icc101018-10.1.018-1.i386
> libgcc-4.4.0-4.i586
> gcc-4.4.0-4.i586
> gcc-gfortran-4.4.0-4.i586
> gcc-c++-4.4.0-4.i586
> intel-ifort101018-10.1.018-1.i386
> ***
>
> and the architecture is:
>
> processor: 0
> vendor_id: GenuineIntel
> cpu family: 6
> model: 23
> model name: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
> stepping: 10
> cpu MHz: 2825.937
> cache size: 6144 KB
> physical id: 0
> siblings: 4
> core id: 0
> cpu cores: 4
> apicid: 0
> initial apicid: 0
> fdiv_bug: no
> hlt_bug: no
> f00f_bug: no
> coma_bug: no
> fpu: yes
> fpu_exception: yes
> cpuid level: 13
> wp: yes
> flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
> bogomips: 5651.87
> clflush size: 64
> power management:
> ***
>
> I compiled the PWscf program with Open MPI and tested it. At the
> beginning the execution of PW went well, but after about 10 hours,
> just as the program was about to finish, it hung. The CPU is still
> occupied (100% taken up by the program). Something seems to be wrong
> somewhere. Any ideas? Thank you in advance.
>
> This is the config.log of Open MPI:
>
> ***
> This file contains any messages produced by compilers while
> running configure, to aid debugging if configure makes a mistake.
>
> It was created by Open MPI configure 1.3.3, which was
> generated by GNU Autoconf 2.63. Invocation command line was
>
>   $ ./configure --prefix=/opt/openmpi-1.3.3 --disable-static CC=gcc FC=ifort F77=ifort --enable-shared
>
> ## - ##
> ## Platform. ##
> ## - ##
>
> hostname = localhost
> uname -m = i686
> uname -r = 2.6.29.4-167.fc11.i686.PAE
> uname -s = Linux
> uname -v = #1 SMP Wed May 27 17:28:22 EDT 2009
>
> /usr/bin/uname -p = unknown
> /bin/uname -X = unknown
>
> /bin/arch = i686
> /usr/bin/arch -k = unknown
> /usr/convex/getsysinfo = unknown
> /usr/bin/hostinfo = unknown
> /bin/machine = unknown
> /usr/bin/oslevel = unknown
> /bin/universe = unknown
>
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all
> PATH: /home/jy/.wine/drive_c/windows
> PATH: /home/jy/Download/XCrySDen-1.5.21-src
> PATH: /home/jy/bin/vtstscripts
> PATH: /opt/mpich2-1.2/bin
> PATH: /opt/intel/fc/10.1.018/bin
> PATH: /opt/intel/cc/10.1.018/bin
> PATH: /usr/lib/qt-3.3/bin
> PATH: /usr/kerberos/bin
> PATH: /usr/lib/ccache
> PATH: /usr/local/bin
> PATH: /usr/bin
> PATH: /bin
> PATH: /usr/local/sbin
> PATH: /usr/sbin
> PATH: /sbin
> PATH: /home/jy/Download/XCrySDen-1.5.21-src/scripts
> PATH: /home/jy/Download/XCrySDen-1.5.21-src/util
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all/scripts
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all/util
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all/scripts
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all/util
> PATH: /home/jy/bin
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all/scripts
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all/util
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all/scripts
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all/util
>
> ## --- ##
> ## Core tests. ##
> ## --- ##
>
> configure:3424: checking for a BSD-compatible install
> configure:3492: result: /usr/bin/install -c
> configure:3503: checking whether build environment is sane
> configure:3546: result: yes
> configure:3571: checking for a thread-safe mkdir -p
> configure:3610: result: /bin/mkdir -p
> configure:3623: checking for gawk
> configure:3639: found /usr/bi
Re: [OMPI users] nonblocking MPI_File_iwrite() does block?
On Mon, Nov 16, 2009 at 11:20:44AM +0100, Christoph Rackwitz wrote:
> It's been ten days now. I'd like to resurrect this, in case someone
> can help and just missed it.

Hi. I only check in on the Open MPI list periodically. Sorry for the delay.

The standard in no way requires any overlap for either the nonblocking communication or I/O routines. There are long and heated discussions about "strict" vs. "weak" interpretations of the progress rule and which one is "better".

If you want asynchronous nonblocking I/O, you may have to roll all the way back to LAM or MPICH-1.2.7, when ROMIO used its own request objects and test/wait routines on top of the aio routines. In order to have standard request objects and use the standard test/wait routines, ROMIO switched to generalized requests. However, it is difficult to make progress on generalized requests without using threads, so we do all the work when the operation is posted and, as you observe, MPI_Wait() discovers immediately that the operation is complete. I proposed an extension to MPI generalized requests a few years ago that would make them more amenable to libraries like ROMIO.

Today's systems have a ton of cores, so spawning an I/O thread is not such an onerous burden. But ROMIO does not spawn such a thread, and so its nonblocking I/O is not asynchronous. What if you moved your MPI_File_write call into a thread? There are several ways to do this: for example, you could use standard generalized requests and make progress with your own thread -- the application writer knows far more than the library does about the system and how best to allocate threads.

If I may ask a slightly different question: you have periods of I/O and periods of computation. Have you evaluated collective I/O? I know you are eager to hide I/O in the background -- to get it for free -- but there is no such thing as a free lunch. Background I/O might still perturb your computation phase, unless you make zero MPI calls during it. Collective I/O can bring some fairly powerful optimizations to the table and reduce your overall I/O costs, perhaps even reducing them enough that you no longer miss true asynchronous I/O?

==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
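As a rough illustration of the "move the write into a thread" idea discussed above (this is an editorial sketch, not code from the thread), the following C program hands a blocking MPI_File_write_at() to a POSIX thread so the main thread can keep computing. The file name out.dat, the buffer size, and the write_job/writer helpers are invented for the example, and it assumes the MPI library actually provides MPI_THREAD_MULTIPLE support.

/* Sketch: overlap a blocking MPI-IO write with computation via a thread.
 * Assumptions: MPI_THREAD_MULTIPLE is available; file/size names are
 * illustrative only. Compile with: mpicc -pthread sketch.c            */
#include <mpi.h>
#include <pthread.h>
#include <stdlib.h>

struct write_job {           /* everything the I/O thread needs */
    MPI_File   fh;
    MPI_Offset offset;       /* byte offset (default file view)  */
    double    *buf;
    int        count;
};

static void *writer(void *arg)
{
    struct write_job *job = arg;
    /* The blocking write runs here, off the main compute thread. */
    MPI_File_write_at(job->fh, job->offset, job->buf, job->count,
                      MPI_DOUBLE, MPI_STATUS_IGNORE);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        MPI_Abort(MPI_COMM_WORLD, 1);   /* thread support is required */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 1 << 20;                  /* 1 Mi doubles per rank */
    double *buf = malloc(count * sizeof *buf);
    for (int i = 0; i < count; i++)
        buf[i] = rank + i;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    struct write_job job = { fh, (MPI_Offset)rank * count * sizeof(double),
                             buf, count };
    pthread_t tid;
    pthread_create(&tid, NULL, writer, &job);

    /* ... do computation here; do not touch buf until the join below ... */

    pthread_join(tid, NULL);             /* wait for the write to finish */
    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}

The collective-I/O alternative Rob mentions would instead replace the independent write with MPI_File_write_at_all() on all ranks; that does not overlap I/O with computation, but it may cut the total I/O cost enough that the missing overlap matters less.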
[OMPI users] Fwd: Call for participation: MPI Forum User Survey
The MPI Forum announced at its SC09 BOF that it is soliciting community feedback to help guide the MPI-3 standards process. A survey is available online at the following URL:

    http://mpi-forum.questionpro.com/
    Password: mpi3

In this survey, the MPI Forum is asking as many people as possible for feedback on the MPI-3 process -- what features to include, what features to not include, etc. We encourage you to forward this survey on to as many interested and relevant parties as possible.

It will take approximately 10 minutes to complete the questionnaire. No question in the survey is mandatory; feel free to answer only the questions which are relevant to you and your applications. Your answers will help the MPI Forum guide its process to create a genuinely useful MPI-3 standard. This survey closes December 31, 2009.

Your survey responses will be strictly confidential, and data from this research will be reported only in the aggregate. Your information will be coded and will remain confidential. If you have questions at any time about the survey or the procedures, you may contact the MPI Forum via email to mpi-comme...@mpi-forum.org.

Thank you very much for your time and support.

--
Jeff Squyres
jsquy...@cisco.com
[OMPI users] Program hangs
Hi,

I killed the job and re-submitted it. This time it was able to keep running, but today I found an even more serious problem with Open MPI. I compared the results of MPICH2 and Open MPI and found that the results from Open MPI are wrong: the run finished before reaching the real end. In other words, the structure being optimized (by VASP) has not converged, yet the program reported that the run was successful. Amazing! For the same initial structure, the run with MPICH2 requires 80 ionic steps, while the run with Open MPI needs only 40!

On Fri, Nov 20, 2009 at 4:20 PM, vasilis gkanis wrote:
> Hello,
>
> I also experience a similar problem with the MUMPS solver when I run it
> on a cluster. After several hours of running, the code does not produce
> any results, although the top command shows that the program occupies
> 100% of the CPU.
>
> The difference here, however, is that the same program runs fine on my
> PC. The differences between my PC and the cluster are:
> 1) 32-bit (PC) vs. 64-bit (cluster)
> 2) Intel compiler (PC) vs. Portland compiler (cluster)
>
> Any thoughts on what might cause this?
>
> Thank you,
> Vasilis