Re: [OMPI users] Program hangs

2009-11-20 Thread vasilis gkanis
Hello,

I also experience a similar problem with the MUMPS solver when I run it on a
cluster. After several hours of running, the code does not produce any results,
although the top command shows that the program occupies 100% of the CPU.

The difference here, however, is that the same program runs fine on my PC. The
differences between my PC and the cluster are:
1) 32-bit (PC) vs. 64-bit (cluster)
2) Intel compiler (PC) vs. Portland compiler (cluster)

Any thoughts on what might cause this?

Thank you,
Vasilis


On Friday 20 November 2009 03:50:17 am Jiaye Li wrote:
> Hello
> 
> I installed openmpi-1.3.3 on my single-node (one CPU) Intel 64-bit quad-core
> machine. The compiler info is:
> 
> 
> ***
> intel-icc101018-10.1.018-1.i386
> libgcc-4.4.0-4.i586
> gcc-4.4.0-4.i586
> gcc-gfortran-4.4.0-4.i586
> gcc-c++-4.4.0-4.i586
> intel-ifort101018-10.1.018-1.i386
> 
> and the architecture is:
> 
> processor: 0
> vendor_id: GenuineIntel
> cpu family: 6
> model: 23
> model name: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
> stepping: 10
> cpu MHz: 2825.937
> cache size: 6144 KB
> physical id: 0
> siblings: 4
> core id: 0
> cpu cores: 4
> apicid: 0
> initial apicid: 0
> fdiv_bug: no
> hlt_bug: no
> f00f_bug: no
> coma_bug: no
> fpu: yes
> fpu_exception: yes
> cpuid level: 13
> wp: yes
> flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>  cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
>  constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx smx est
>  tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
> bogomips: 5651.87
> clflush size: 64
> power management:
> 
> ***
> 
> I compiled the PWscf program with Open MPI and tested it. At the
> beginning the execution of PW went well, but after about 10 hours, when
> the program was about to finish, it hung. The CPU time is still occupied
> (100% taken up by the program). Something seems to be wrong somewhere.
> Any ideas? Thank you in advance.
> 
> This is the config.log of Ompi:
> 
> ***
> This file contains any messages produced by compilers while
> running configure, to aid debugging if configure makes a mistake.
> 
> It was created by Open MPI configure 1.3.3, which was
> generated by GNU Autoconf 2.63.  Invocation command line was
> 
>   $ ./configure --prefix=/opt/openmpi-1.3.3 --disable-static CC=gcc
>  FC=ifort F77=ifort --enable-shared
> 
> ## --------- ##
> ## Platform. ##
> ## --------- ##
> 
> hostname = localhost
> uname -m = i686
> uname -r = 2.6.29.4-167.fc11.i686.PAE
> uname -s = Linux
> uname -v = #1 SMP Wed May 27 17:28:22 EDT 2009
> 
> /usr/bin/uname -p = unknown
> /bin/uname -X = unknown
> 
> /bin/arch  = i686
> /usr/bin/arch -k   = unknown
> /usr/convex/getsysinfo = unknown
> /usr/bin/hostinfo  = unknown
> /bin/machine   = unknown
> /usr/bin/oslevel   = unknown
> /bin/universe  = unknown
> 
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all
> PATH: /home/jy/.wine/drive_c/windows
> PATH: /home/jy/Download/XCrySDen-1.5.21-src
> PATH: /home/jy/bin/vtstscripts
> PATH: /opt/mpich2-1.2/bin
> PATH: /opt/intel/fc/10.1.018/bin
> PATH: /opt/intel/cc/10.1.018/bin
> PATH: /usr/lib/qt-3.3/bin
> PATH: /usr/kerberos/bin
> PATH: /usr/lib/ccache
> PATH: /usr/local/bin
> PATH: /usr/bin
> PATH: /bin
> PATH: /usr/local/sbin
> PATH: /usr/sbin
> PATH: /sbin
> PATH: /home/jy/Download/XCrySDen-1.5.21-src/scripts
> PATH: /home/jy/Download/XCrySDen-1.5.21-src/util
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all/scripts
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all/util
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all/scripts
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all/util
> PATH: /home/jy/bin
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all/scripts
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all/util
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all/scripts
> PATH: /home/jy/Download/XCrySDen-1.5.21-src-all/util
> 
> 
> ## ----------- ##
> ## Core tests. ##
> ## ----------- ##
> 
> configure:3424: checking for a BSD-compatible install
> configure:3492: result: /usr/bin/install -c
> configure:3503: checking whether build environment is sane
> configure:3546: result: yes
> configure:3571: checking for a thread-safe mkdir -p
> configure:3610: result: /bin/mkdir -p
> configure:3623: checking for gawk
> configure:3639: found /usr/bi

Re: [OMPI users] nonblocking MPI_File_iwrite() does block?

2009-11-20 Thread Rob Latham
On Mon, Nov 16, 2009 at 11:20:44AM +0100, Christoph Rackwitz wrote:
> It's been ten days now. I'd like to resurrect this, in case someone
> can help and just missed it.

Hi.  I only check in on the OpenMPI list periodically.  Sorry for the
delay.

The standard in no way requires any overlap for either the nonblocking
communication or I/O routines.  There are long and heated discussions
about "strict" or "weak" interpretation of the progress rule and which
one is "better".

If you want asynchronous nonblocking I/O, you might have to roll all
the way back to LAM or MPICH-1.2.7, when ROMIO used its own request
objects and test/wait routines on top of the aio routines.

In order to have standard request objects and use the standard
test/wait routines, ROMIO switched to generalized requests.  However,
it's difficult to make progress on generalized requests without using
threads, so we do all the work when the job is posted and, as you
observe, MPI_Wait() discovers immediately that the job is complete.
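
In user code, the pattern under discussion is simply "post, compute,
wait".  A minimal sketch (my own illustration, not taken from ROMIO or
from the original post; the file name, offset, and buffer are
placeholders):

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      MPI_Init(&argc, &argv);
      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      MPI_File fh;
      MPI_File_open(MPI_COMM_WORLD, "out.dat",
                    MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

      int buf[1024];
      for (int i = 0; i < 1024; i++) buf[i] = rank;

      /* Post the nonblocking write; each rank writes its own block. */
      MPI_Request req;
      MPI_File_iwrite_at(fh, (MPI_Offset)rank * sizeof(buf), buf, 1024,
                         MPI_INT, &req);

      /* ... computation we hoped to overlap with the I/O ... */

      /* With the generalized-request implementation described above, the
       * write already happened at post time, so this returns at once. */
      MPI_Wait(&req, MPI_STATUS_IGNORE);

      MPI_File_close(&fh);
      MPI_Finalize();
      return 0;
  }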

I proposed an extension to MPI generalized requests a few years ago
that would make them more amenable to libraries like ROMIO.

Today systems have a ton of cores. Spawning an I/O thread is not such
an onerous burden.  But we don't spawn such a thread in ROMIO, and so
nonblocking I/O is not asynchronous.

What if you moved your MPI_File_write call into a thread?  There are
several ways to do this: you could use standard generalized requests
and make progress with a thread -- the application writer has a lot
more knowledge than the MPI library about the system and how best to
allocate threads.
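
A rough sketch of that approach (my own untested illustration; it
assumes MPI was initialized with MPI_THREAD_MULTIPLE, and the helper
name iwrite_in_thread is made up):

  #include <mpi.h>
  #include <pthread.h>
  #include <stdlib.h>

  struct write_job {
      MPI_File     fh;
      MPI_Offset   offset;
      const void  *buf;
      int          count;
      MPI_Datatype dtype;
      MPI_Request  greq;      /* the generalized request we complete */
  };

  static int query_fn(void *extra, MPI_Status *status)
  {
      struct write_job *job = extra;
      MPI_Status_set_elements(status, job->dtype, job->count);
      MPI_Status_set_cancelled(status, 0);
      status->MPI_SOURCE = MPI_UNDEFINED;
      status->MPI_TAG    = MPI_UNDEFINED;
      return MPI_SUCCESS;
  }

  static int free_fn(void *extra) { free(extra); return MPI_SUCCESS; }

  static int cancel_fn(void *extra, int complete)
  { (void)extra; (void)complete; return MPI_SUCCESS; }

  static void *writer_thread(void *arg)
  {
      struct write_job *job = arg;
      MPI_Status st;
      MPI_File_write_at(job->fh, job->offset, (void *)job->buf,
                        job->count, job->dtype, &st);
      MPI_Grequest_complete(job->greq);   /* wakes up MPI_Wait/MPI_Test */
      return NULL;
  }

  /* Post a write that runs in a helper thread; the caller gets back an
   * ordinary MPI_Request it can MPI_Wait or MPI_Test on. */
  int iwrite_in_thread(MPI_File fh, MPI_Offset off, const void *buf,
                       int count, MPI_Datatype dtype, MPI_Request *req)
  {
      struct write_job *job = malloc(sizeof(*job));
      job->fh = fh; job->offset = off; job->buf = buf;
      job->count = count; job->dtype = dtype;

      MPI_Grequest_start(query_fn, free_fn, cancel_fn, job, req);
      job->greq = *req;

      pthread_t tid;
      pthread_create(&tid, NULL, writer_thread, job);
      pthread_detach(tid);
      return MPI_SUCCESS;
  }

The nice part is that the rest of the application keeps using the
standard test/wait routines; the helper thread only has to call
MPI_Grequest_complete when the blocking write returns.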

If I may ask a slightly different question: you've got periods of I/O
and periods of computation.  Have you evaluated collective I/O?  I
know you are eager to hide I/O in the background -- to get it for free
-- but there's no such thing as a free lunch.  Background I/O might
still perturb your computation phase, unless you make zero MPI calls
in your computational phase.   Collective I/O can bring some fairly
powerful optimizations to the table and reduce your overall I/O costs,
perhaps even reducing them enough that you no longer miss true
asynchronous I/O?
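
As a concrete example of what I mean (again just a sketch I am making
up here; write_snapshot, the file name, and the contiguous-block layout
are assumptions about your application):

  #include <mpi.h>

  /* Collectively write each rank's contiguous block of nlocal doubles. */
  void write_snapshot(MPI_Comm comm, const double *local_data, int nlocal)
  {
      int rank;
      MPI_Comm_rank(comm, &rank);

      MPI_File fh;
      MPI_File_open(comm, "snapshot.dat",
                    MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

      MPI_Offset off = (MPI_Offset)rank * nlocal * sizeof(double);

      /* Every rank calls this together, so ROMIO can merge the accesses
       * (two-phase I/O) instead of issuing many small independent writes. */
      MPI_File_write_at_all(fh, off, (void *)local_data, nlocal,
                            MPI_DOUBLE, MPI_STATUS_IGNORE);

      MPI_File_close(&fh);
  }

Called once per I/O phase, that single collective call often costs far
less than the equivalent independent writes, which is where the "maybe
you won't miss asynchronous I/O" argument comes from.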

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA


[OMPI users] Fwd: Call for participation: MPI Forum User Survey

2009-11-20 Thread Jeff Squyres
The MPI Forum announced at its SC09 BOF that it is soliciting
community feedback to help guide the MPI-3 standards process.  A
survey is available online at the following URL:


http://mpi-forum.questionpro.com/
Password: mpi3

In this survey, the MPI Forum is asking as many people as possible for  
feedback on the MPI-3 process -- what features to include, what  
features to not include, etc.


We encourage you to forward this survey on to as many interested and  
relevant parties as possible.


It will take approximately 10 minutes to complete the questionnaire.

No question in the survey is mandatory; feel free to only answer the  
questions which are relevant to you and your applications. Your  
answers will help the MPI Forum guide its process to create a  
genuinely useful MPI-3 standard.


This survey closes December 31, 2009.

Your survey responses will be strictly confidential and data from this  
research will be reported only in the aggregate. Your information will  
be coded and will remain confidential. If you have questions at any  
time about the survey or the procedures, you may contact the MPI Forum  
via email to mpi-comme...@mpi-forum.org.


Thank you very much for your time and support.

--
Jeff Squyres
jsquy...@cisco.com



[OMPI users] Program hangs

2009-11-20 Thread Jiaye Li
Hi

I killed the job and re-submitted it. This time it was able to keep running,
but today I found an even more serious problem with Open MPI. I compared the
results from MPICH2 and Open MPI and found that the Open MPI results are
wrong: the run finished before it really should have. In other words, the
structure optimized by VASP does not converge, yet the run was reported as
successful. Amazing! For the same initial structure, the run with MPICH2
requires 80 ionic steps, while the run with Open MPI needs only 40!

On Fri, Nov 20, 2009 at 4:20 PM, vasilis gkanis
wrote:

> Hello,
>
> I also experience a similar problem with the MUMPS solver when I run it on
> a cluster. After several hours of running, the code does not produce any
> results, although the top command shows that the program occupies 100% of
> the CPU.
>
> The difference here, however, is that the same program runs fine on my PC.
> The differences between my PC and the cluster are:
> 1) 32-bit (PC) vs. 64-bit (cluster)
> 2) Intel compiler (PC) vs. Portland compiler (cluster)
>
> Any thoughts on what might cause this?
>
> Thank you,
> Vasilis
>
>
> On Friday 20 November 2009 03:50:17 am Jiaye Li wrote:
> > [original report and config.log output quoted in full earlier in this thread; trimmed]