[OMPI users] question running on heterogeneous systems

2009-01-01 Thread Mahmoud Payami
Dear OpenMPI Users,

I have two systems, one with an Intel64 processor and one with IA32. The OS
on the first is CentOS x86_64 and on the other CentOS i386. I installed the
Intel Fortran compiler 10.1 on both. On the first I use the fce directory,
and on the second the fc directory (ifortvars.sh/csh). I have compiled
Open MPI separately on each machine. Now I cannot run my application, which
was compiled on the IA32 machine. Should I use "fc" instead of "fce" on the
Intel64 box and then compile Open MPI with that?

Best regards,

Mahmoud Payami

PS: I have read the following FAQ entry, but I need a specific answer.


As of v1.1, Open MPI requires that the size of C, C++, and Fortran datatypes
be the same on all platforms within a single parallel application with the
exception of types represented by MPI_BOOL and MPI_LOGICAL -- size
differences in these types between processes are properly handled. Endian
differences between processes in a single MPI job are properly and
automatically handled.

Prior to v1.1, Open MPI did not include any support for data size or endian
heterogeneity.
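One way to sidestep heterogeneity entirely is to run the same 32-bit build
on both boxes. A minimal sketch of what a separate 32-bit Open MPI build on
the x86_64 machine might look like, assuming the 32-bit Intel 10.1 compilers
are installed under the usual fc/cc trees (all paths below are hypothetical
and need to be adjusted to the actual installation):

    # On the x86_64 box: pick up the 32-bit Intel compilers (the fc/cc trees),
    # not the 64-bit ones (fce/cce).
    source /opt/intel/fc/10.1.xxx/bin/ifortvars.sh
    source /opt/intel/cc/10.1.xxx/bin/iccvars.sh

    # Build a separate 32-bit Open MPI into its own prefix.
    ./configure --prefix=/opt/openmpi-ia32 \
                CC=icc CXX=icpc F77=ifort FC=ifort
    make all install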


Re: [OMPI users] question running on heterogeneous systems

2009-01-02 Thread Mahmoud Payami
On Fri, Jan 2, 2009 at 9:08 AM, doriankrause  wrote:

> Mahmoud Payami wrote:
>
>>
>> Dear OpenMPI Users,
>>
>> I have two systems, one with an Intel64 processor and one with IA32. The OS
>> on the first is CentOS x86_64 and on the other CentOS i386. I installed the
>> Intel Fortran compiler 10.1 on both. On the first I use the fce directory,
>> and on the second the fc directory (ifortvars.sh/csh). I have compiled
>> Open MPI separately on each machine. Now I cannot run my application, which
>> was compiled on the IA32 machine. Should I use "fc" instead of "fce" on the
>> Intel64 box and then compile Open MPI with that?
>>
>>
> Could you give us some more information? What is the error message?
> You said that the application is compiled for the 32-bit architecture. I'm
> not used to mixing 32/64-bit architectures. Does the application run on each
> host separately?
>
> Dorian
>
>
>
Dear Dorian,
Thank you for your contribution. The application, compiled on each box
separately, runs fine with MPI, no problem. I recently checked that a
binary created on IA32 also works on x86_64, but the reverse is not
true. So why not a parallel program compiled on the IA32 box? I think
that if I configure and install Open MPI using the IA32 Intel compiler
on the x86_64 box, the problem will be resolved.
I have to check it and will report the result. At present, it is
searching for a shared lib.so.0 which has some "..ELF...64" extension. I
have already added "/usr/local/lib", which contains the MPI libs, to
LD_LIBRARY_PATH; otherwise they would not work on each box even separately.
Bests, Happy 2009
mahmoud
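A quick way to confirm which architecture a given library or executable was
built for is to inspect its ELF header with "file", and to check which
shared libraries a binary actually resolves to with "ldd". A minimal sketch;
the application name below is a hypothetical placeholder:

    # Report the ELF class (32-bit vs 64-bit) of the MPI library and the app.
    file /usr/local/lib/libmpi.so.0
    file ./my_app

    # Show which shared libraries the binary resolves to on this box;
    # "not found" entries usually mean LD_LIBRARY_PATH points only at
    # libraries of the other architecture.
    ldd ./my_app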


Re: [OMPI users] question running on heterogeneous systems

2009-01-06 Thread Mahmoud Payami
Dear Gus,

Thank you for the detailed explanation. It is quite helpful. I think now I
have got how to manage the problem.

Best regards,

Mahmoud Payami
Theoretical Physics Group,
Atomic Energy Organization of Iran
Tehran-Iran
mpay...@aeoi.org.ir


On Mon, Jan 5, 2009 at 12:21 PM, Gus Correa  wrote:

> Mahmoud Payami wrote:
>
>>
>>
>> On Fri, Jan 2, 2009 at 9:08 AM, doriankrause <doriankra...@web.de> wrote:
>>
>>Mahmoud Payami wrote:
>>
>>
>>Dear OpenMPI Users,
>>
>>I have two systems, one with an Intel64 processor and one with
>>IA32. The OS on the first is CentOS x86_64 and on the other
>>CentOS i386. I installed the Intel Fortran compiler 10.1 on both.
>> On the first I use the fce directory, and on the second the fc
>>directory (ifortvars.sh/csh). I have compiled Open MPI
>>separately on each machine. Now I cannot run my
>>application, which was compiled on the IA32 machine. Should I use
>>"fc" instead of "fce" on the Intel64 box and then compile Open MPI
>>with that?
>>
>>
>>Could you give us some more information? What is the error message?
>>You said that the application is compiled for the 32-bit
>>architecture. I'm not used to mixing 32/64-bit architectures. Does
>>the application run on each host separately?
>>
>>Dorian
>>
>>
>>
> Hi Mahmoud, list
>
>> Dear Dorian,
>> Thank you for your contribution. The application, compiled on each box
>> separately, runs fine with MPI, no problem. I recently checked that a
>> binary created on IA32 also works on x86_64, but the reverse is not
>> true.
>>
> That is correct.
> The x86-64 architecture can run 32-bit binaries,
> but 64-bit binaries don't work on x86 machines.
>
>> So why not a parallel program compiled on the IA32 box? I think that if
>> I configure and install Open MPI using the IA32 Intel compiler on the
>> x86_64 box, the problem will be resolved.
>>
> 1. You need to compile OpenMPI separately on each architecture.
> Use the "--prefix=/path/to/my/openmpi/32bit/" (32-bit example/suggestion)
> configure option to install the two libraries in different locations,
> if you want. This makes it clear which architecture each library was
> built for.
>
> 2. You need to compile your application separately on each architecture,
> and link to the OpenMPI libraries built for that specific architecture
> according to item 1  above.
> (I.e. don't mix apples and oranges.)
>
> 3. You need to have the correct environment variables set
> on each machine architecture.
> They are *different* on each architecture.
>
> I.e., if you use Intel Fortran,
> source the fc script on the 32bit machine,
> and source the fce script on the 64-bit machine.
>
> This can be done in the .bashrc or .tcshrc file.
> If you have a different home directory on each machine,
> you can write a .bashrc or .tcshrc file for each architecture.
> If you have a single NFS mounted home directory,
> use a trick like this (tcsh example):
>
> if ( $HOST == "my_32bit_hostname" ) then
>   source /path/to/intel/fc/bin/ifortvars.csh # Note "fc" here.
> else if ( $HOST == "my_64bit_hostname"  ) then
>   source /path/to/intel/fce/bin/ifortvars.csh   # Note "fce" here.
> endif
>
> whatever your "my_32bit_hostname", "my_64bit_hostname",
> /path/to/intel/fc/, and /path/to/intel/fce/ are.
> (Do "hostname" on each machine to find out the right name to use.)
>
> Likewise for the OpenMPI binaries (mpicc, mpif90, mpirun, etc):
>
> if ( $HOST == "my_32bit_hostname" ) then
>   setenv PATH /path/to/my/openmpi/32bit/bin:$PATH   # Note "32bit" here.
> else if ( $HOST == "my_64bit_hostname"  ) then
>   setenv PATH /path/to/my/openmpi/64bit/bin:$PATH# Note "64bit" here.
> endif
>
> This approach also works for separate home directories "per machine"
> (not NFS mounted), and is probably the simplest way to solve the problem.
>
> There are more elegant ways to set up the environment of choice,
> other than changing the user startup files.
> For instance, you can write intel.csh and intel.sh in the /etc/profile.d
> directory, to set up the appropriate environment as the user logs in.
> See also the "environment modules" package:
> http://modules.sourceforge.net/
>
> 4. If you run MPI programs across the two machines/architectures,
> make sure to use the MPI types in MPI function calls correctly,
> and to match them
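Building on the /etc/profile.d suggestion in Gus's reply above, a minimal sh
sketch of such a login script might look like the following; it keys off
"uname -m" instead of the hostname, and the file name and all paths are
hypothetical placeholders:

    # /etc/profile.d/mpi-env.sh  (hypothetical file name and paths)
    # Select the Intel compiler and Open MPI trees that match this
    # machine's architecture.
    case "$(uname -m)" in
      x86_64)
        . /path/to/intel/fce/bin/ifortvars.sh          # 64-bit Intel Fortran (fce)
        export PATH=/path/to/my/openmpi/64bit/bin:$PATH
        export LD_LIBRARY_PATH=/path/to/my/openmpi/64bit/lib:$LD_LIBRARY_PATH
        ;;
      i?86)
        . /path/to/intel/fc/bin/ifortvars.sh           # 32-bit Intel Fortran (fc)
        export PATH=/path/to/my/openmpi/32bit/bin:$PATH
        export LD_LIBRARY_PATH=/path/to/my/openmpi/32bit/lib:$LD_LIBRARY_PATH
        ;;
    esac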

[OMPI users] Threading fault(?)

2009-02-26 Thread Mahmoud Payami
Dear All,

I have installed openmpi-1.3.1 (with the defaults) and built my
application.
The Linux box is a 2 x quad-core AMD64 machine. In the middle of a run of my
application, I receive the message below and it stops.
I tried configuring Open MPI with "--disable-mpi-threads", but it still
automatically assumes "posix".
This problem does not happen with openmpi-1.2.9.
Any comment is highly appreciated.
Best regards,
        mahmoud payami


[hpc1:25353] *** Process received signal ***
[hpc1:25353] Signal: Segmentation fault (11)
[hpc1:25353] Signal code: Address not mapped (1)
[hpc1:25353] Failing at address: 0x51
[hpc1:25353] [ 0] /lib64/libpthread.so.0 [0x303be0dd40]
[hpc1:25353] [ 1] /opt/openmpi131_cc/lib/openmpi/mca_pml_ob1.so
[0x2e350d96]
[hpc1:25353] [ 2] /opt/openmpi131_cc/lib/openmpi/mca_pml_ob1.so
[0x2e3514a8]
[hpc1:25353] [ 3] /opt/openmpi131_cc/lib/openmpi/mca_btl_sm.so
[0x2eb7c72a]
[hpc1:25353] [ 4]
/opt/openmpi131_cc/lib/libopen-pal.so.0(opal_progress+0x89) [0x2b42b7d9]
[hpc1:25353] [ 5] /opt/openmpi131_cc/lib/openmpi/mca_pml_ob1.so
[0x2e34d27c]
[hpc1:25353] [ 6] /opt/openmpi131_cc/lib/libmpi.so.0(PMPI_Recv+0x210)
[0x2af46010]
[hpc1:25353] [ 7] /opt/openmpi131_cc/lib/libmpi_f77.so.0(mpi_recv+0xa4)
[0x2acd6af4]
[hpc1:25353] [ 8]
/opt/QE131_cc/bin/pw.x(parallel_toolkit_mp_zsqmred_+0x13da) [0x513d8a]
[hpc1:25353] [ 9] /opt/QE131_cc/bin/pw.x(pcegterg_+0x6c3f) [0x6667ff]
[hpc1:25353] [10] /opt/QE131_cc/bin/pw.x(diag_bands_+0xb9e) [0x65654e]
[hpc1:25353] [11] /opt/QE131_cc/bin/pw.x(c_bands_+0x277) [0x6575a7]
[hpc1:25353] [12] /opt/QE131_cc/bin/pw.x(electrons_+0x53f) [0x58a54f]
[hpc1:25353] [13] /opt/QE131_cc/bin/pw.x(MAIN__+0x1fb) [0x458acb]
[hpc1:25353] [14] /opt/QE131_cc/bin/pw.x(main+0x3c) [0x4588bc]
[hpc1:25353] [15] /lib64/libc.so.6(__libc_start_main+0xf4) [0x303b21d8a4]
[hpc1:25353] [16] /opt/QE131_cc/bin/pw.x(realloc+0x1b9) [0x4587e9]
[hpc1:25353] *** End of error message ***
--
mpirun noticed that process rank 6 with PID 25353 on node hpc1 exited on
signal 11 (Segmentation fault).
--
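Regarding the observation that Open MPI "automatically assumes posix" even
with "--disable-mpi-threads": one way to check what thread support the
installed build actually has is to query ompi_info. A minimal sketch,
assuming the install prefix /opt/openmpi131_cc seen in the backtrace above:

    # Report the thread support compiled into this Open MPI installation.
    # On the 1.3 series this typically prints a line such as
    #   Thread support: posix (mpi: no, progress: no)
    # where "posix" names the internal thread package and "mpi: no" means
    # MPI-level thread support (MPI_THREAD_MULTIPLE) is disabled.
    /opt/openmpi131_cc/bin/ompi_info | grep -i thread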


[OMPI users] Threading fault

2009-02-27 Thread Mahmoud Payami
Dear All,

I am using Intel lc_prof-11 (and its own MKL) and have built openmpi-1.3.1
with the configure options "FC=ifort F77=ifort CC=icc CXX=icpc". Then I have
built my application.
The Linux box is a 2 x quad-core AMD64 machine. In the middle of a run of my
application (after some 15 iterations), I receive the message below and it
stops.
I tried configuring Open MPI with "--disable-mpi-threads", but it still
automatically assumes "posix".
This problem does not happen with openmpi-1.2.9.
Any comment is highly appreciated.
Best regards,
mahmoud payami


[hpc1:25353] *** Process received signal ***
[hpc1:25353] Signal: Segmentation fault (11)
[hpc1:25353] Signal code: Address not mapped (1)
[hpc1:25353] Failing at address: 0x51
[hpc1:25353] [ 0] /lib64/libpthread.so.0 [0x303be0dd40]
[hpc1:25353] [ 1] /opt/openmpi131_cc/lib/openmpi/mca_pml_ob1.so
[0x2e350d96]
[hpc1:25353] [ 2] /opt/openmpi131_cc/lib/openmpi/mca_pml_ob1.so
[0x2e3514a8]
[hpc1:25353] [ 3] /opt/openmpi131_cc/lib/openmpi/mca_btl_sm.so
[0x2eb7c72a]
[hpc1:25353] [ 4]
/opt/openmpi131_cc/lib/libopen-pal.so.0(opal_progress+0x89) [0x2b42b7d9]
[hpc1:25353] [ 5] /opt/openmpi131_cc/lib/openmpi/mca_pml_ob1.so
[0x2e34d27c]
[hpc1:25353] [ 6] /opt/openmpi131_cc/lib/libmpi.so.0(PMPI_Recv+0x210)
[0x2af46010]
[hpc1:25353] [ 7] /opt/openmpi131_cc/lib/libmpi_f77.so.0(mpi_recv+0xa4)
[0x2acd6af4]
[hpc1:25353] [ 8]
/opt/QE131_cc/bin/pw.x(parallel_toolkit_mp_zsqmred_+0x13da) [0x513d8a]
[hpc1:25353] [ 9] /opt/QE131_cc/bin/pw.x(pcegterg_+0x6c3f) [0x6667ff]
[hpc1:25353] [10] /opt/QE131_cc/bin/pw.x(diag_bands_+0xb9e) [0x65654e]
[hpc1:25353] [11] /opt/QE131_cc/bin/pw.x(c_bands_+0x277) [0x6575a7]
[hpc1:25353] [12] /opt/QE131_cc/bin/pw.x(electrons_+0x53f) [0x58a54f]
[hpc1:25353] [13] /opt/QE131_cc/bin/pw.x(MAIN__+0x1fb) [0x458acb]
[hpc1:25353] [14] /opt/QE131_cc/bin/pw.x(main+0x3c) [0x4588bc]
[hpc1:25353] [15] /lib64/libc.so.6(__libc_start_main+0xf4) [0x303b21d8a4]
[hpc1:25353] [16] /opt/QE131_cc/bin/pw.x(realloc+0x1b9) [0x4587e9]
[hpc1:25353] *** End of error message ***
--
mpirun noticed that process rank 6 with PID 25353 on node hpc1 exited on
signal 11 (Segmentation fault).
--
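Since mca_btl_sm.so appears in the backtrace, one quick experiment is to
exclude the shared-memory BTL for a run and see whether the crash persists.
This is only a diagnostic step, not a fix; the process count and the
program arguments below are placeholders:

    # Run once with the sm (shared-memory) BTL excluded; intra-node traffic
    # then falls back to another transport such as TCP over loopback.
    # Slower, but it isolates whether the segfault is tied to the sm BTL.
    mpirun -np 8 --mca btl ^sm /opt/QE131_cc/bin/pw.x ...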

