Re: [OMPI users] problems compiling openmpi-1.6 on some platforms

2012-06-08 Thread Jeff Squyres
On Jun 7, 2012, at 10:27 AM, Siegmar Gross wrote:

> thank you very much for your help. You were right with your suggestion
> that one of our system commands is responsible for the segmentation
> fault. After splitting the command in config.status I found out that
> gawk was responsible. We installed the latest version and now
> everything works fine. Thank you very much once more.

Excellent -- glad you have it working!

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Question on ./configure error on Tru64unix (OSF1) v5.1B-6 for openmpi-1.6

2012-06-08 Thread Jeff Squyres
To be honest, I don't think we've ever tested on Tru64, so I'm not surprised 
that it doesn't work.  Indeed, I think that it is unlikely that we will ever 
support Tru64.  :-(

Sorry!


On Jun 7, 2012, at 12:43 PM, Bill Glessner wrote:

> 
> Hello,
> 
> I am having trouble with the *** Assembler section of the GNU autoconf
> step in trying to build OpenMPI version 1.6 on an HP AlphaServer GS160
> running Tru64unix version 5.1B-6:
> 
> # uname -a
> OSF1 zozma.cts.cwu.edu V5.1 2650 alpha
> 
> Here is the output of the ./configure run:
> zozma(bash)% ./configure --prefix=/usr/local/OpenMPI \
> --enable-shared --enable-static
> 
> ...
> 
> *** Assembler
> checking dependency style of gcc... gcc3
> checking for BSD- or MS-compatible name lister (nm)... /usr/local/bin/nm -B
> checking the name lister (/usr/local/bin/nm -B) interface... BSD nm
> checking for fgrep... /usr/local/bin/grep -F
> checking if need to remove -g from CCASFLAGS... no
> checking whether to enable smp locks... yes
> checking if .proc/endp is needed... no
> checking directive for setting text section... .text
> checking directive for exporting symbols... .globl
> checking for objdump... objdump
> checking if .note.GNU-stack is needed... no
> checking suffix for labels... :
> checking prefix for global symbol labels... none
> configure: error: Could not determine global symbol label prefix
> 
> The ./config.log is appended.
> 
> Can anyone provide some information or suggestions on how to resolve this
> issue?
> 
> Thank you for your assistance,
> Bill Glessner - System programmer, Central Washington University
> 
> **






Re: [OMPI users] problems compiling openmpi-1.6 on some platforms

2012-06-08 Thread Siegmar Gross
Hello,

> >>> Unfortunately "cc" on Linux creates the following error.
> >>> 
> >>> ln -s "../../../openmpi-1.6/opal/asm/generated/
> >>> atomic-ia32-linux-nongas.s" atomic-asm.S
> >>> CPPAS  atomic-asm.lo
> >>> <command-line>:19:0: warning: "__FLT_EVAL_METHOD__" redefined
> >>> [enabled by default]
> >>> <built-in>:110:0: note: this is the location of the previous definition
> >>> cpp: fatal error: -fuse-linker-plugin, but liblto_plugin.so not found
> >>> compilation terminated.
> >>> cc: cpp failed for atomic-asm.S
> >>> make[2]: *** [atomic-asm.lo] Error 1
> >>> make[2]: Leaving directory `/.../opal/asm'
> >>> make[1]: *** [all-recursive] Error 1
> >>> make[1]: Leaving directory `/.../opal'
> >>> make: *** [all-recursive] Error 1
> >> 
> >> What compiler is "cc"?
> > 
> > "Sun C 5.12" (Oracle Solaris Studio 12.3 for Linux). Do you need
> > anything else?
> 
> Ah.  I will have to defer this to my Oracle brethren, then...

Today I edited ".../openmpi-1.6-Linux.x86_64.64_cc/libtool", removed
"|-fuse-linker-plugin" in line 6295, and ran "config.status"
once more. Afterwards I could compile and install Open MPI. Can
somebody fix this problem in libtool? There was one more warning:

log.make.Linux.x86_64.64_cc:configure: WARNING: unrecognized options:
--enable-ltdl-convenience


Furthermore there is a stale link:

tyr src 627 ls /usr/local/openmpi-1.6_64_cc/share/man/man1/orteCC.1
ls: /usr/local/openmpi-1.6_64_cc/share/man/man1/orteCC.1:
  No such file or directory
tyr src 628 ls -l /usr/local/openmpi-1.6_64_cc/share/man/man1/orteCC.1
lrwxrwxrwx 1 root root 9 Jun  8 13:34
  /usr/local/openmpi-1.6_64_cc/share/man/man1/orteCC.1 -> ortec++.1

Should it be linked to mpic++.1?



I found another warning with Sun C 5.12 which shouldn't be a problem.

configure:54329: checking stdbool.h usability
configure:54329: cc -c -O -DNDEBUG -m64   conftest.c >&5
"/usr/include/stdbool.h", line 42: #error: "Use of <stdbool.h>
  is valid only in a c99 compilation environment."
cc: acomp failed for conftest.c
configure:54329: $? = 2
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "Open MPI"
| #define PACKAGE_TARNAME "openmpi"
| #define PACKAGE_VERSION ""
...
| #include <stdbool.h>
configure:54329: result: no
configure:54329: checking stdbool.h presence
configure:54329: cpp  conftest.c
configure:54329: $? = 0
configure:54329: result: yes
configure:54329: WARNING: stdbool.h: present but cannot be compiled
configure:54329: WARNING: stdbool.h: check for missing prerequisite headers?
configure:54329: WARNING: stdbool.h: see the Autoconf documentation
configure:54329: WARNING: stdbool.h: section "Present But Cannot Be Compiled"
configure:54329: WARNING: stdbool.h: proceeding with the compiler's result
configure:54329: checking for stdbool.h
configure:54329: result: no
configure:54341: checking if <stdbool.h> works
configure:54374: result: no (don't have <stdbool.h>)


I wrote the above definitions and includes into a file and added a
main function.

cc -c -O -DNDEBUG -m64 stdbool_error.c
  "/usr/include/stdbool.h", line 42: #error: "Use of <stdbool.h>
  is valid only in a c99 compilation environment."
  cc: acomp failed for stdbool_error.c

cc -c -xc99 -O -DNDEBUG -m64 stdbool_error.c

We need "-xc99" if "stdbool.h" should be used.
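
A minimal sketch of the kind of source file that shows the difference. It is
not the stdbool_error.c used above, and the helper names are made up for
illustration. Because it includes <stdbool.h>, Sun C 5.12 should need "-xc99"
to compile it, as described above; compilers that default to C99 or later
should accept it unchanged.

```c
#include <assert.h>
#include <stdbool.h>   /* the header that Sun C 5.12 rejects without -xc99 */
#include <stddef.h>

/* Hypothetical helper: collapse any non-zero int to true. */
bool to_flag(int value)
{
    return value != 0;
}

/* Hypothetical helper: count the true entries in an array of flags. */
size_t count_set(const bool *flags, size_t n)
{
    size_t i, total = 0;

    assert(flags != NULL || n == 0);
    for (i = 0; i < n; ++i) {
        if (flags[i])
            total++;
    }
    return total;
}
```

Compiling this file with "cc -c" and then with "cc -c -xc99" should reproduce
the two outcomes shown above.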


Kind regards

Siegmar



[OMPI users] Bug when mixing sent types in version 1.6

2012-06-08 Thread BOUVIER Benjamin
Hi everybody,

I am currently hitting a bug when launching a very simple MPI program with
mpirun on networked nodes. It happens when I send an INT and then some CHAR
strings from a master node to a worker node.
Here is the minimal code to reproduce the bug:


# include <mpi.h>
# include <stdio.h>
# include <string.h>

int main(int argc, char **argv)
{
    int rank, size;
    const char someString[] = "Can haz cheezburgerz?";

    MPI_Init(&argc, &argv);

    MPI_Comm_rank( MPI_COMM_WORLD, & rank );
    MPI_Comm_size( MPI_COMM_WORLD, & size );

    if ( rank == 0 )
    {
        int len = strlen( someString );
        int i;
        for( i = 1; i < size; ++i)
        {
            MPI_Send( &len, 1, MPI_INT, i, 0, MPI_COMM_WORLD );
            MPI_Send( &someString, len+1, MPI_CHAR, i, 0, MPI_COMM_WORLD );
        }
    } else {
        char buffer[ 128 ];
        int receivedLen;
        MPI_Status stat;
        MPI_Recv( &receivedLen, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &stat );
        printf( "[Worker] Length : %d\n", receivedLen );
        MPI_Recv( buffer, receivedLen+1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);
        printf( "[Worker] String : %s\n", buffer );
    }

    MPI_Finalize();
}



I know that there is a better way to send a string, by giving a maximum buffer
size at the second MPI_Recv, but that is not the main topic here.
The launch works locally (i.e., when the 2 processes are launched on one
machine), but doesn't work when the 2 processes are dispatched to 2 machines
over the network (i.e., one per host). In that case, the worker correctly reads
the INT, and then master and worker both block on the next call.
I have no issue when sending only char strings or only numbers. This only
happens when sending char strings and then numbers, or the other way round.
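
The "better way" mentioned above can be sketched roughly as follows. The count
passed to MPI_Recv is an upper bound, so the worker can pass its buffer
capacity and ask MPI_Get_count how many characters actually arrived.
safe_recv_count, recv_string and the DEMO_WITH_MPI switch are hypothetical
names introduced for this sketch. It only hardens the receive side and is not
expected to change the blocking behaviour described in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many chars may be received into a buffer of
 * `capacity` bytes when the sender announced `announced_len` characters
 * (terminator not counted). Returns -1 if the string would not fit. */
int safe_recv_count(int announced_len, size_t capacity)
{
    assert(capacity > 0);
    if (announced_len < 0 || (size_t)announced_len >= capacity)
        return -1;
    return announced_len + 1;   /* payload plus the trailing '\0' */
}

#ifdef DEMO_WITH_MPI   /* hypothetical switch: define it when building with mpicc */
#include <mpi.h>

/* Receive a string of unknown length from `source` without a separate
 * length message. Returns the number of chars that arrived. */
int recv_string(char *buf, int capacity, int source, int tag)
{
    MPI_Status stat;
    int received = 0;

    /* capacity - 1 leaves room for the terminator written below. */
    MPI_Recv(buf, capacity - 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &stat);
    MPI_Get_count(&stat, MPI_CHAR, &received);
    buf[received] = '\0';   /* harmless if the sender already sent one */
    return received;
}
#endif
```

In the program above, the worker could either check receivedLen with
safe_recv_count(receivedLen, sizeof buffer) before the second MPI_Recv, or
drop the length message and call recv_string(buffer, sizeof buffer, 0, 0).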

I'm using OpenMPI version 1.6, locally compiled. 
$ uname -a
Linux trtp7097 2.6.32-220.13.1.el6.x86_64 #1 SMP Thu Mar 29 11:46:40 EDT 2012 
x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/redhat-release 
Red Hat Enterprise Linux Workstation release 6.2 (Santiago)

Am I misusing the framework, or could this be a bug?

Thank you in advance.
Benjamin


Re: [OMPI users] Bug when mixing sent types in version 1.6

2012-06-08 Thread Jeff Squyres
On Jun 8, 2012, at 6:43 AM, BOUVIER Benjamin wrote:

> # include <mpi.h>
> # include <stdio.h>
> # include <string.h>
> 
> int main(int argc, char **argv)
> {
>     int rank, size;
>     const char someString[] = "Can haz cheezburgerz?";
> 
>     MPI_Init(&argc, &argv);
> 
>     MPI_Comm_rank( MPI_COMM_WORLD, & rank );
>     MPI_Comm_size( MPI_COMM_WORLD, & size );
> 
>     if ( rank == 0 )
>     {
>         int len = strlen( someString );
>         int i;
>         for( i = 1; i < size; ++i)
>         {
>             MPI_Send( &len, 1, MPI_INT, i, 0, MPI_COMM_WORLD );
>             MPI_Send( &someString, len+1, MPI_CHAR, i, 0, MPI_COMM_WORLD );
>         }
>     } else {
>         char buffer[ 128 ];
>         int receivedLen;
>         MPI_Status stat;
>         MPI_Recv( &receivedLen, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &stat );
>         printf( "[Worker] Length : %d\n", receivedLen );
>         MPI_Recv( buffer, receivedLen+1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);
>         printf( "[Worker] String : %s\n", buffer );
>     }
> 
>     MPI_Finalize();
> }

I don't see anything obviously wrong with this code.

> I know that there is a better way to send a string, by giving a maximum
> buffer size at the second MPI_Recv, but that is not the main topic here.
> The launch works locally (i.e., when the 2 processes are launched on one
> machine), but doesn't work when the 2 processes are dispatched to 2 machines
> over the network (i.e., one per host). In that case, the worker correctly reads
> the INT, and then master and worker both block on the next call.

That's very odd.

> I have no issue when sending only char strings or only numbers. This only
> happens when sending char strings and then numbers, or the other way round.

That's even more odd.

Can you run standard benchmarks like NetPIPE and/or the OSU benchmarks
(across multiple nodes, that is)?





Re: [OMPI users] Question on ./configure error on Tru64unix (OSF1) v5.1B-6 for openmpi-1.6

2012-06-08 Thread Rayson Ho
Hi Bill,

If you *really* have time, then you can go deep into the log, and find
out why configure failed. It looks like configure failed when it tried
to compile this code:

 .text
 # .gsym_test_func
 .globl .gsym_test_func
 .gsym_test_func:
 # .gsym_test_func

 configure:26752: result: none
 configure:26756: error: Could not determine global symbol label prefix

Maybe it's a gcc thing? Perhaps your assembler is too old? I tried it
in Cygwin, which has gcc 3.4.4, and it seems to work fine. To check, copy
the 5 lines of code above into a file with the ".s" extension, then
compile it with gcc and see if you can reproduce the error.
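
Some background on what the failing check is about: depending on the platform,
a C function foo appears at the assembly level as foo, _foo, or with some other
prefix, and hand-written assembly has to follow the same convention. GCC and
Clang expose their convention through the predefined macro
__USER_LABEL_PREFIX__. The sketch below only reads that macro. It is not how
Open MPI's configure performs the check, and c_symbol_prefix and asm_label_for
are names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define STRINGIFY_RAW(x) #x
#define STRINGIFY(x) STRINGIFY_RAW(x)

/* Prefix the compiler puts in front of C identifiers in assembly output,
 * or NULL when the compiler does not define __USER_LABEL_PREFIX__. */
const char *c_symbol_prefix(void)
{
#ifdef __USER_LABEL_PREFIX__
    return STRINGIFY(__USER_LABEL_PREFIX__);
#else
    return NULL;
#endif
}

/* Write the assembly-level label expected for the C name `c_name` into
 * `out`. Returns the label length as reported by snprintf. */
int asm_label_for(const char *c_name, char *out, size_t cap)
{
    const char *prefix = c_symbol_prefix();

    assert(c_name != NULL && out != NULL && cap > 0);
    return snprintf(out, cap, "%s%s", prefix != NULL ? prefix : "", c_name);
}
```

Calling asm_label_for("gsym_test_func", buf, sizeof buf) on the failing machine
would show which label spelling that gcc expects, which may help when reading
the corresponding part of config.log.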

I was involved in a TOP500 project that used AlphaServer SC ES45 nodes
(a total of 4,096 cores), and it was #2 on the TOP500 a decade ago! It
was fun back then... But I agree with Jeff: it is unlikely that Open
MPI is going to work on Tru64. All modern processors are much faster
than Alpha, and I believe even the TOP500 Alpha machines are all
powered down (even the Earth Simulator is not on the TOP500 list
anymore, and that was #1 back then!).

Rayson




==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/

http://blogs.scalablelogic.com/



[OMPI users] RE : Bug when mixing sent types in version 1.6

2012-06-08 Thread BOUVIER Benjamin
Hi Jeff,

Thanks for your answer.

I have downloaded the NetPIPE benchmark suite, built it with `make mpi`, and
launched the resulting executable with mpirun.

Here is an interesting fact: launched on 2 nodes, it works; on 3 nodes, it
blocks, I guess on connect.
Each process runs on one core of its machine and uses 100% of that CPU, but
nothing else happens. I have to kill the program to quit.
Setting the option -mca btl_base_verbose to 30 shows me that the last thing
each node tries is to connect to the other nodes.

Could it be a network issue?

Thanks,
--
Benjamin Bouvier





Re: [OMPI users] RE : Bug when mixing sent types in version 1.6

2012-06-08 Thread Jeff Squyres
On Jun 8, 2012, at 8:51 AM, BOUVIER Benjamin wrote:

> I have downloaded the Netpipe benchmarks suite, launched `make mpi` and 
> launched with mpirun the resulting executable.
> 
> Here is an interesting fact : by launching this executable on 2 nodes, it 
> works ; on 3 nodes, it blocks, I guess on connect. 

Netpipe is only intended for 2 processes -- I'm actually not sure offhand what 
happens if you run it with 3...

> Each process is running on a core, on each machine, using 100% of one CPU ; 
> but nothing else happens. I have to kill the program to quit. 

This is to be expected.  OMPI polls aggressively for network progress (i.e., 
consumes 100% of a core).

> Setting the option -mca btl_base_verbose to 30 shows me that the last thing 
> tried by each node is to connect to other nodes.


We don't output verbose messages for MPI traffic, so the lack of messages there 
doesn't mean anything.

I'd guess that running NetPIPE with 3 procs may be undefined.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/