[OMPI users] Installation fails on Mac OS X

2007-03-25 Thread Daniele Avitabile

Hi everybody,

I am trying to install Open MPI on a Mac OS X server, and the make all
command exits with an error, as you can see from the output files in the
attached openmpi_install_failed.tar.gz.

Some comments that may be helpful:

1) I am not root on the machine, but I have permission to write to
/usr/local/applications/, which is the directory in which I want to install
Open MPI.

2) In the same directory there is already an Open MPI 1.1.2 installation,
built with the gcc-4.0.1 compilers. I want to install the current version of
Open MPI with a different compiler, namely the GCC compilers optimised for
Apple Intel. They reside in /usr/local/bin, and I pass them to the make
command, as you can see from the attached file (see the sketch below for the
kind of invocation I mean).
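
For context, the compilers and the install location are usually given to Open
MPI's configure step rather than to make. A minimal sketch, assuming the
Apple-optimised GCC lives in /usr/local/bin and using an illustrative install
prefix (adjust both to your actual setup):

  # Illustrative only: substitute your real compiler paths and prefix.
  ./configure CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ \
              --prefix=/usr/local/applications/openmpi-new
  make all
  make install

If the source tree was previously configured for the gcc-4.0.1 build, running
make distclean (or starting from a fresh tarball) before reconfiguring avoids
mixing objects built by the two compilers.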

Any idea as to why I receive that error?

Thanks a lot in advance

Daniele


openmpi_install_failed.tar.gz
Description: GNU Zip compressed data


Re: [OMPI users] MPI processes swapping out

2007-03-25 Thread Heywood, Todd
Thanks, George. I will try the trunk version (1.3a1r14138) tomorrow. However, I am
keeping the number of processes per node constant (i.e. 4, one per core). The
system time, and the number of processes in the sleep state (eyeballed via top),
grow significantly as the number of nodes scales up.
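
(For what it's worth, a couple of commands one might run on a compute node
while the job is active to see this; the binary name is just a placeholder:)

  vmstat 2                     # watch the user/system/idle split over time
  top -b -n 1 | grep my_app    # state column shows R (running) vs. S (sleeping)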

I was wondering if this might be the OS jitter/noise problem.

Todd


-Original Message-
From: users-boun...@open-mpi.org on behalf of George Bosilca
Sent: Fri 3/23/2007 7:15 PM
To: Open MPI Users
Subject: Re: [OMPI users] MPI processes swapping out
 
So far the described behavior seems normal and expected. As Open
MPI never goes into blocking mode, the processes will always spin
between active and sleep mode. More processes on the same node lead
to more time in system mode (because of the empty polls). There
is a trick in the trunk version of Open MPI which will trigger the
blocking mode if and only if TCP is the only device in use. Please try
adding "--mca btl tcp,self" to your mpirun command line, and check the
output of vmstat; see the example below.
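
For instance (the process count and executable name below are placeholders;
the MCA flag is the part that matters):

  mpirun -np 4 --mca btl tcp,self ./my_app
  vmstat 2    # run alongside the job; with the TCP-only blocking path,
              # system time from empty polls should drop noticeably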

   Thanks,
 george.

On Mar 23, 2007, at 3:32 PM, Heywood, Todd wrote:

> Rolf,
>
>> Is it possible that everything is working just as it should?
>
> That's what I'm afraid of :-). But I did not expect to see such
> communication overhead due to blocking from mpiBLAST, which is very
> coarse-grained. I then tried HPL, which is computation-heavy, and found
> the same thing. Also, the system time seemed to correspond to the MPI
> processes cycling between run and sleep (as seen via top), and I thought
> that setting the mpi_yield_when_idle parameter to 0 would keep the
> processes from entering the sleep state when blocking. But it doesn't.
>
> Todd
>
>
>
> On 3/23/07 2:06 PM, "Rolf Vandevaart"  wrote:
>
>>
>> Todd:
>>
>> I assume the system time is being consumed by
>> the calls to send and receive data over the TCP sockets.
>> As the number of processes in the job increases, more time is spent
>> waiting for data from one of the other processes.
>>
>> I did a little experiment on a single node to see the difference
>> in system time consumed when running over TCP vs when
>> running over shared memory.   When running on a single
>> node and using the sm btl, I see almost 100% user time.
>> I assume this is because the sm btl handles sending and
>> receiving its data within a shared memory segment.
>> However, when I switch over to TCP, I see my system time
>> go up.  Note that this is on Solaris.
>>
>> RUNNING OVER SELF,SM
>>> mpirun -np 8 -mca btl self,sm hpcc.amd64
>>
>>    PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
>>   3505 rolfv    100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   0  75 182   0 hpcc.amd64/1
>>   3503 rolfv    100 0.0 0.0 0.0 0.0 0.0 0.0 0.2   0  69 116   0 hpcc.amd64/1
>>   3499 rolfv     99 0.0 0.0 0.0 0.0 0.0 0.0 0.5   0 106 236   0 hpcc.amd64/1
>>   3497 rolfv     99 0.0 0.0 0.0 0.0 0.0 0.0 1.0   0 169 200   0 hpcc.amd64/1
>>   3501 rolfv     98 0.0 0.0 0.0 0.0 0.0 0.0 1.9   0 127 158   0 hpcc.amd64/1
>>   3507 rolfv     98 0.0 0.0 0.0 0.0 0.0 0.0 2.0   0 244 200   0 hpcc.amd64/1
>>   3509 rolfv     98 0.0 0.0 0.0 0.0 0.0 0.0 2.0   0 282 212   0 hpcc.amd64/1
>>   3495 rolfv     97 0.0 0.0 0.0 0.0 0.0 0.0 3.2   0 237  98   0 hpcc.amd64/1
>>
>> RUNNING OVER SELF,TCP
>>> mpirun -np 8 -mca btl self,tcp hpcc.amd64
>>
>>    PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
>>   4316 rolfv     93 6.9 0.0 0.0 0.0 0.0 0.0 0.2   5 346 .6M   0 hpcc.amd64/1
>>   4328 rolfv     91 8.4 0.0 0.0 0.0 0.0 0.0 0.4   3  59 .15   0 hpcc.amd64/1
>>   4324 rolfv     98 1.1 0.0 0.0 0.0 0.0 0.0 0.7   2 270 .1M   0 hpcc.amd64/1
>>   4320 rolfv     88  12 0.0 0.0 0.0 0.0 0.0 0.8   4 244 .15   0 hpcc.amd64/1
>>   4322 rolfv     94 5.1 0.0 0.0 0.0 0.0 0.0 1.3   2 150 .2M   0 hpcc.amd64/1
>>   4318 rolfv     92 6.7 0.0 0.0 0.0 0.0 0.0 1.4   5 236 .9M   0 hpcc.amd64/1
>>   4326 rolfv     93 5.3 0.0 0.0 0.0 0.0 0.0 1.7   7 117 .2M   0 hpcc.amd64/1
>>   4314 rolfv     91 6.6 0.0 0.0 0.0 0.0 1.3 0.9  19 150 .10   0 hpcc.amd64/1
>>
>> I also ran HPL over a larger cluster of 6 nodes, and noticed even higher
>> system times.
>>
>> And lastly, I ran a simple MPI test over a cluster of 64 nodes, 2 procs
>> per node using Sun HPC ClusterTools 6, and saw about a 50/50 split between
>> user and system time.
>>
>>    PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
>>  11525 rolfv     55  44 0.1 0.0 0.0 0.0 0.1 0.4  76 960 .3M   0 maxtrunc_ct6/1
>>  11526 rolfv     54  45 0.0 0.0 0.0 0.0 0.0 1.0   0 362 .4M   0 maxtrunc_ct6/1
>>
>> Is it possible that everything is working just as it should?
>>
>> Rolf
>>
>> Heywood, Todd wrote On 03/22/07 13:30,:
>>
>>> Ralph,
>>>
>>> Well, according to the FAQ, aggressive mode can be "forced" so I did try
>>> setting OMPI_MCA_mpi_yield_when_idle=0 before runnin