Re: [OMPI users] mpiblast + openmpi + gridengine job fails to run

2008-12-24 Thread Sangamesh B
Thanks Reuti. That sorted out the problem.

Now mpiblast is able to run, but only on a single node: mpiformatdb
-> 4 fragments, mpiblast -> 4 processes. Since each node has 4 cores,
the job runs on a single node and works fine. With 8 processes, the
job fails with the following error message:

$ cat err.108.OMPI-Blast-Job
[0,1,7][btl_openib_component.c:1371:btl_openib_component_progress]
from compute-0-5.local to: compute-0-11.local error polling HP CQ with
status LOCAL LENGTH ERROR status number 1 for wr_id 12002616 opcode 42
[compute-0-11.local:09692] [0,0,0]-[0,1,2] mca_oob_tcp_msg_recv: readv
failed: Connection reset by peer (104)
[compute-0-11.local:09692] [0,0,0]-[0,1,4] mca_oob_tcp_msg_recv: readv
failed: Connection reset by peer (104)
4   0.674234Bailing out with signal 15
[compute-0-5.local:10032] MPI_ABORT invoked on rank 4 in communicator
MPI_COMM_WORLD with errorcode 0
5   1.324   Bailing out with signal 15
[compute-0-5.local:10033] MPI_ABORT invoked on rank 5 in communicator
MPI_COMM_WORLD with errorcode 0
6   1.32842 Bailing out with signal 15
[compute-0-5.local:10034] MPI_ABORT invoked on rank 6 in communicator
MPI_COMM_WORLD with errorcode 0
[compute-0-11.local:09692] [0,0,0]-[0,1,3] mca_oob_tcp_msg_recv: readv
failed: Connection reset by peer (104)
0   0.674561Bailing out with signal 15
[compute-0-11.local:09782] MPI_ABORT invoked on rank 0 in communicator
MPI_COMM_WORLD with errorcode 0
1   0.808846Bailing out with signal 15
[compute-0-11.local:09783] MPI_ABORT invoked on rank 1 in communicator
MPI_COMM_WORLD with errorcode 0
2   0.81484 Bailing out with signal 15
[compute-0-11.local:09784] MPI_ABORT invoked on rank 2 in communicator
MPI_COMM_WORLD with errorcode 0
3   1.32249 Bailing out with signal 15
[compute-0-11.local:09785] MPI_ABORT invoked on rank 3 in communicator
MPI_COMM_WORLD with errorcode 0

I think it's a problem with Open MPI: it is not able to communicate with
processes on another node.
Please help me get it working on multiple nodes.

Thanks,
Sangamesh


On Tue, Dec 23, 2008 at 4:45 PM, Reuti  wrote:
> Hi,
>
> On 23.12.2008, at 12:03, Sangamesh B wrote:
>
>> Hello,
>>
>>   I've compiled MPIBLAST-1.5.0-pio app on Rocks 4.3,Voltaire
>> infiniband based Linux cluster using Open MPI-1.2.8 + intel 10
>> compilers.
>>
>>  The job is not running. Let me explain the configs:
>>
>> SGE job script:
>>
>>  $ cat sge_submit.sh
>> #!/bin/bash
>>
>> #$ -N OMPI-Blast-Job
>>
>> #$ -S /bin/bash
>>
>> #$ -cwd
>>
>> #$ -e err.$JOB_ID.$JOB_NAME
>>
>> #$ -o out.$JOB_ID.$JOB_NAME
>>
>> #$ -pe orte 4
>>
>> /opt/openmpi_intel/1.2.8/bin/mpirun -np $NSLOTS
>> /opt/apps/mpiblast-150-pio_OMPI/bin/mpiblast -p blastp -d
>> Mtub_CDC1551_.faa -i 586_seq.fasta -o test.out
>>
>> The PE orte is:
>>
>> $ qconf -sp orte
>> pe_name   orte
>> slots 999
>> user_listsNONE
>> xuser_lists   NONE
>> start_proc_args   /bin/true
>> stop_proc_args/bin/true
>> allocation_rule   $fill_up
>> control_slavesFALSE
>> job_is_first_task TRUE
>
> you will need here:
>
> control_slavesTRUE
> job_is_first_task FALSE
>
> -- Reuti
>
>
>> urgency_slots min
>>
>> # /opt/openmpi_intel/1.2.8/bin/ompi_info | grep gridengine
>> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.8)
>> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.8)
>>
>> The SGE error and output files for the job are as follows:
>>
>> $ cat err.88.OMPI-Blast-Job
>> error: executing task of job 88 failed:
>> [compute-0-1.local:06151] ERROR: A daemon on node compute-0-1.local
>> failed to start as expected.
>> [compute-0-1.local:06151] ERROR: There may be more information available
>> from
>> [compute-0-1.local:06151] ERROR: the 'qstat -t' command on the Grid
>> Engine tasks.
>> [compute-0-1.local:06151] ERROR: If the problem persists, please restart
>> the
>> [compute-0-1.local:06151] ERROR: Grid Engine PE job
>> [compute-0-1.local:06151] ERROR: The daemon exited unexpectedly with
>> status 1.
>>
>> $ cat out.88.OMPI-Blast-Job
>>
>> There is nothing in output file.
>>
>> The qstat shows that job is running at some node. But on that node,
>> there is no mpiblast processes running  as seen by top command.
>>
>> The ps command:
>>
>> # ps -ef | grep mpiblast
>> locuz 4018  4017  0 16:25 ?00:00:00
>> /opt/openmpi_intel/1.2.8/bin/mpirun -np 4
>> /opt/apps/mpiblast-150-pio_OMPI/bin/mpiblast -p blastp -d
>> Mtub_CDC1551_.faa -i 586_seq.fasta -o test.out
>> root  4120  4022  0 16:27 pts/000:00:00 grep mpiblast
>>
>> shows this.
>>
>> The ibv_rc_pingpong tests work fine. The output of lsmod:
>>
>> # lsmod | grep ib
>> ib_sdp 57788  0
>> rdma_cm38292  3 rdma_ucm,rds,ib_sdp
>> ib_addr11400  1 rdma_cm
>> ib_local_sa14864  1 rdma_cm
>> ib_mthca  157396  2
>> ib_ipoib   83928  0
>> ib_umad20656  0
>> ib_ucm  
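
For reference, here is the PE definition with the two changes Reuti suggests
applied -- a sketch only, assuming every other setting stays exactly as posted
above:

$ qconf -sp orte
pe_name           orte
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $fill_up
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min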

Re: [OMPI users] Problem with openmpi and infiniband

2008-12-24 Thread Pavel Shamis (Pasha)

Biagio Lucini wrote:

Hello,

I am new to this list, where I hope to find a solution for a problem 
that I have been having for quite a longtime.


I run various versions of openmpi (from 1.1.2 to 1.2.8) on a cluster 
with Infiniband interconnects that I use and administer at the same 
time. The openfabric stac is OFED-1.2.5, the compilers gcc 4.2 and 
Intel. The queue manager is SGE 6.0u8. 
Do you use the Open MPI version that is included in OFED? Were you able 
to run basic OFED/OMPI tests/benchmarks between two nodes?


Pasha


Re: [OMPI users] Problem with openmpi and infiniband

2008-12-24 Thread Biagio Lucini

Pavel Shamis (Pasha) wrote:

Biagio Lucini wrote:

Hello,

I am new to this list, where I hope to find a solution for a problem
that I have been having for quite a longtime.

I run various versions of openmpi (from 1.1.2 to 1.2.8) on a cluster
with Infiniband interconnects that I use and administer at the same
time. The openfabric stac is OFED-1.2.5, the compilers gcc 4.2 and
Intel. The queue manager is SGE 6.0u8.

Do you use OpenMPI version that is included in OFED ? Did you was able
to run basic OFED/OMPI tests/benchmarks between two nodes ?



Hi,

yes to both questions: the OMPI version is the one that comes with OFED 
(1.1.2-1) and the basic tests run fine. For instance, IMB-MPI1 (which is 
more than basic, as far as I can see) reports for the last test:


#---
# Benchmarking Barrier
# #processes = 6
#---
 #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         1000        22.93        22.95        22.94


for the openib,self btl (6 processes, all processes on different nodes)
and

#---
# Benchmarking Barrier
# #processes = 6
#---
 #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
 1000   191.30   191.42   191.34

for the tcp,self btl (same test)

No anomalies for other tests (ping-pong, all-to-all etc.)

Thanks,
Biagio


--
=

Dr. Biagio Lucini   
Department of Physics, Swansea University
Singleton Park, SA2 8PP Swansea (UK)
Tel. +44 (0)1792 602284

=


[OMPI users] BTL question

2008-12-24 Thread Teige, Scott W

Greetings,

I have observed strange behavior with an application running with
OpenMPI 1.2.8, OFED 1.2. The application runs in two "modes", fast
and slow: the execution time is either within one second of 108 sec.
or within one second of 67 sec. My cluster has 1 Gig Ethernet and
DDR InfiniBand, so the byte transfer layer (BTL) is a prime suspect.

So, is there a way to determine (from my application code) which
BTL is really being used?

Thanks,
Scott



Re: [OMPI users] mpiblast + openmpi + gridengine job fails to run

2008-12-24 Thread Reuti

Hi,

On 24.12.2008, at 07:55, Sangamesh B wrote:


Thanks Reuti. That sorted out the problem.

Now mpiblast is able to run, but only on single node. i.e. mpiformatdb
-> 4 fragments, mpiblast - 4 processes. Since each node is having 4
cores, the job will run on a single node and works fine. With 8
processes, the job fails with following error message:


I would suggest searching the SGE mailing list archive for  
"mpiblast" in the mail body - there are several entries about solving  
this issue, which might also apply to your case.


-- Reuti



$ cat err.108.OMPI-Blast-Job
[0,1,7][btl_openib_component.c:1371:btl_openib_component_progress]
from compute-0-5.local to: compute-0-11.local error polling HP CQ with
status LOCAL LENGTH ERROR status number 1 for wr_id 12002616 opcode 42
[compute-0-11.local:09692] [0,0,0]-[0,1,2] mca_oob_tcp_msg_recv: readv
failed: Connection reset by peer (104)
[compute-0-11.local:09692] [0,0,0]-[0,1,4] mca_oob_tcp_msg_recv: readv
failed: Connection reset by peer (104)
4   0.674234Bailing out with signal 15
[compute-0-5.local:10032] MPI_ABORT invoked on rank 4 in communicator
MPI_COMM_WORLD with errorcode 0
5   1.324   Bailing out with signal 15
[compute-0-5.local:10033] MPI_ABORT invoked on rank 5 in communicator
MPI_COMM_WORLD with errorcode 0
6   1.32842 Bailing out with signal 15
[compute-0-5.local:10034] MPI_ABORT invoked on rank 6 in communicator
MPI_COMM_WORLD with errorcode 0
[compute-0-11.local:09692] [0,0,0]-[0,1,3] mca_oob_tcp_msg_recv: readv
failed: Connection reset by peer (104)
0   0.674561Bailing out with signal 15
[compute-0-11.local:09782] MPI_ABORT invoked on rank 0 in communicator
MPI_COMM_WORLD with errorcode 0
1   0.808846Bailing out with signal 15
[compute-0-11.local:09783] MPI_ABORT invoked on rank 1 in communicator
MPI_COMM_WORLD with errorcode 0
2   0.81484 Bailing out with signal 15
[compute-0-11.local:09784] MPI_ABORT invoked on rank 2 in communicator
MPI_COMM_WORLD with errorcode 0
3   1.32249 Bailing out with signal 15
[compute-0-11.local:09785] MPI_ABORT invoked on rank 3 in communicator
MPI_COMM_WORLD with errorcode 0

I think its problem with OpenMPI. Its not able to communicate with
processes on another node.
Please help me to get it working on multiple nodes.

Thanks,
Sangamesh


On Tue, Dec 23, 2008 at 4:45 PM, Reuti   
wrote:

Hi,

On 23.12.2008, at 12:03, Sangamesh B wrote:


Hello,

  I've compiled MPIBLAST-1.5.0-pio app on Rocks 4.3,Voltaire
infiniband based Linux cluster using Open MPI-1.2.8 + intel 10
compilers.

 The job is not running. Let me explain the configs:

SGE job script:

 $ cat sge_submit.sh
#!/bin/bash

#$ -N OMPI-Blast-Job

#$ -S /bin/bash

#$ -cwd

#$ -e err.$JOB_ID.$JOB_NAME

#$ -o out.$JOB_ID.$JOB_NAME

#$ -pe orte 4

/opt/openmpi_intel/1.2.8/bin/mpirun -np $NSLOTS
/opt/apps/mpiblast-150-pio_OMPI/bin/mpiblast -p blastp -d
Mtub_CDC1551_.faa -i 586_seq.fasta -o test.out

The PE orte is:

$ qconf -sp orte
pe_name   orte
slots 999
user_listsNONE
xuser_lists   NONE
start_proc_args   /bin/true
stop_proc_args/bin/true
allocation_rule   $fill_up
control_slavesFALSE
job_is_first_task TRUE


you will need here:

control_slavesTRUE
job_is_first_task FALSE

-- Reuti



urgency_slots min

# /opt/openmpi_intel/1.2.8/bin/ompi_info | grep gridengine
MCA ras: gridengine (MCA v1.0, API v1.3,  
Component v1.2.8)
MCA pls: gridengine (MCA v1.0, API v1.3,  
Component v1.2.8)


The SGE error and output files for the job are as follows:

$ cat err.88.OMPI-Blast-Job
error: executing task of job 88 failed:
[compute-0-1.local:06151] ERROR: A daemon on node compute-0-1.local
failed to start as expected.
[compute-0-1.local:06151] ERROR: There may be more information  
available

from
[compute-0-1.local:06151] ERROR: the 'qstat -t' command on the Grid
Engine tasks.
[compute-0-1.local:06151] ERROR: If the problem persists, please  
restart

the
[compute-0-1.local:06151] ERROR: Grid Engine PE job
[compute-0-1.local:06151] ERROR: The daemon exited unexpectedly with
status 1.

$ cat out.88.OMPI-Blast-Job

There is nothing in output file.

The qstat shows that job is running at some node. But on that node,
there is no mpiblast processes running  as seen by top command.

The ps command:

# ps -ef | grep mpiblast
locuz 4018  4017  0 16:25 ?00:00:00
/opt/openmpi_intel/1.2.8/bin/mpirun -np 4
/opt/apps/mpiblast-150-pio_OMPI/bin/mpiblast -p blastp -d
Mtub_CDC1551_.faa -i 586_seq.fasta -o test.out
root  4120  4022  0 16:27 pts/000:00:00 grep mpiblast

shows this.

The ibv_rc_pingpong tests work fine. The output of lsmod:

# lsmod | grep ib
ib_sdp 57788  0
rdma_cm38292  3 rdma_ucm,rds,ib_sdp
ib_addr11400  1 rdma_cm
ib_local_sa14864  1 rdma_cm
ib_mthca  157396  2
ib_ipoib   83928  0
ib_umad20656  0
ib_ucm

Re: [OMPI users] BTL question

2008-12-24 Thread Pavel Shamis (Pasha)

Teige, Scott W wrote:

Greetings,

I have observed strange behavior with an application running with
OpenMPI 1.2.8, OFED 1.2. The application runs in two "modes", fast
and slow. The exectution time is either within one second of 108 sec.
or within one second of 67 sec. My cluster has 1 Gig ethernet and
DDR Infiniband so the byte transport layer is a prime suspect.

So, is there a way to determine (from my application code) which
BTL is really being used?

You may specify:
--mca btl openib,sm,self
and Open MPI will use InfiniBand and shared memory for communication, or
--mca btl tcp,sm,self
and Open MPI will use TCP and shared memory for communication.
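
For example (a sketch -- the executable name and process count below are
placeholders, not taken from the thread):

mpirun --mca btl openib,sm,self -np 8 ./my_app   # restrict to InfiniBand + shared memory
mpirun --mca btl tcp,sm,self -np 8 ./my_app      # restrict to TCP + shared memory

If the fast runs match the openib timing and the slow runs match the tcp
timing, that tells you which BTL was being selected by default.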

Thanks,
Pasha



Re: [OMPI users] Problem with openmpi and infiniband

2008-12-24 Thread Pavel Shamis (Pasha)
If the basic tests run, the installation is OK. So what happens when you 
try to run your application? What is the command line? What is the error 
message? Do you run the application on the same set of machines, with the 
same command line, as IMB?

Pasha




yes to both questions: the OMPI version is the one that comes with 
OFED (1.1.2-1) and the basic tests run fine. For instance, IMB-MPI1 
(which is more than basic, as far as I can see) reports for the last 
test:


#---
# Benchmarking Barrier
# #processes = 6
#---
 #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         1000        22.93        22.95        22.94


for the openib,self btl (6 processes, all processes on different nodes)
and

#---
# Benchmarking Barrier
# #processes = 6
#---
 #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
 1000   191.30   191.42   191.34

for the tcp,self btl (same test)

No anomalies for other tests (ping-pong, all-to-all etc.)

Thanks,
Biagio






Re: [OMPI users] mpiblast + openmpi + gridengine job fails to run

2008-12-24 Thread Joe Landman

Reuti wrote:

Hi,

On 24.12.2008, at 07:55, Sangamesh B wrote:


Thanks Reuti. That sorted out the problem.

Now mpiblast is able to run, but only on single node. i.e. mpiformatdb
-> 4 fragments, mpiblast - 4 processes. Since each node is having 4
cores, the job will run on a single node and works fine. With 8
processes, the job fails with following error message:


First, there is an mpiBLAST mailing list I'd suggest subscribing to.

Second, mpiformatdb, despite its name, is not an MPI code.  It doesn't 
run across multiple nodes, or with multiple threads.  See the mpiBLAST 
site (www.mpiblast.org) for more details and documentation on how to 
use/run it.
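
A rough sketch of the usual flow, reusing the paths from this thread (the
mpiformatdb options here are from memory of the mpiBLAST docs, so treat them
as assumptions and check www.mpiblast.org for the exact flags):

# format the database once, serially -- no mpirun around mpiformatdb
mpiformatdb --nfrags=8 -i Mtub_CDC1551_.faa -p T

# then launch the search itself under MPI
mpirun -np 8 /opt/apps/mpiblast-150-pio_OMPI/bin/mpiblast -p blastp \
    -d Mtub_CDC1551_.faa -i 586_seq.fasta -o test.out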


Joe



--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: land...@scalableinformatics.com
web  : http://www.scalableinformatics.com
   http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


Re: [OMPI users] Problem with openmpi and infiniband

2008-12-24 Thread Tim Mattox
For your runs with Open MPI over InfiniBand, try using openib,sm,self
for the BTL setting, so that shared memory communications are used
within a node.  It would give us another datapoint to help diagnose
the problem.  For anything else we would need, please follow the
advice in this FAQ entry and on the help page:
http://www.open-mpi.org/faq/?category=openfabrics#ofa-troubleshoot
http://www.open-mpi.org/community/help/
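
For instance, the same IMB run rerun with shared memory added to the BTL
list (a sketch -- adjust the path to IMB-MPI1 and the slot count to your
site):

mpirun --mca btl openib,sm,self -np 6 ./IMB-MPI1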

On Wed, Dec 24, 2008 at 5:55 AM, Biagio Lucini  wrote:
> Pavel Shamis (Pasha) wrote:
>>
>> Biagio Lucini wrote:
>>>
>>> Hello,
>>>
>>> I am new to this list, where I hope to find a solution for a problem
>>> that I have been having for quite a longtime.
>>>
>>> I run various versions of openmpi (from 1.1.2 to 1.2.8) on a cluster
>>> with Infiniband interconnects that I use and administer at the same
>>> time. The openfabric stac is OFED-1.2.5, the compilers gcc 4.2 and
>>> Intel. The queue manager is SGE 6.0u8.
>>
>> Do you use OpenMPI version that is included in OFED ? Did you was able
>> to run basic OFED/OMPI tests/benchmarks between two nodes ?
>>
>
> Hi,
>
> yes to both questions: the OMPI version is the one that comes with OFED
> (1.1.2-1) and the basic tests run fine. For instance, IMB-MPI1 (which is
> more than basic, as far as I can see) reports for the last test:
>
> #---
> # Benchmarking Barrier
> # #processes = 6
> #---
>  #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> 1000        22.93        22.95        22.94
>
>
> for the openib,self btl (6 processes, all processes on different nodes)
> and
>
> #---
> # Benchmarking Barrier
> # #processes = 6
> #---
>  #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> 1000   191.30   191.42   191.34
>
> for the tcp,self btl (same test)
>
> No anomalies for other tests (ping-pong, all-to-all etc.)
>
> Thanks,
> Biagio
>
>
> --
> =
>
> Dr. Biagio Lucini
> Department of Physics, Swansea University
> Singleton Park, SA2 8PP Swansea (UK)
> Tel. +44 (0)1792 602284
>
> =
>



-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/


Re: [OMPI users] sending message to the source(0) from other processors

2008-12-24 Thread Win Than Aung
Thanks Eugene for your example, it helps me a lot. I've bumped into one more
problem. I have six files in total, each containing real and imaginary values.
Let's say the file content is as follows:
"
1.001212 1.0012121  //0th
1.001212 1.0012121  //1st
1.001212 1.0012121  //2nd
1.001212 1.0012121  //3rd
1.001212 1.0012121  //4th
1.001212 1.0012121 //5th
1.001212 1.0012121 //6th
"
char send_buffer[1000];
i use "mpirun -np 6 a.out" which mean i let each processor get access to one
file
each processor will add "0th and 2nd"(even values) (those values will be
sent to root processor and save as file_even_add.dat" and also each
processor will add "1st and 3rd"(odd values) (those values will be sent to
root processor(here is 0) and saved as "file_odd_add.dat".

if(mpi_my_id == root)
{

}






On Tue, Dec 23, 2008 at 3:53 PM, Eugene Loh  wrote:

>  Win Than Aung wrote:
>
> thanks for your reply jeff
>  so i tried following
>
>
>
>  #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char **argv) {
>  int np, me, sbuf = -1, rbuf = -2,mbuf=1000;
> int data[2];
>  MPI_Init(&argc,&argv);
>  MPI_Comm_size(MPI_COMM_WORLD,&np);
>  MPI_Comm_rank(MPI_COMM_WORLD,&me);
>  if ( np < 2 ) MPI_Abort(MPI_COMM_WORLD,-1);
>
>  if ( me == 1 ) MPI_Send(&sbuf,1,MPI_INT,0,344,MPI_COMM_WORLD);
> if(me==2) MPI_Send( &mbuf,1,MPI_INT,0,344,MPI_COMM_WORLD);
> if ( me == 0 ) {
>
> MPI_Recv(data,2,MPI_INT,MPI_ANY_SOURCE,344,MPI_COMM_WORLD,MPI_STATUS_IGNORE);
>  }
>
>  MPI_Finalize();
>
>  return 0;
> }
>
> it can successfuly receive the one sent from processor 1(me==1) but it
> failed to receive the one sent from processor 2(me==2)
> mpirun -np 3 hello
>
> There is only one receive, so it receives only one message.  When you
> specify the element count for the receive, you're only specifying the size
> of the buffer into which the message will be received.  Only after the
> message has been received can you inquire how big the message actually was.
>
> Here is an example:
>
> % cat a.c
> #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char **argv) {
>   int np, me, peer, value;
>
>   MPI_Init(&argc,&argv);
>   MPI_Comm_size(MPI_COMM_WORLD,&np);
>   MPI_Comm_rank(MPI_COMM_WORLD,&me);
>
>   value = me * me + 1;
>   if ( me == 0 ) {
> for ( peer = 0; peer < np; peer++ ) {
>   if ( peer != 0 )
> MPI_Recv(&value,1,MPI_INT,peer,343,MPI_COMM_WORLD,MPI_STATUS_IGNORE);
>   printf("peer %d had value %d\n", peer, value);
> }
>   }
>   else MPI_Send(&value,1,MPI_INT,0,343,MPI_COMM_WORLD);
>
>   MPI_Finalize();
>
>   return 0;
> }
> % mpirun -np 3 a.out
> peer 0 had value 1
> peer 1 had value 2
> peer 2 had value 5
> %
>
> Alternatively,
>
> #include <mpi.h>
> #include <stdio.h>
>
> #define MAXNP 1024
> int main(int argc, char **argv) {
>   int np, me, peer, value, values[MAXNP];
>
>   MPI_Init(&argc,&argv);
>   MPI_Comm_size(MPI_COMM_WORLD,&np);
>   if ( np > MAXNP ) MPI_Abort(MPI_COMM_WORLD,-1);
>   MPI_Comm_rank(MPI_COMM_WORLD,&me);
>   value = me * me + 1;
>
>   MPI_Gather(&value, 1, MPI_INT,
>  values, 1, MPI_INT, 0, MPI_COMM_WORLD);
>
>   if ( me == 0 )
> for ( peer = 0; peer < np; peer++ )
>   printf("peer %d had value %d\n", peer, values[peer]);
>
>   MPI_Finalize();
>   return 0;
> }
> % mpirun -np 3 a.out
> peer 0 had value 1
> peer 1 had value 2
> peer 2 had value 5
> %
>
> Which is better?  Up to you.  The collective routines (like MPI_Gather) do
> offer MPI implementors (like people developing Open MPI) the opportunity to
> perform special optimizations (e.g., gather using a binary tree instead of
> having the root process perform so many receives).
>
>


Re: [OMPI users] sending message to the source(0) from other processors

2008-12-24 Thread Win Than Aung
Thanks Eugene for your example, it helps me a lot. I've bumped into one more
problem. I have six files in total, each containing real and imaginary values.
Let's say the file content is as follows:
"
1.001212 1.0012121  //0th
1.001212 1.0012121  //1st
1.001212 1.0012121  //2nd
1.001212 1.0012121  //3rd
1.001212 1.0012121  //4th
1.001212 1.0012121 //5th
1.001212 1.0012121 //6th
"
char send_buffer[1000];
i use "mpirun -np 6 a.out" which mean i let each processor get access to one
file
each processor will add "0th and 2nd"(even values) (those values will be
sent to root processor and save as file_even_add.dat" and also each
processor will add "1st and 3rd"(odd values) (those values will be sent to
root processor(here is 0) and saved as "file_odd_add.dat".

char recv_buffer[1000];
File* file_even_dat;
File* file_odd_dat;
if(mpi_my_id == root)
{
   filepteven = fopen("C:\\fileeven.dat");
   fileptodd = fopen("C:\\fileodd.dat");
 int peer =0;
for(peer =0;peer<np;peer++){
  if(peer!=root)
  {

MPI_Recv(recv_buffer,MAX_STR_LEN,MPI_BYTE,MPI_ANY_TAG,MPI_COMM_WORLD,&status);
  }
  fprintf(filepteven, "%s \n" ,recv_buffer);
   }
}

My question is: how do I know which send buffer has the even sums and which
has the odd sums? In which order did they get sent?
thanks
winthan

On Tue, Dec 23, 2008 at 3:53 PM, Eugene Loh  wrote:

>  Win Than Aung wrote:
>
> thanks for your reply jeff
>  so i tried following
>
>
>
>  #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char **argv) {
>  int np, me, sbuf = -1, rbuf = -2,mbuf=1000;
> int data[2];
>  MPI_Init(&argc,&argv);
>  MPI_Comm_size(MPI_COMM_WORLD,&np);
>  MPI_Comm_rank(MPI_COMM_WORLD,&me);
>  if ( np < 2 ) MPI_Abort(MPI_COMM_WORLD,-1);
>
>  if ( me == 1 ) MPI_Send(&sbuf,1,MPI_INT,0,344,MPI_COMM_WORLD);
> if(me==2) MPI_Send( &mbuf,1,MPI_INT,0,344,MPI_COMM_WORLD);
> if ( me == 0 ) {
>
> MPI_Recv(data,2,MPI_INT,MPI_ANY_SOURCE,344,MPI_COMM_WORLD,MPI_STATUS_IGNORE);
>  }
>
>  MPI_Finalize();
>
>  return 0;
> }
>
> it can successfuly receive the one sent from processor 1(me==1) but it
> failed to receive the one sent from processor 2(me==2)
> mpirun -np 3 hello
>
> There is only one receive, so it receives only one message.  When you
> specify the element count for the receive, you're only specifying the size
> of the buffer into which the message will be received.  Only after the
> message has been received can you inquire how big the message actually was.
>
> Here is an example:
>
> % cat a.c
> #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char **argv) {
>   int np, me, peer, value;
>
>   MPI_Init(&argc,&argv);
>   MPI_Comm_size(MPI_COMM_WORLD,&np);
>   MPI_Comm_rank(MPI_COMM_WORLD,&me);
>
>   value = me * me + 1;
>   if ( me == 0 ) {
> for ( peer = 0; peer < np; peer++ ) {
>   if ( peer != 0 )
> MPI_Recv(&value,1,MPI_INT,peer,343,MPI_COMM_WORLD,MPI_STATUS_IGNORE);
>   printf("peer %d had value %d\n", peer, value);
> }
>   }
>   else MPI_Send(&value,1,MPI_INT,0,343,MPI_COMM_WORLD);
>
>   MPI_Finalize();
>
>   return 0;
> }
> % mpirun -np 3 a.out
> peer 0 had value 1
> peer 1 had value 2
> peer 2 had value 5
> %
>
> Alternatively,
>
> #include <mpi.h>
> #include <stdio.h>
>
> #define MAXNP 1024
> int main(int argc, char **argv) {
>   int np, me, peer, value, values[MAXNP];
>
>   MPI_Init(&argc,&argv);
>   MPI_Comm_size(MPI_COMM_WORLD,&np);
>   if ( np > MAXNP ) MPI_Abort(MPI_COMM_WORLD,-1);
>   MPI_Comm_rank(MPI_COMM_WORLD,&me);
>   value = me * me + 1;
>
>   MPI_Gather(&value, 1, MPI_INT,
>  values, 1, MPI_INT, 0, MPI_COMM_WORLD);
>
>   if ( me == 0 )
> for ( peer = 0; peer < np; peer++ )
>   printf("peer %d had value %d\n", peer, values[peer]);
>
>   MPI_Finalize();
>   return 0;
> }
> % mpirun -np 3 a.out
> peer 0 had value 1
> peer 1 had value 2
> peer 2 had value 5
> %
>
> Which is better?  Up to you.  The collective routines (like MPI_Gather) do
> offer MPI implementors (like people developing Open MPI) the opportunity to
> perform special optimizations (e.g., gather using a binary tree instead of
> having the root process perform so many receives).
>
>


Re: [OMPI users] sending message to the source(0) from other processors

2008-12-24 Thread Win Than Aung
I got the solution. I just needed to set the appropriate tags on the sends
and receives. Sorry for asking.
thanks
winthan
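
A minimal sketch of that idea (the tag names, types, and placeholder values
here are hypothetical, not the original code): each rank sends its two results
with distinct tags, and the root uses status.MPI_TAG and status.MPI_SOURCE to
tell which sum it received and from whom, whatever the arrival order.

#include <mpi.h>
#include <stdio.h>

#define TAG_EVEN 100   /* hypothetical tag for the even-entry sum */
#define TAG_ODD  200   /* hypothetical tag for the odd-entry sum  */

int main(int argc, char **argv) {
  int np, me, i;
  double sums[2];   /* sums[0] = even sum, sums[1] = odd sum */

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &np);
  MPI_Comm_rank(MPI_COMM_WORLD, &me);

  /* each rank would read its own file here; placeholder values instead */
  sums[0] = me;
  sums[1] = me + 0.5;

  if (me != 0) {
    MPI_Send(&sums[0], 1, MPI_DOUBLE, 0, TAG_EVEN, MPI_COMM_WORLD);
    MPI_Send(&sums[1], 1, MPI_DOUBLE, 0, TAG_ODD,  MPI_COMM_WORLD);
  } else {
    /* rank 0 collects two messages from every other rank, in whatever
       order they arrive, and uses the status fields to label them */
    for (i = 0; i < 2 * (np - 1); i++) {
      double v;
      MPI_Status status;
      MPI_Recv(&v, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
               MPI_COMM_WORLD, &status);
      if (status.MPI_TAG == TAG_EVEN)
        printf("even sum from rank %d: %g\n", status.MPI_SOURCE, v);
      else
        printf("odd  sum from rank %d: %g\n", status.MPI_SOURCE, v);
    }
  }

  MPI_Finalize();
  return 0;
}

Rank 0 could then write the TAG_EVEN values to one file and the TAG_ODD
values to the other, regardless of the order in which they were sent.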

On Wed, Dec 24, 2008 at 10:36 PM, Win Than Aung  wrote:

> thanks Eugene for your example, it helps me a lot.I bump into one more
> problems
> lets say I have the file content as follow
> I have total of six files which all contain real and imaginary value.
> "
> 1.001212 1.0012121  //0th
> 1.001212 1.0012121  //1st
> 1.001212 1.0012121  //2nd
> 1.001212 1.0012121  //3rd
> 1.001212 1.0012121  //4th
> 1.001212 1.0012121 //5th
> 1.001212 1.0012121 //6th
> "
> char send_buffer[1000];
> i use "mpirun -np 6 a.out" which mean i let each processor get access to
> one file
> each processor will add "0th and 2nd"(even values) (those values will be
> sent to root processor and save as file_even_add.dat" and also each
> processor will add "1st and 3rd"(odd values) (those values will be sent to
> root processor(here is 0) and saved as "file_odd_add.dat".
>
> char recv_buffer[1000];
> File* file_even_dat;
> File* file_odd_dat;
> if(mpi_my_id == root)
> {
>filepteven = fopen("C:\\fileeven.dat");
>fileptodd = fopen("C:\\fileodd.dat");
>  int peer =0;
> for(peer =0;peer<np;peer++){
>   if(peer!=root)
>   {
>
> MPI_Recv(recv_buffer,MAX_STR_LEN,MPI_BYTE,MPI_ANY_TAG,MPI_COMM_WORLD,&status);
>   }
>   fprintf(filepteven, "%s \n" ,recv_buffer);
>}
> }
>
> My question is how do i know which sentbuffer has even add values and which
> sentbuffer has odd add values? in which order did they get sent?
> thanks
> winthan
>
> On Tue, Dec 23, 2008 at 3:53 PM, Eugene Loh  wrote:
>
>>  Win Than Aung wrote:
>>
>> thanks for your reply jeff
>>  so i tried following
>>
>>
>>
>>  #include <mpi.h>
>> #include <stdio.h>
>>
>> int main(int argc, char **argv) {
>>  int np, me, sbuf = -1, rbuf = -2,mbuf=1000;
>> int data[2];
>>  MPI_Init(&argc,&argv);
>>  MPI_Comm_size(MPI_COMM_WORLD,&np);
>>  MPI_Comm_rank(MPI_COMM_WORLD,&me);
>>  if ( np < 2 ) MPI_Abort(MPI_COMM_WORLD,-1);
>>
>>  if ( me == 1 ) MPI_Send(&sbuf,1,MPI_INT,0,344,MPI_COMM_WORLD);
>> if(me==2) MPI_Send( &mbuf,1,MPI_INT,0,344,MPI_COMM_WORLD);
>> if ( me == 0 ) {
>>
>> MPI_Recv(data,2,MPI_INT,MPI_ANY_SOURCE,344,MPI_COMM_WORLD,MPI_STATUS_IGNORE);
>>  }
>>
>>  MPI_Finalize();
>>
>>  return 0;
>> }
>>
>> it can successfuly receive the one sent from processor 1(me==1) but it
>> failed to receive the one sent from processor 2(me==2)
>> mpirun -np 3 hello
>>
>> There is only one receive, so it receives only one message.  When you
>> specify the element count for the receive, you're only specifying the size
>> of the buffer into which the message will be received.  Only after the
>> message has been received can you inquire how big the message actually was.
>>
>> Here is an example:
>>
>> % cat a.c
>> #include <mpi.h>
>> #include <stdio.h>
>>
>> int main(int argc, char **argv) {
>>   int np, me, peer, value;
>>
>>   MPI_Init(&argc,&argv);
>>   MPI_Comm_size(MPI_COMM_WORLD,&np);
>>   MPI_Comm_rank(MPI_COMM_WORLD,&me);
>>
>>   value = me * me + 1;
>>   if ( me == 0 ) {
>> for ( peer = 0; peer < np; peer++ ) {
>>   if ( peer != 0 )
>> MPI_Recv(&value,1,MPI_INT,peer,343,MPI_COMM_WORLD,MPI_STATUS_IGNORE);
>>   printf("peer %d had value %d\n", peer, value);
>> }
>>   }
>>   else MPI_Send(&value,1,MPI_INT,0,343,MPI_COMM_WORLD);
>>
>>   MPI_Finalize();
>>
>>   return 0;
>> }
>> % mpirun -np 3 a.out
>> peer 0 had value 1
>> peer 1 had value 2
>> peer 2 had value 5
>> %
>>
>> Alternatively,
>>
>> #include <mpi.h>
>> #include <stdio.h>
>>
>> #define MAXNP 1024
>> int main(int argc, char **argv) {
>>   int np, me, peer, value, values[MAXNP];
>>
>>   MPI_Init(&argc,&argv);
>>   MPI_Comm_size(MPI_COMM_WORLD,&np);
>>   if ( np > MAXNP ) MPI_Abort(MPI_COMM_WORLD,-1);
>>   MPI_Comm_rank(MPI_COMM_WORLD,&me);
>>   value = me * me + 1;
>>
>>   MPI_Gather(&value, 1, MPI_INT,
>>  values, 1, MPI_INT, 0, MPI_COMM_WORLD);
>>
>>   if ( me == 0 )
>> for ( peer = 0; peer < np; peer++ )
>>   printf("peer %d had value %d\n", peer, values[peer]);
>>
>>   MPI_Finalize();
>>   return 0;
>> }
>> % mpirun -np 3 a.out
>> peer 0 had value 1
>> peer 1 had value 2
>> peer 2 had value 5
>> %
>>
>> Which is better?  Up to you.  The collective routines (like MPI_Gather) do
>> offer MPI implementors (like people developing Open MPI) the opportunity to
>> perform special optimizations (e.g., gather using a binary tree instead of
>> having the root process perform so many receives).
>>
>>
>
>