Re: [OMPI users] Beginner's question: why does this program hang?

2008-03-18 Thread Andreas Schäfer
Hmm, strange. It doesn't hang for me and AFAICS it shouldn't hang at
all. I'm using 1.2.5. Which version of Open MPI are you using? 

Hanging with 100% CPU utilization often means that your processes are
caught in a busy wait. You could try to set mpi_yield_when_idle:

> gentryx@hex ~ $ cat .openmpi/mca-params.conf
> mpi_yield_when_idle=1

But I don't think this should be necessary.

HTH
-Andreas


On 21:35 Mon 17 Mar , Giovani Faccin wrote:
> Hi there!
> 
> I'm learning MPI,  and got really puzzled... Please take a look at this very 
> short code:
> 
> #include <iostream>
> #include "mpicxx.h"
> using namespace std;
> int main(int argc, char *argv[])
> {
> MPI::Init();
> 
> for (unsigned long t = 0; t < 1000; t++)
> {
> //If we are process 0:
> if ( MPI::COMM_WORLD.Get_rank() == 0 )
> {
> MPI::Status mpi_status;
> unsigned long d = 0;
> unsigned long d2 = 0;
> MPI::COMM_WORLD.Recv(&d, 1, MPI::UNSIGNED_LONG, MPI::ANY_SOURCE, 
> MPI::ANY_TAG, mpi_status );
> MPI::COMM_WORLD.Recv(&d2, 1, MPI::UNSIGNED_LONG, MPI::ANY_SOURCE, 
> MPI::ANY_TAG, mpi_status );
> cout << "Time = " << t << "; Node 0 received: " << d << " and " 
> << d2 << endl;
> }
> //Else:
> else
> {
> unsigned long  d = MPI::COMM_WORLD.Get_rank();
> MPI::COMM_WORLD.Send( &d, 1, MPI::UNSIGNED_LONG, 0, 0);
> };
> };
> MPI::Finalize();
> }
> 
> Ok, so what I'm trying to do is to make a gather operation using point to 
> point communication. In my real application instead of sending an unsigned 
> long I'd be calling an object's send and receive methods, which in turn would 
> call their inner object's similar methods and so on until all data is 
> synchronized. I'm using this loop because the number of objects to be sent to 
> process rank 0 varies depending on the sender.
> 
> When running this test with 3 processes on a dual core, oversubscribed node, 
> I get this output:
> (skipped previous output)
> Time = 5873; Node 0 received: 1 and 2
> Time = 5874; Node 0 received: 1 and 2
> Time = 5875; Node 0 received: 1 and 2
> Time = 5876; Node 0 received: 1 and 2
> 
> and then the application hangs, with processor usage at 100%. The exact time 
> when this condition occurs varies on each run, but it usually happens quite 
> fast.
> 
> What would I have to modify, in this simple example, so that the application 
> works as expected? Must I always use Gather, instead of point to point, to 
> make a synchronization like this?
> 
> Thank you very much!
> 
> Giovani
> 
> 
> 
> 
> 
> 
> 

-- 

Andreas Schäfer
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
PGP/GPG key via keyserver
I'm a bright... http://www.the-brights.net


(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your 
signature to help him gain world domination!




Re: [OMPI users] Beginner's question: why does this program hang?

2008-03-18 Thread Giovani Faccin
Hi Andreas, thanks for the reply!

I'm using openmpi-1.2.5. It was installed using my distro's (Gentoo) default 
package:

 sys-cluster/openmpi-1.2.5  USE="fortran ipv6 -debug -heterogeneous -nocxx -pbs 
-romio -smp -threads"

I've tried setting the mpi_yield_when_idle parameter as you asked. However, the 
program still hangs.

Just in case, the command line I'm using to call it is this:
/usr/bin/mpirun --hostfile mpi-config.txt --mca mpi_yield_when_idle 1 -np 3 
/home/gfaccin/desenvolvimento/Eclipse/mpiplay/Debug/mpiplay

where mpi-config.txt contains the following line:
localhost slots=1

Anything else I could try?

Thank you!

Giovani

Andreas Schäfer wrote: Hmm, strange. It doesn't hang for me
and AFAICS it shouldn't hang at
all. I'm using 1.2.5. Which version of Open MPI are you using? 

Hanging with 100% CPU utilization often means that your processes are
caught in a busy wait. You could try to set mpi_yield_when_idle:

> gentryx@hex ~ $ cat .openmpi/mca-params.conf
> mpi_yield_when_idle=1

But I don't think this should be necessary.

HTH
-Andreas


On 21:35 Mon 17 Mar , Giovani Faccin wrote:
> Hi there!
> 
> I'm learning MPI,  and got really puzzled... Please take a look at this very 
> short code:
> 
> #include <iostream>
> #include "mpicxx.h"
> using namespace std;
> int main(int argc, char *argv[])
> {
> MPI::Init();
> 
> for (unsigned long t = 0; t < 1000; t++)
> {
> //If we are process 0:
> if ( MPI::COMM_WORLD.Get_rank() == 0 )
> {
> MPI::Status mpi_status;
> unsigned long d = 0;
> unsigned long d2 = 0;
> MPI::COMM_WORLD.Recv(&d, 1, MPI::UNSIGNED_LONG, MPI::ANY_SOURCE, 
> MPI::ANY_TAG, mpi_status );
> MPI::COMM_WORLD.Recv(&d2, 1, MPI::UNSIGNED_LONG, MPI::ANY_SOURCE, 
> MPI::ANY_TAG, mpi_status );
> cout << "Time = " << t << "; Node 0 received: " << d << " and " 
> << d2 << endl;
> }
> //Else:
> else
> {
> unsigned long  d = MPI::COMM_WORLD.Get_rank();
> MPI::COMM_WORLD.Send( &d, 1, MPI::UNSIGNED_LONG, 0, 0);
> };
> };
> MPI::Finalize();
> }
> 
> Ok, so what I'm trying to do is to make a gather operation using point to 
> point communication. In my real application instead of sending an unsigned 
> long I'd be calling an object's send and receive methods, which in turn would 
> call their inner object's similar methods and so on until all data is 
> synchronized. I'm using this loop because the number of objects to be sent to 
> process rank 0 varies depending on the sender.
> 
> When running this test with 3 processes on a dual core, oversubscribed node, 
> I get this output:
> (skipped previous output)
> Time = 5873; Node 0 received: 1 and 2
> Time = 5874; Node 0 received: 1 and 2
> Time = 5875; Node 0 received: 1 and 2
> Time = 5876; Node 0 received: 1 and 2
> 
> and then the application hangs, with processor usage at 100%. The exact time 
> when this condition occurs varies on each run, but it usually happens quite 
> fast.
> 
> What would I have to modify, in this simple example, so that the application 
> works as expected? Must I always use Gather, instead of point to point, to 
> make a synchronization like this?
> 
> Thank you very much!
> 
> Giovani
> 
> 
> 
> 
> 
> 
> 

-- 

Andreas Schäfer
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
PGP/GPG key via keyserver
I'm a bright... http://www.the-brights.net


(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your 
signature to help him gain world domination!



Re: [OMPI users] SIGSEGV error.

2008-03-18 Thread Jeff Squyres

On Mar 17, 2008, at 10:16 PM, balaji srinivas wrote:


I am new to MPI. The outline of my code is

if(r==0)
function1()
else if(r==1)
function2()

where r is the rank and functions are included in the .h files.  
There are no compilation errors. I get the SIGSEGV error while  
running.

Please help. How do I solve this?



From your description, it is impossible to tell if this is an MPI  
issue or not.  You should probably use standard debugging techniques,  
such as using a debugger, examining core files, etc.  See http://www.open-mpi.org/faq/?category=debugging 
 if you need some suggestions for debugging in parallel.
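
One common trick from that FAQ category (sketched here from memory, so treat
the details and the helper name as assumptions) is to park each rank in a
loop so you can attach a debugger to it:

    /* Hypothetical helper: call it right after MPI_Init.  Attach with
       "gdb -p <pid>" and do "set var attached = 1" to continue. */
    #include <stdio.h>
    #include <unistd.h>

    static void wait_for_debugger(void)
    {
        volatile int attached = 0;
        char host[256];
        gethostname(host, sizeof(host));
        printf("PID %d on %s waiting for debugger\n", (int) getpid(), host);
        while (attached == 0)
            sleep(5);
    }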



2) How do I find the execution time of an MPI program?  In C we have
clock_t start = clock() at the beginning and

((double)clock() - start) / CLOCKS_PER_SEC at the end.




I don't quite understand your question -- is your use of clock()  
reporting incorrect wall clock times?


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] SIGSEGV error.

2008-03-18 Thread Giovani Faccin
Hey Balaji

I'm new at it too, but might be able to help you a bit.

A SIGSEGV error usually occurs when you try to access something in memory 
that's not actually there, like using a pointer that points to nothing. In my 
short experience with MPI so far, I got this kind of message when I misused 
the functions, for example sending a buffer and giving a wrong value when 
telling MPI its size. Make sure there's nothing like that in your code.

For the time thing, I think what you want is the MPI_Wtime() function. 
Check this out:
https://computing.llnl.gov/tutorials/mpi/man/MPI_Wtime.txt
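
Something like this (a rough sketch using the same C++ bindings as the example
earlier in the thread; adapt it to your own code):

    // Wall-clock timing with MPI::Wtime() instead of clock().
    double start = MPI::Wtime();
    // ... the work you want to time ...
    double elapsed = MPI::Wtime() - start;
    cout << "Rank " << MPI::COMM_WORLD.Get_rank()
         << " took " << elapsed << " seconds" << endl;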

Best, 

Giovani

balaji srinivas wrote:
 hi all,
  I am new to MPI. The outline of my code is

 if(r==0)
 function1()
 else if(r==1)
 function2()

 where r is the rank and functions are included in the .h files. There are no 
compilation errors. I get the SIGSEGV error while running.
 Please help. How do I solve this?

 2) How do I find the execution time of an MPI program?  In C we have
 clock_t start = clock() at the beginning and

 ((double)clock() - start) / CLOCKS_PER_SEC at the end.

 Thanks in advance.

 regards,
 balaji. 





Re: [OMPI users] Beginner's question: why does this program hang?

2008-03-18 Thread Jeff Squyres

Two notes for you:

1. Your program does not necessarily guarantee what you might expect:  
since you use ANY_SOURCE/ANY_TAG in both the receives, you might  
actually get two receives from the same sender in a given iteration.   
The fact that you're effectively using yield_when_idle (which OMPI  
will automatically enable when you tell it "slots=1" but you run with  
-np 3) means that you probably *won't* have this happen (because every  
MPI process will yield on every iteration, effectively keeping all 3  
in lock step), but it still *can* happen (and did frequently in my  
tests).
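
One way to make the pairing deterministic (a sketch based on the original
program, not something from this mail; it assumes the enclosing loop variable
t and exactly one message per worker per iteration) is to receive from each
worker rank explicitly instead of using two wildcard receives:

    if ( MPI::COMM_WORLD.Get_rank() == 0 )
    {
        int nprocs = MPI::COMM_WORLD.Get_size();
        for (int src = 1; src < nprocs; src++)
        {
            unsigned long d = 0;
            MPI::COMM_WORLD.Recv(&d, 1, MPI::UNSIGNED_LONG, src, MPI::ANY_TAG);
            cout << "Time = " << t << "; Node 0 received " << d
                 << " from rank " << src << endl;
        }
    }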


2. The problem you're seeing is an optimization called "early  
completion" where, for latency ping-pong optimizations, Open MPI may  
indicate that a send has "completed" before the message is actually  
placed on the network (shared memory, in your case).  This can be a  
nice performance boost for applications that both a) dip into the MPI  
layer frequently and b) synchronize at some point.  Your application  
is not necessarily doing this in the final iterations; it may reach  
MPI_FINALIZE while there's still a pile of messages that have been  
queued for delivery before they are actually progressed out the  
network to the receiver.  In our upcoming 1.2.6 release, there is a  
run-time parameter to disable this early completion behavior (i.e.,  
never signal completion of a send before the data is actually  
transmitted out on the network).  You can try the 1.2.6rc2 tarball:


http://www.open-mpi.org/software/ompi/v1.2/

And use the following MCA parameter:

mpirun --mca pml_ob1_use_early_completion 0 ...

See if that works for you.


On Mar 18, 2008, at 7:11 AM, Giovani Faccin wrote:


Hi Andreas, thanks for the reply!

I'm using openmpi-1.2.5. It was installed using my distro's (Gentoo)  
default package:


 sys-cluster/openmpi-1.2.5  USE="fortran ipv6 -debug -heterogeneous -nocxx -pbs -romio -smp -threads"


I've tried setting the mpi_yield_when_idle parameter as you asked.  
However, the program still hangs.


Just in case, the command line I'm using to call it is this:
/usr/bin/mpirun --hostfile mpi-config.txt --mca mpi_yield_when_idle  
1 -np 3 /home/gfaccin/desenvolvimento/Eclipse/mpiplay/Debug/mpiplay


where mpi-config.txt contains the following line:
localhost slots=1

Anything else I could try?

Thank you!

Giovani

Andreas Schäfer wrote: Hmm, strange. It doesn't  
hang for me and AFAICS it shouldn't hang at

all. I'm using 1.2.5. Which version of Open MPI are you using?

Hanging with 100% CPU utilization often means that your processes are
caught in a busy wait. You could try to set mpi_yield_when_idle:

> gentryx@hex ~ $ cat .openmpi/mca-params.conf
> mpi_yield_when_idle=1

But I don't think this should be necessary.

HTH
-Andreas


On 21:35 Mon 17 Mar , Giovani Faccin wrote:
> Hi there!
>
> I'm learning MPI, and got really puzzled... Please take a look at  
this very short code:

>
> #include <iostream>
> #include "mpicxx.h"
> using namespace std;
> int main(int argc, char *argv[])
> {
> MPI::Init();
>
> for (unsigned long t = 0; t < 1000; t++)
> {
> //If we are process 0:
> if ( MPI::COMM_WORLD.Get_rank() == 0 )
> {
> MPI::Status mpi_status;
> unsigned long d = 0;
> unsigned long d2 = 0;
> MPI::COMM_WORLD.Recv(&d, 1, MPI::UNSIGNED_LONG, MPI::ANY_SOURCE,  
MPI::ANY_TAG, mpi_status );
> MPI::COMM_WORLD.Recv(&d2, 1, MPI::UNSIGNED_LONG, MPI::ANY_SOURCE,  
MPI::ANY_TAG, mpi_status );
> cout << "Time = " << t << "; Node 0 received: " << d << " and " <<  
d2 << endl;

> }
> //Else:
> else
> {
> unsigned long d = MPI::COMM_WORLD.Get_rank();
> MPI::COMM_WORLD.Send( &d, 1, MPI::UNSIGNED_LONG, 0, 0);
> };
> };
> MPI::Finalize();
> }
>
> Ok, so what I'm trying to do is to make a gather operation using  
point to point communication. In my real application instead of  
sending an unsigned long I'd be calling an object's send and receive  
methods, which in turn would call their inner object's similar  
methods and so on until all data is synchronized. I'm using this loop  
because the number of objects to be sent to process rank 0 varies  
depending on the sender.

>
> When running this test with 3 processes on a dual core,  
oversubscribed node, I get this output:

> (skipped previous output)
> Time = 5873; Node 0 received: 1 and 2
> Time = 5874; Node 0 received: 1 and 2
> Time = 5875; Node 0 received: 1 and 2
> Time = 5876; Node 0 received: 1 and 2
>
> and then the application hangs, with processor usage at 100%. The  
exact time when this condition occurs varies on each run, but it  
usually happens quite fast.

>
> What would I have to modify, in this simple example, so that the  
application works as expected? Must I always use Gather, instead of  
point to point, to make a synchronization like this?

>
> Thank you very much!
>
> Giovani
>
>
>
>
>
>
>

Re: [OMPI users] Beginner's question: why does this program hang?

2008-03-18 Thread Andreas Schäfer
OK, this is strange. I've rerun the test and got it to block,
too. Although repeated tests show that those are rare (sometimes the
program runs smoothly without blocking, but in about 30% of the cases
it hangs just like you said).

On 08:11 Tue 18 Mar , Giovani Faccin wrote:
> I'm using openmpi-1.2.5. It was installed using my distro's (Gentoo) default 
> package:
> 
>  sys-cluster/openmpi-1.2.5  USE="fortran ipv6 -debug -heterogeneous -nocxx 
> -pbs -romio -smp -threads"

Just like me.

I've attached gdb to all three processes. On rank 0 I get the
following backtrace:

(gdb) bt
#0  0x2ada849b3f16 in mca_btl_sm_component_progress ()
   from /usr/lib64/openmpi/mca_btl_sm.so
#1  0x2ada845a71da in mca_bml_r2_progress () from 
/usr/lib64/openmpi/mca_bml_r2.so
#2  0x2ada7e6fbbea in opal_progress () from /usr/lib64/libopen-pal.so.0
#3  0x2ada8439a9a5 in mca_pml_ob1_recv () from 
/usr/lib64/openmpi/mca_pml_ob1.so
#4  0x2ada7e2570a8 in PMPI_Recv () from /usr/lib64/libmpi.so.0
#5  0x0040c9ae in MPI::Comm::Recv ()
#6  0x00409607 in main ()

On rank 1:

(gdb) bt
#0  0x2baa6869bcc0 in mca_btl_sm_send () from 
/usr/lib64/openmpi/mca_btl_sm.so
#1  0x2baa6808a93d in mca_pml_ob1_send_request_start_copy ()
   from /usr/lib64/openmpi/mca_pml_ob1.so
#2  0x2baa680855f6 in mca_pml_ob1_send () from 
/usr/lib64/openmpi/mca_pml_ob1.so
#3  0x2baa61f43182 in PMPI_Send () from /usr/lib64/libmpi.so.0
#4  0x0040ca04 in MPI::Comm::Send ()
#5  0x00409700 in main ()

On rank 2:

(gdb) bt
#0  0x2b933d555ac7 in sched_yield () from /lib/libc.so.6
#1  0x2b9341efe775 in mca_pml_ob1_send () from 
/usr/lib64/openmpi/mca_pml_ob1.so
#2  0x2b933bdbc182 in PMPI_Send () from /usr/lib64/libmpi.so.0
#3  0x0040ca04 in MPI::Comm::Send ()
#4  0x00409700 in main ()

Anyone got a clue?


-- 

Andreas Schäfer
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
PGP/GPG key via keyserver
I'm a bright... http://www.the-brights.net


(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your 
signature to help him gain world domination!




Re: [OMPI users] Beginner's question: why does this program hang?

2008-03-18 Thread Mark Kosmowski
Giovani:

Which compiler are you using?

Also, you didn't mention this, but does "mpirun hostname" give the
expected response?  I (also new) had a hang similar to what you are
describing due to ompi getting confused as to which of two network
interfaces to use - "mpirun hostname" would hang when started on
certain nodes.  This problem was resolved by telling ompi which
network interface to use (I forget the option needed to do this off
the top of my head, but it is in the FAQ somewhere).
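
(The option in question is probably the TCP interface-selection MCA
parameters; the interface name and program path below are only examples:

mpirun --mca btl_tcp_if_include eth0 --mca oob_tcp_if_include eth0 -np 3 ./mpiplay )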

Good luck,

Mark


Re: [OMPI users] Beginner's question: why does this program hang?

2008-03-18 Thread Giovani Faccin
Hi Mark

Compiler and flags:

sys-devel/gcc-4.1.2  USE="doc* fortran gtk mudflap nls (-altivec) -bootstrap 
-build -d -gcj (-hardened) -ip28 -ip32r10k -libffi% (-multilib) -multislot 
(-n32) (-n64) -nocxx -objc -objc++ -objc-gc -test -vanilla"

Network stuff:

sonja gfaccin # ifconfig
loLink encap:Local Loopback  
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:16436  Metric:1
  RX packets:33166 errors:0 dropped:0 overruns:0 frame:0
  TX packets:33166 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0 
  RX bytes:9846970 (9.3 Mb)  TX bytes:9846970 (9.3 Mb)

wlan0 Link encap:Ethernet  HWaddr 00:1C:BF:24:24:91  
  inet addr:192.168.1.50  Bcast:192.168.0.255  Mask:255.255.255.0
  inet6 addr: fe80::21c:bfff:fe24:2491/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:5944 errors:0 dropped:0 overruns:0 frame:0
  TX packets:6343 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:3058968 (2.9 Mb)  TX bytes:1713598 (1.6 Mb)

wmaster0  Link encap:UNSPEC  HWaddr 00-1C-BF-24-24-91-60-00-00-00-00-00-00-00-00-00  
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

I have 2 cards in my laptop, one is an ethernet one that's not enabled (no 
kernel modules loaded). The other one is the wireless card, which is enabled. 
Those 2 interfaces appear because the driver creates them. The real one is 
wlan0.

I'll try to find in the FAQ the flag that specifies the card, just in 
case MPI might be trying to use wmaster0. Let's see if it works.

Thanks!

Giovani


Mark Kosmowski wrote: Giovani:

Which compiler are you using?

Also, you didn't mention this, but does "mpirun hostname" give the
expected response?  I (also new) had a hang similar to what you are
describing due to ompi getting confused as to which of two network
interfaces to use - "mpirun hostname" would hang when started on
certain nodes.  This problem was resolved by telling ompi which
network interface to use (I forget the option needed to do this off
the top of my head, but it is in the FAQ somewhere).

Good luck,

Mark




Re: [OMPI users] RPM build errors when creating multiple rpms

2008-03-18 Thread Jeff Squyres

On Mar 17, 2008, at 2:34 PM, Christopher Irving wrote:

Well that fixed the errors for the case prefix=/usr but after looking at
the spec file I suspected it would cause a problem if you used the
install_in_opt option.  So I tried it and got the following errors:

   RPM build errors:
   Installed (but unpackaged) file(s) found:
  /opt/openmpi/1.2.5/etc/openmpi-default-hostfile
  /opt/openmpi/1.2.5/etc/openmpi-mca-params.conf
  /opt/openmpi/1.2.5/etc/openmpi-totalview.tcl

I just don't think the inclusion of _sysconfdir needs to be wrapped in
a condition statement.  It needs to be included in either case,
installing to /opt or to /usr, and will already be correctly defined for
both.  So in the new spec file if you get rid of line 651 - %if !
%{sysconfdir_in_prefix} - and the closing endif on 653 it will work for
both cases.


Hmm.  I'm having problems getting that to fail (even with a 1.2.5  
tarball and install_in_opt=1).  That %if is there because I was  
running into errors when rpm would complain that some files were  
listed twice (e.g., under %{prefix} and %{sysconfdir}).


I must not be understanding exactly what you're running if I can't  
reproduce the problem.  Can you specify your exact rpmbuild command?


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Beginner's question: why does this program hang?

2008-03-18 Thread Giovani Faccin
Yep, setting the card manually did not solve it.

I'm compiling the pre-release version now. Let's see if it works.

Giovani

Giovani Faccin wrote: Hi Mark

Compiler and flags:

sys-devel/gcc-4.1.2  USE="doc* fortran gtk mudflap nls (-altivec) -bootstrap 
-build -d -gcj (-hardened) -ip28 -ip32r10k -libffi% (-multilib) -multislot 
(-n32) (-n64) -nocxx -objc -objc++ -objc-gc -test -vanilla"

Network stuff:

sonja gfaccin # ifconfig
loLink encap:Local Loopback  
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:16436  Metric:1
  RX packets:33166 errors:0 dropped:0 overruns:0 frame:0
  TX packets:33166 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:0 
  RX bytes:9846970 (9.3 Mb)  TX bytes:9846970 (9.3 Mb)

wlan0 Link encap:Ethernet  HWaddr 00:1C:BF:24:24:91  
  inet addr:192.168.1.50  Bcast:192.168.0.255  Mask:255.255.255.0
  inet6 addr: fe80::21c:bfff:fe24:2491/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:5944 errors:0 dropped:0 overruns:0 frame:0
  TX packets:6343 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:3058968  (2.9 Mb)  TX bytes:1713598 (1.6 Mb)

wmaster0  Link encap:UNSPEC  HWaddr 00-1C-BF-24-24-91-60-00-00-00-00-00-00-00-00-00  
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

I have 2 cards in my laptop, one is an ethernet one that's not enabled (no 
kernel modules loaded). The other one is the wireless card, which is  enabled. 
Those 2 interfaces appear because the driver creates them. The real one is 
wlan0.

I'll try to find in the faq where is this flag to specify the card, just in 
case MPI might be trying to use wmaster0. Let's see if it works.

Thanks!

Giovani


Mark Kosmowski wrote: Giovani:

Which compiler are you using?

Also, you didn't mention this, but does "mpirun hostname" give the
expected response?  I (also new) had a hang similar to what you are
describing due to ompi getting confused as to which of two network
interfaces to use - "mpirun hostname" would hang when started on
certain nodes.  This problem was resolved by telling ompi which
network interface to use (I forget the option needed to do this off
the top of my head, but it is in the FAQ  somewhere).

Good luck,

Mark

Re: [OMPI users] Beginner's question: why does this program hang?

2008-03-18 Thread Jeff Squyres

On Mar 18, 2008, at 8:38 AM, Giovani Faccin wrote:


Yep, setting the card manually did not solve it.


I would not think that it would.  Generally, if OMPI can't figure out  
your network configuration, it'll be an "all or nothing" kind of  
failure.  The fact that your program runs for a long while and then  
eventually stalls indicates that OMPI was likely able to figure out  
your network config ok.



I'm compiling the pre-release version now. Let's see if it works.


Good.

--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Beginner's question: why does this program hang?

2008-03-18 Thread Giovani Faccin
Ok, I uninstalled the previous version. Then downloaded the pre-release 
version. Unpacked it, configure, make, make install

When running mpiCC I get this:
mpiCC: error while loading shared libraries: libopen-pal.so.0: cannot open 
shared object file: No such file or directory

$whereis libopen-pal
libopen-pal: /usr/local/lib/libopen-pal.so /usr/local/lib/libopen-pal.la

So the library exists. How can I make mpiCC know its location?

Thanks!

Giovani



Giovani Faccin wrote: Yep, setting the card 
manually did not solve it.

I'm compiling the pre-release version now. Let's see if it works.

Giovani

Giovani Faccin wrote: Hi Mark

Compiler and flags:

sys-devel/gcc-4.1.2  USE="doc* fortran gtk mudflap nls (-altivec) -bootstrap 
-build -d -gcj (-hardened) -ip28 -ip32r10k -libffi% (-multilib) -multislot 
(-n32) (-n64) -nocxx -objc -objc++ -objc-gc -test -vanilla"

Network stuff:

sonja gfaccin # ifconfig
loLink encap:Local Loopback  
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK  RUNNING  MTU:16436  Metric:1
  RX packets:33166 errors:0 dropped:0 overruns:0 frame:0
  TX packets:33166 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:0 
  RX bytes:9846970 (9.3 Mb)  TX bytes:9846970 (9.3 Mb)

wlan0 Link encap:Ethernet  HWaddr 00:1C:BF:24:24:91  
  inet addr:192.168.1.50  Bcast:192.168.0.255  Mask:255.255.255.0
  inet6 addr: fe80::21c:bfff:fe24:2491/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX  packets:5944 errors:0 dropped:0 overruns:0 frame:0
  TX packets:6343 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:3058968  (2.9 Mb)  TX bytes:1713598 (1.6 Mb)

wmaster0  Link encap:UNSPEC  HWaddr 00-1C-BF-24-24-91-60-00-00-00-00-00-00-00-00-00  
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0  carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

I have 2 cards in my laptop, one is an ethernet one that's not enabled (no 
kernel modules loaded). The other one is the wireless card, which is  enabled. 
Those 2 interfaces appear because the driver creates them. The real one is 
wlan0.

I'll try to find in the faq where is this flag to specify the card, just in 
case MPI might be trying to use wmaster0. Let's see if it works.

Thanks!

Giovani


Mark Kosmowski wrote: Giovani:

Which compiler are you using?

Also, you didn't mention this, but does "mpirun hostname" give the
expected response?  I (also new) had a hang  similar to what you are
describing due to ompi getting confused as to which of two network
interfaces to use - "mpirun hostname" would hang when started on
certain nodes.  This problem was resolved by telling ompi which
network interface to use (I forget the option needed to do this off
the top of my head, but it is in the FAQ  somewhere).

Good luck,

Mark

Re: [OMPI users] Beginner's question: why does this program hang?

2008-03-18 Thread George Bosilca
Jeff hinted at the real problem in his email. Even if the program uses the  
correct MPI functions, it is not 100% correct. It might pass in some  
situations, but can lead to fake "deadlocks" in others. The problem  
comes from the flow control. If the messages are small (which is the  
case in the test example), Open MPI will send them eagerly. Without  
flow control, these messages will be buffered by the receiver, which  
will exhaust the memory on the receiver. Once this happens, some of  
the messages may get dropped, but the most visible result is that the  
progress happens very (VERY) slowly.


Adding an MPI_Barrier every 100 iterations will solve the problem.
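
Applied to the test loop from earlier in the thread, that amounts to
something like this (a sketch, not code taken from this mail):

    for (unsigned long t = 0; t < 1000; t++)
    {
        // ... same point-to-point sends/receives as before ...

        if (t % 100 == 0)
            MPI::COMM_WORLD.Barrier();  // let the receiver drain its queue
    }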

  george.

PS: A very similar problem was discussed on the mailing list a few days  
ago. Please read that thread to see a more detailed explanation, as  
well as another solution to it.


On Mar 18, 2008, at 7:48 AM, Andreas Schäfer wrote:


OK, this is strange. I've rerun the test and got it to block,
too. Although repeated tests show that those are rare (sometimes the
program runs smoothly without blocking, but in about 30% of the cases
it hangs just like you said).

On 08:11 Tue 18 Mar , Giovani Faccin wrote:
I'm using openmpi-1.2.5. It was installed using my distro's  
(Gentoo) default package:


sys-cluster/openmpi-1.2.5  USE="fortran ipv6 -debug -heterogeneous - 
nocxx -pbs -romio -smp -threads"


Just like me.

I've attached gdb to all three processes. On rank 0 I get the
following backtrace:

(gdb) bt
#0  0x2ada849b3f16 in mca_btl_sm_component_progress ()
  from /usr/lib64/openmpi/mca_btl_sm.so
#1  0x2ada845a71da in mca_bml_r2_progress () from /usr/lib64/ 
openmpi/mca_bml_r2.so
#2  0x2ada7e6fbbea in opal_progress () from /usr/lib64/libopen- 
pal.so.0
#3  0x2ada8439a9a5 in mca_pml_ob1_recv () from /usr/lib64/ 
openmpi/mca_pml_ob1.so

#4  0x2ada7e2570a8 in PMPI_Recv () from /usr/lib64/libmpi.so.0
#5  0x0040c9ae in MPI::Comm::Recv ()
#6  0x00409607 in main ()

On rank 1:

(gdb) bt
#0  0x2baa6869bcc0 in mca_btl_sm_send () from /usr/lib64/openmpi/ 
mca_btl_sm.so

#1  0x2baa6808a93d in mca_pml_ob1_send_request_start_copy ()
  from /usr/lib64/openmpi/mca_pml_ob1.so
#2  0x2baa680855f6 in mca_pml_ob1_send () from /usr/lib64/ 
openmpi/mca_pml_ob1.so

#3  0x2baa61f43182 in PMPI_Send () from /usr/lib64/libmpi.so.0
#4  0x0040ca04 in MPI::Comm::Send ()
#5  0x00409700 in main ()

On rank 2:

(gdb) bt
#0  0x2b933d555ac7 in sched_yield () from /lib/libc.so.6
#1  0x2b9341efe775 in mca_pml_ob1_send () from /usr/lib64/ 
openmpi/mca_pml_ob1.so

#2  0x2b933bdbc182 in PMPI_Send () from /usr/lib64/libmpi.so.0
#3  0x0040ca04 in MPI::Comm::Send ()
#4  0x00409700 in main ()

Anyone got a clue?


--

Andreas Schäfer
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
PGP/GPG key via keyserver
I'm a bright... http://www.the-brights.net


(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your
signature to help him gain world domination!






Re: [OMPI users] Beginner's question: why does this program hang?

2008-03-18 Thread George Bosilca
As indicated in the FAQ you should add the directory where Open MPI  
was installed to the LD_LIBRARY_PATH.
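
For the /usr/local prefix shown in Giovani's whereis output, that would be
along the lines of (bash syntax; adjust the path to your own install prefix):

    export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH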


  george.

On Mar 18, 2008, at 8:57 AM, Giovani Faccin wrote:

Ok, I uninstalled the previous version. Then downloaded the pre- 
release version. Unpacked it, configure, make, make install


When running mpiCC I get this:
mpiCC: error while loading shared libraries: libopen-pal.so.0:  
cannot open shared object file: No such file or directory


$whereis libopen-pal
libopen-pal: /usr/local/lib/libopen-pal.so /usr/local/lib/libopen- 
pal.la


So the library exists. How can I make mpiCC know its location?

Thanks!

Giovani



Giovani Faccin wrote: Yep, setting  
the card manually did not solve it.


I'm compiling the pre-release version now. Let's see if it works.

Giovani

Giovani Faccin wrote: Hi Mark

Compiler and flags:

sys-devel/gcc-4.1.2  USE="doc* fortran gtk mudflap nls (-altivec) - 
bootstrap -build -d -gcj (-hardened) -ip28 -ip32r10k -libffi% (- 
multilib) -multislot (-n32) (-n64) -nocxx -objc -objc++ -objc-gc - 
test -vanilla"


Network stuff:

sonja gfaccin # ifconfig
loLink encap:Local Loopback
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:16436  Metric:1
  RX packets:33166 errors:0 dropped:0 overruns:0 frame:0
  TX packets:33166 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:9846970 (9.3 Mb)  TX bytes:9846970 (9.3 Mb)

wlan0 Link encap:Ethernet  HWaddr 00:1C:BF:24:24:91
  inet addr:192.168.1.50  Bcast:192.168.0.255  Mask: 
255.255.255.0

  inet6 addr: fe80::21c:bfff:fe24:2491/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:5944 errors:0 dropped:0 overruns:0 frame:0
  TX packets:6343 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:3058968 (2.9 Mb)  TX bytes:1713598 (1.6 Mb)

wmaster0  Link encap:UNSPEC  HWaddr 00-1C-BF-24-24-91-60-00-00-00-00-00-00-00-00-00
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

I have 2 cards in my laptop, one is an ethernet one that's not  
enabled (no kernel modules loaded). The other one is the wireless  
card, which is enabled. Those 2 interfaces appear because the driver  
creates them. The real one is wlan0.


I'll try to find in the faq where is this flag to specify the card,  
just in case MPI might be trying to use wmaster0. Let's see if it  
works.


Thanks!

Giovani


Mark Kosmowski wrote: Giovani:

Which compiler are you using?

Also, you didn't mention this, but does "mpirun hostname" give the
expected response? I (also new) had a hang similar to what you are
describing due to ompi getting confused as to which of two network
interfaces to use - "mpirun hostname" would hang when started on
certain nodes. This problem was resolved by telling ompi which
network interface to use (I forget the option needed to do this off
the top of my head, but it is in the FAQ somewhere).

Good luck,

Mark






Re: [OMPI users] Beginner's question: why does this program hang?

2008-03-18 Thread Jeff Squyres

On Mar 18, 2008, at 10:32 AM, George Bosilca wrote:

Jeff hinted at the real problem in his email. Even if the program uses  
the correct MPI functions, it is not 100% correct.


I think we disagree here -- the sample program is correct according to  
the MPI spec.  It's an implementation artifact that makes it deadlock.


The upcoming v1.3 series doesn't suffer from this issue; we revamped  
our transport system to distinguish between early and normal  
completions.  The pml_ob1_use_eager_completion MCA param was added to  
v1.2.6 to allow correct MPI apps to avoid this optimization -- a  
proper fix is coming in the v1.3 series.


It might pass in some situations, but can lead to fake "deadlocks"  
in others. The problem comes from the flow control. If the messages  
are small (which is the case in the test example), Open MPI will  
send them eagerly. Without flow control, these messages will be  
buffered by the receiver, which will exhaust the memory on the  
receiver. Once this happens, some of the messages may get dropped,  
but the most visible result is that the progress happens very  
(VERY) slowly.


Your text implies that we can actually *drop* (and retransmit)  
messages in the sm btl.  That doesn't sound right to me -- is that  
what you meant?


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Beginner's question: why does this program hang?

2008-03-18 Thread Andreas Schäfer
On 10:51 Tue 18 Mar , Jeff Squyres wrote:
> The upcoming v1.3 series doesn't suffer from this issue; we revamped  
> our transport system to distinguish between early and normal  
> completions.  The pml_ob1_use_eager_completion MCA param was added to  
> v1.2.6 to allow correct MPI apps to avoid this optimization -- a  
> proper fix is coming in the v1.3 series.

Yo, I've just tried it with the current SVN and couldn't reproduce the
deadlock. Nice!

Cheers
-Andreas


-- 

Andreas Schäfer
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
PGP/GPG key via keyserver
I'm a bright... http://www.the-brights.net


(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your 
signature to help him gain world domination!




Re: [OMPI users] RPM build errors when creating multiple rpms

2008-03-18 Thread Christopher Irving
On Tue, 2008-03-18 at 08:32 -0400, Jeff Squyres wrote:
> On Mar 17, 2008, at 2:34 PM, Christopher Irving wrote:
> 
> > Well that fixed the errors for the case prefix=/usr but after  
> > looking at
> > the spec file I suspected it would cause a problem if you used the
> > install_in_opt option.  So I tried it and got the following errors:
> >
> >RPM build errors:
> >Installed (but unpackaged) file(s) found:
> >   /opt/openmpi/1.2.5/etc/openmpi-default-hostfile
> >   /opt/openmpi/1.2.5/etc/openmpi-mca-params.conf
> >   /opt/openmpi/1.2.5/etc/openmpi-totalview.tcl
> >
> > I just don't think the inclusion of  _sysconfdir needs to be wrapped  
> > in
> > a condition statement.  It needs to be included in either case,
> > installing to /opt or to /usr, and will already be correctly defined  
> > for
> > both.  So in the new spec file if you get rid of line 651 - %if !
> > %{sysconfdir_in_prefix} - and the closing endif on 653 it will work  
> > for
> > both cases.
> 
> Hmm.  I'm having problems getting that to fail (even with a 1.2.5  
> tarball and install_in_opt=1).  That %if is there because I was  
> running into errors when rpm would complain that some files were  
> listed twice (e.g., under %{prefix} and %{sysconfdir}).
> 
> I must not be understanding exactly what you're running if I can't  
> reproduce the problem.  Can you specify your exact rpmbuild command?
> 

Okay, I'm no longer sure to which spec file you're referring.  For
clarity, I'm now using the spec file you pointed me to in your original
reply, from revision 17372.  With this file I no longer get any errors
when I run:

rpmbuild -bb --define 'build_all_in_one_rpm 0' \
    --define 'configure_options --with-mpi-f90-size=medium --with-tm=/usr/local/lib64' \
    openmpi.spec

This is great for me since this is how I want to build my rpms.  However,
if I use the following command line with the new spec file I get the
"installed (but unpackaged)" errors shown above, which is fine for me but
bad for anyone who wants to install in /opt.

rpmbuild -bb --define 'install_in_opt 1' \
    --define 'build_all_in_one_rpm 0' \
    --define 'configure_options --with-mpi-f90-size=medium --with-tm=/usr/local/lib64' \
    openmpi.spec

Now, if you remove lines 651 and 653 from the new spec file it works for
both cases.  You won't get the files-listed-twice error because although
you have the statement %dir %{_prefix} on line 649 you never have a
line with just %{_prefix}.  So the _prefix directory itself gets
included but not all files underneath it.  You've handled that by
explicitly including all files and sub directories on lines 672-681 and
in the runtime.file.

Going back to the original spec file, the one that came with the source
RPM, the problems were kind of reversed.  Building with the
'install_in_opt 1' option worked just fine, but when it wasn't set you
got the files-listed-twice error as I described in my original message.

-Christopher



Re: [OMPI users] RPM build errors when creating multiple rpms

2008-03-18 Thread Michael Jennings
On Tuesday, 18 March 2008, at 12:15:34 (-0700),
Christopher Irving wrote:

> Now, if you removed line 651 and 653 from the new spec file it works
> for both cases.  You wont get the files listed twice error because
> although you have the statement %dir %{_prefix} on line 649 you
> never have a line with just %{_prefix}.  So the _prefix directory
> itself gets included but not all files underneath it.  You've
> handled that by explicitly including all files and sub directories
> on lines 672-681 and in the runtime.file.

The only package which should own %{_prefix} is something like setup
or filesystem in the core OS package set.  No openmpi RPM should ever
own %{_prefix}, so it should never appear in %files, either by itself
or with %dir.

> Going back to the original spec file, the one that came with the
> source RPM, the problems where kind of reversed.  Building with the
> 'install_in_opt 1' option worked just fine but when it wasn't set
> you got the files listed twice error as I described in my original
> message.

"files listed twice" messages are not errors, per se, and can usually
be safely ignored.  Those who are truly bothered by them can always
add %exclude directives if they so choose.

Michael

-- 
Michael Jennings 
Linux Systems and Cluster Admin
UNIX and Cluster Computing Group


Re: [OMPI users] equivalent to mpichgm --gm-recv blocking?

2008-03-18 Thread Patrick Geoffray

Hi Greg,

Siekas, Greg wrote:

Is it possible to get the same blocking behavior with openmpi?  I'm
having a difficult time getting this to work properly.  The application
is spinning on sched_yield which takes up a cpu core.


Per its design, OpenMPI cannot block. sched_yield is all it can do to 
improve fairness.


Patrick


[OMPI users] parallel molecular dynamics simulations: All to All Communication

2008-03-18 Thread


 Dear All,

 I was parallelising the serial molecular dynamics simulation code given
 below. I have only two processors; my system is a dual-core system.


c--
c                            SERIAL CODE
c...

   DO m=1,nmol                 !! nmol is total number of molecules
      DO i=1,2
         ax(i,m)=0.0d0         !acceleration
         ay(i,m)=0.0d0         !acceleration
         az(i,m)=0.0d0         !acceleration
      ENDDO
      DO j=1,nmol
         Ngmol(j,m)=0
      ENDDO
   ENDDO
c-
c force calculations
c-
   DO m=1,nmol-1
      DO i=1,2
         natom = natom +1
         ibeg  = inbl(natom)
         iend  = inbl(natom+1)-1
         DO ilist=ibeg,iend          !! no. of neighbors
            j=inblst1(ilist)         !! neighbor molecular label
            k=inblst2(ilist)         !! neighbor atomic label
c           j,k are molecular and atomic labels of the neighbour list
c           of each molecule on each processor.
C
C           Interatomic distance
C
            xij = x1(i,m) - x1(k,j)
            yij = y1(i,m) - y1(k,j)
            zij = z1(i,m) - z1(k,j)
C
C           Apply periodic boundary conditions
C
            dpbcx = - boxx*dnint(xij/boxx)
            dpbcy = - boxy*dnint(yij/boxy)
            dpbcz = - boxz*dnint(zij/boxz)
            xij  = xij + dpbcx
            yij  = yij + dpbcy
            zij  = zij + dpbcz
            rij2 = xij*xij + yij*yij + zij*zij
C
C           Calculate forces
C
            IF (rij2.le.rcutsq) then
               rij  = dsqrt(rij2)
               r_2  = sig1sq/rij2
               r_6  = r_2*r_2*r_2
               r_12 = r_6*r_6
               pot_lj = pot_lj+((r_12-r_6) + rij*vfc-vc)   !! need 4*eps1
               fij  = 24.0d0*eps1*((2*r_12-r_6)/rij2 - fc/rij)
               fxlj = fij*xij
               fylj = fij*yij
               fzlj = fij*zij
               ax(i,m)  = ax(i,m) + fxlj
               ay(i,m)  = ay(i,m) + fylj
               az(i,m)  = az(i,m) + fzlj
               ax(k,j)  = ax(k,j) - fxlj
               ay(k,j)  = ay(k,j) - fylj
               az(k,j)  = az(k,j) - fzlj
               pconf    = pconf+(xij*fxlj + yij*fylj + zij*fzlj)
               IF (ngmol(j,m).eq.0) then
                  xmolij   = xmol(m) - xmol(j) + dpbcx
                  ymolij   = ymol(m) - ymol(j) + dpbcy
                  zmolij   = zmol(m) - zmol(j) + dpbcz
                  rmolij   = dsqrt(xmolij*xmolij+ymolij*ymolij
     &                            +zmolij*zmolij)
                  nr       = dnint(rmolij/dgr)
                  ng12(nr) = ng12(nr)+2
                  ngmol(j,m) = 1
               ENDIF
            ENDIF
         ENDDO
      ENDDO
   ENDDO
   DO m=1,nmol
      DO i=1,2
         write(*,100) ax(i,m),ay(i,m),az(i,m)
      ENDDO
   ENDDO



and below is the parallelised part

c
c                            PARALLEL CODE
c--
   DO m=1,nmol
      DO i=1,2
         ax(i,m)=0.0d0
         ay(i,m)=0.0d0
         az(i,m)=0.0d0
      ENDDO
      DO j=1,nmol
         ngmol(j,m)=0
      ENDDO
   ENDDO
   CALL para_range(1, nmol, nprocs, myrank, nmolsta, nmolend)
   DO m=nmolsta,nmolend-1      !! nmol is divided into two parts and
                               !! nmolsta and nmolend are the starting and
                               !! ending index for each processor
      DO i=1,2
         ibeg  = inbl(natom)
         iend  = inbl(natom+1)-1
         DO ilist=ibeg,iend          !! no. of neighbors
            j=inblst1(ilist)         !! neighbor molecular label
            k=inblst2(ilist)         !! neighbor atomic label
C
C           Interatomic distance
C
            xij = x1(i,m) - x1(k,j)
            yij = y1(i,m) - y1(k,j)
            zij = z1(i,m) - z1(k,j)
            dpbcx = - boxx*dnint(xij/boxx)
            dpbcy = - boxy*dnint(yij/boxy)
            dpbcz = - boxz*dnint(zij/boxz)
            xij  = xij + dpbcx
            yij  = yij + dpbcy
            zij  = zij + dpbcz
            rij2 = xij*xij + yij*yij + zij*zij
            IF (rij2.le.rcutsq) then

 

Re: [OMPI users] RPM build errors when creating multiple rpms

2008-03-18 Thread Christopher Irving
On Tue, 2008-03-18 at 12:28 -0700, Michael Jennings wrote: 
> On Tuesday, 18 March 2008, at 12:15:34 (-0700),
> Christopher Irving wrote:
> 
> > Now, if you removed line 651 and 653 from the new spec file it works
> > for both cases.  You wont get the files listed twice error because
> > although you have the statement %dir %{_prefix} on line 649 you
> > never have a line with just %{_prefix}.  So the _prefix directory
> > itself gets included but not all files underneath it.  You've
> > handled that by explicitly including all files and sub directories
> > on lines 672-681 and in the runtime.file.
> 
> The only package which should own %{_prefix} is something like setup
> or filesystem in the core OS package set.  No openmpi RPM should ever
> own %{_prefix}, so it should never appear in %files, either by itself
> or with %dir.
> 


Well, you're half correct.  You're thinking that _prefix is always
defined as /usr.  But in the case where install_in_opt is defined they
have redefined _prefix to be /opt/%{name}/%{version}, in which case it
is fine for one of the openmpi rpms to claim that directory with a %dir
directive.  However, I think you missed the point.  I'm not suggesting
they need to add a %{_prefix} statement in the %files section; I'm just
pointing out what's not the source of the duplicated files.  In other
words, %dir %{_prefix} is not the same as %{_prefix} and won't cause all
the files in _prefix to be included.

> > Going back to the original spec file, the one that came with the
> > source RPM, the problems where kind of reversed.  Building with the
> > 'install_in_opt 1' option worked just fine but when it wasn't set
> > you got the files listed twice error as I described in my original
> > message.
> 
> "files listed twice" messages are not errors, per se, and can usually
> be safely ignored.  Those who are truly bothered by them can always
> add %exclude directives if they so choose.
> 
> Michael

It can't be safely ignored when it causes the rpm build to fail.  Also, you
don't want to use an %exclude because that would prevent the specified
files from ever getting included, which is not the desired result.  It's
much easier and makes a lot more sense to remove the source of the
duplicated inclusion, which is exactly what they did and why that's no
longer a problem with the new spec file.

-C