Re: [OMPI users] Passwordless ssh

2012-01-12 Thread Shaandar Nyamtulga

Dear Reuti
 
Then what should I do? I am a novice with ssh and OpenMPI. Can you direct me a 
little further? I am quite confused.
Thank you
 

> From: re...@staff.uni-marburg.de
> Date: Wed, 11 Jan 2012 12:31:07 +0100
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] Passwordless ssh
> 
> Hi,
> 
> On 11.01.2012 at 05:46, Ralph Castain wrote:
> 
> > You might want to ask that on the Beowulf mailing lists - I suspect it has 
> > something to do with the mount procedure, but honestly have no real idea 
> > how to resolve it.
> > 
> > On Jan 10, 2012, at 8:45 PM, Shaandar Nyamtulga wrote:
> > 
> >> Hi
> >> I built a Beowulf cluster using OpenMPI, following the link below.
> >> http://techtinkering.com/2009/12/02/setting-up-a-beowulf-cluster-using-open-mpi-on-linux/
> >> I can ssh to my slave nodes without the slave mpiuser's password before 
> >> mounting my slaves.
> >> But when I mount my slaves and ssh again, the slaves ask for their 
> >> passwords again.
> >> The master's and slaves' .ssh directories and authorized_keys files have 
> >> permissions 700 and 600 respectively, and they are owned only by the user 
> >> mpiuser (via chown). The RSA key has no passphrase.
> 
> it sounds like the ~/.ssh/authorized_keys on the master doesn't contain its 
> own public key (on a plain server you don't need it). Hence when you mount it 
> on the slaves, it's missing again.
> 
> -- Reuti
> 
> 
> >> Please help me on this matter.
> >> 
  

Re: [OMPI users] Passwordless ssh

2012-01-12 Thread Reuti
On 12.01.2012 at 12:17, Shaandar Nyamtulga wrote:

> Dear Reuti
>  
> Then what should I do? I am a novice with ssh and OpenMPI. Can you direct me a 
> little further? I am quite confused.
> Thank you

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

on the file server.
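
(Spelled out a bit more -- a sketch assuming RSA keys and a home directory that
is NFS-mounted onto the slaves; "slave1" is just a placeholder hostname:)

$ ssh-keygen -t rsa                                 # empty passphrase, default file
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys
$ ssh slave1 hostname                               # should now work without a password prompt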

-- Reuti

>  
> > From: re...@staff.uni-marburg.de
> > Date: Wed, 11 Jan 2012 12:31:07 +0100
> > To: us...@open-mpi.org
> > Subject: Re: [OMPI users] Passwordless ssh
> > 
> > Hi,
> > 
> > On 11.01.2012 at 05:46, Ralph Castain wrote:
> > 
> > > You might want to ask that on the Beowulf mailing lists - I suspect it 
> > > has something to do with the mount procedure, but honestly have no real 
> > > idea how to resolve it.
> > > 
> > > On Jan 10, 2012, at 8:45 PM, Shaandar Nyamtulga wrote:
> > > 
> > >> Hi
> > >> I built a Beowulf cluster using OpenMPI, following the link below.
> > >> http://techtinkering.com/2009/12/02/setting-up-a-beowulf-cluster-using-open-mpi-on-linux/
> > >> I can ssh to my slave nodes without the slave mpiuser's password 
> > >> before mounting my slaves.
> > >> But when I mount my slaves and ssh again, the slaves ask for their 
> > >> passwords again.
> > >> The master's and slaves' .ssh directories and authorized_keys files 
> > >> have permissions 700 and 600 respectively, and they are owned only by 
> > >> the user mpiuser (via chown). The RSA key has no passphrase.
> > 
> > it sounds like the ~/.ssh/authorized_keys on the master doesn't contain 
> > its own public key (on a plain server you don't need it). Hence when you 
> > mount it on the slaves, it's missing again.
> > 
> > -- Reuti
> > 
> > 
> > >> Please help me on this matter.
> > >> 




[OMPI users] checkpointing on other transports

2012-01-12 Thread Dave Love
What would be involved in adding checkpointing to other transports,
specifically the PSM MTL?  Are there (likely to be?) technical
obstacles, and would it be a lot of work if not?  I'm asking in case it
would be easy, so that we don't have to exclude QLogic from a procurement,
given they won't respond about Open MPI support.



Re: [OMPI users] ompi + bash + GE + modules

2012-01-12 Thread Dave Love
Surely this should be on the gridengine list -- and it's in recent
archives -- but there's some ob-openmpi below.  Can Notre Dame not get
the support they've paid Univa for?

Reuti  writes:

> SGE 6.2u5 can't handle multi line environment variables or functions,
> it was fixed in 6.2u6 which isn't free.

[It's not listed for 6.2u6.]  For what it's worth, my fix for Sun's fix
is https://arc.liv.ac.uk/trac/SGE/changeset/3556/sge.

> Do you use -V while submitting the job? Just ignore the error or look
> into Son of Gridengine which fixed it too.

Of course you can always avoid the issue by not using `export -f', which
isn't in the modules version we have.  I default -V in sge_request and load
the open-mpi module in the job submission session.  I don't find whatever
problems it causes, and it works for binaries like
  qsub -b y ... mpirun ...
However, the folkloristic examples here typically load the module stuff
in the job script.
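
For reference, that job-script variant looks something like the sketch below
(the modules init path, the PE name, and the module name are all site-specific
assumptions):

  #!/bin/bash
  #$ -cwd
  #$ -pe openmpi 16
  . /etc/profile.d/modules.sh    # wherever the modules init script lives
  module load open-mpi           # module name is an assumption
  mpirun ./my_mpi_program        # tight integration picks the slot count up from SGE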

> If you can avoid -V, then it could be defined in any of the .profile
> or alike if you use -l as suggested.  You could even define a
> started_method in SGE to define it for all users by default and avoid
> to use -V:
>
> #!/bin/sh
> module() { ...command...here... }
> export -f module
> exec "${@}"

That won't work, for example, if someone is tasteless enough to submit a csh job.



Re: [OMPI users] ompi + bash + GE + modules

2012-01-12 Thread Mark Suhovecky
Dave-

I'm working with Univa support as well.

I started out debugging this with a pretty poor grasp of where in the software
flow the problem might be. Like most sysadmins, I belong to many community
lists, and find them to be of tremendous help in running problems down. They
certainly have been in this case -- I've posted to the modules-interest
SourceForge group as well.

I chose to use all the resources open to me, including community user forums
and paid support. Using a commercial product's support should not preclude one
from using other tools as well.

Mark

Mark Suhovecky
HPC System Administrator
Center for Research Computing
University of Notre Dame
suhove...@nd.edu

From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of Dave 
Love [d.l...@liverpool.ac.uk]
Sent: Thursday, January 12, 2012 8:40 AM
To: us...@open-mpi.org
Subject: Re: [OMPI users] ompi + bash + GE + modules

Surely this should be on the gridengine list -- and it's in recent
archives -- but there's some ob-openmpi below.  Can Notre Dame not get
the support they've paid Univa for?

Reuti  writes:

> SGE 6.2u5 can't handle multi line environment variables or functions,
> it was fixed in 6.2u6 which isn't free.

[It's not listed for 6.2u6.]  For what it's worth, my fix for Sun's fix
is https://arc.liv.ac.uk/trac/SGE/changeset/3556/sge.

> Do you use -V while submitting the job? Just ignore the error or look
> into Son of Gridengine which fixed it too.

Of course you can always avoid the issue by not using `export -f', which
isn't in the modules version we have.  I default -V in sge_request and load
the open-mpi module in the job submission session.  I don't find whatever
problems it causes, and it works for binaries like
  qsub -b y ... mpirun ...
However, the folkloristic examples here typically load the module stuff
in the job script.

> If you can avoid -V, then it could be defined in any of the .profile
> or alike if you use -l as suggested.  You could even define a
> started_method in SGE to define it for all users by default and avoid
> to use -V:
>
> #!/bin/sh
> module() { ...command...here... }
> export -f module
> exec "${@}"

That won't work, for example, if someone is tasteless enough to submit a csh job.




Re: [OMPI users] Strange TCP latency results on Amazon EC2

2012-01-12 Thread Roberto Rey
Hi again,

Today I tried another TCP benchmark included in the hpcbench suite, and with a
ping-pong test I'm also getting 100us of latency. Then I tried netperf and got
the same result.

So, in summary, I'm measuring TCP latency with message sizes between 1 and 32
bytes:

Netperf over TCP -> 100us
Netpipe over TCP (NPtcp) -> 100us
HPCbench over TCP -> 100us
Netpipe over OpenMPI (NPmpi) -> 60us
HPCbench over OpenMPI -> 60us
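
(For reference, a typical way to get that small-message number out of netperf --
standard netperf options, with a 32-byte request/response as an example; the
round-trip time in microseconds is roughly 10^6 divided by the reported
transaction rate:)

$ netperf -H <peer_hostname> -t TCP_RR -- -r 32,32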

Any clues?

Thanks a lot!

2012/1/10 Roberto Rey 

> Hi,
>
> I'm running some tests on EC2 cluster instances with 10 Gigabit Ethernet
> hardware and I'm getting strange latency results with Netpipe and OpenMPI.
>
> If I run Netpipe over OpenMPI (NPmpi) I get a network latency around 60
> microseconds for small messages (less than 2kbytes). However, when I run
> Netpipe over TCP (NPtcp) I always get around 100 microseconds. For bigger
> messages everything seems to be OK.
>
> I'm using the BTL TCP in OpenMPI, so I can't understand why OpenMPI
> outperforms raw TCP performance for small messages (40us of difference). I
> also have run the PingPong test from the Intel MPI Benchmarks and the
> latency results for OpenMPI are very similar (60us) to those obtained with
> NPmpi
>
> Can OpenMPI outperform Netpipe over TCP? Why? Is OpenMPI  doing any
> optimization in BTL TCP?
>
> The results for OpenMPI aren't so good but we must take into account the
> network virtualization overhead under Xen
>
> Thanks for your reply
>



-- 
Roberto Rey Expósito


Re: [OMPI users] Strange TCP latency results on Amazon EC2

2012-01-12 Thread Jeff Squyres
Hi Roberto.

We've had strange reports of performance from EC2 before; it's actually been on 
my to-do list to go check this out in detail.  I made contact with the EC2 
folks at Supercomputing late last year.  They've hooked me up with some credits 
on EC2 to go check out what's happening, but the pent-up email deluge from the 
Christmas vacation and my travel to the MPI Forum this week prevented me from 
testing yet.

I hope to be able to get time to test Open MPI on EC2 next week and see what's 
going on.

It's very strange to me that Open MPI is getting *better* than raw TCP 
performance.  I don't have an immediate explanation for that -- if you're using 
the TCP BTL, then OMPI should be using TCP sockets, just like netpipe and the 
others.

You *might* want to check hyperthreading and process binding settings in all 
your tests.
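
(Concretely, a sketch of that check -- mpirun options as in the 1.4/1.5 series;
the hostfile and benchmark names are placeholders:)

$ mpirun -np 2 --hostfile myhosts --bind-to-core --report-bindings ./NPmpi
$ grep -E 'siblings|cpu cores' /proc/cpuinfo | sort -u   # "siblings" > "cpu cores" usually means HT is on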


On Jan 12, 2012, at 7:04 AM, Roberto Rey wrote:

> Hi again,
> 
> Today I was trying with another TCP benchmark included in the hpcbench suite, 
> and with a ping-pong test I'm also getting 100us of latency. Then, I tried 
> with netperf and the same result
> 
> So, in summary, I'm measuring TCP latency with message sizes between 1 and 32 
> bytes:
> 
> Netperf over TCP -> 100us
> Netpipe over TCP (NPtcp)-> 100us
> HPCbench over TCP-> 100us
> Netpipe over OpenMPI (NPmpi) -> 60us
> HPCBench over OpenMPI -> 60us
> 
> Any clues?
> 
> Thanks a lot!
> 
> 2012/1/10 Roberto Rey 
> Hi,
> 
> I'm running some tests on EC2 cluster instances with 10 Gigabit Ethernet 
> hardware and I'm getting strange latency results with Netpipe and OpenMPI. 
> 
> If I run Netpipe over OpenMPI (NPmpi) I get a network latency around 60 
> microseconds for small messages (less than 2kbytes). However, when I run 
> Netpipe over TCP (NPtcp) I always get around 100 microseconds. For bigger 
> messages everything seems to be OK.
> 
> I'm using the BTL TCP in OpenMPI, so I can't understand why OpenMPI 
> outperforms raw TCP performance for small messages (40us of difference). I 
> also have run the PingPong test from the Intel MPI Benchmarks and the 
> latency results for OpenMPI are very similar (60us) to those obtained with 
> NPmpi
> 
> Can OpenMPI outperform Netpipe over TCP? Why? Is OpenMPI  doing any 
> optimization in BTL TCP?
> 
> The results for OpenMPI aren't so good but we must take into account the 
> network virtualization overhead under Xen
> 
> Thanks for your reply
> 
> 
> 
> -- 
> Roberto Rey Expósito


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Strange TCP latency results on Amazon EC2

2012-01-12 Thread Roberto Rey
Thanks for your reply!

I'm using TCP BTL because I don't have any other option in Amazon with 10
Gbit Ethernet.

I also tried with MPICH2 1.4 and I got 60 microseconds...so I am very
confused about it...

Regarding hyperthreading and process binding settings: I am using only one
MPI process on each node (2 nodes for a classical ping-pong latency
benchmark). I don't know how it could affect this test, but I could try
anything that anyone suggests.

2012/1/12 Jeff Squyres 

> Hi Roberto.
>
> We've had strange reports of performance from EC2 before; it's actually
> been on my to-do list to go check this out in detail.  I made contact with
> the EC2 folks at Supercomputing late last year.  They've hooked me up with
> some credits on EC2 to go check out what's happening, but the pent-up email
> deluge from the Christmas vacation and my travel to the MPI Forum this week
> prevented me from testing yet.
>
> I hope to be able to get time to test Open MPI on EC2 next week and see
> what's going on.
>
> It's very strange to me that Open MPI is getting *better* than raw TCP
> performance.  I don't have an immediate explanation for that -- if you're
> using the TCP BTL, then OMPI should be using TCP sockets, just like netpipe
> and the others.
>
> You *might* want to check hyperthreading and process binding settings in
> all your tests.
>
>
> On Jan 12, 2012, at 7:04 AM, Roberto Rey wrote:
>
> > Hi again,
> >
> > Today I was trying with another TCP benchmark included in the hpcbench
> suite, and with a ping-pong test I'm also getting 100us of latency. Then, I
> tried with netperf and the same result
> >
> > So, in summary, I'm measuring TCP latency with message sizes between
> 1 and 32 bytes:
> >
> > Netperf over TCP -> 100us
> > Netpipe over TCP (NPtcp)-> 100us
> > HPCbench over TCP-> 100us
> > Netpipe over OpenMPI (NPmpi) -> 60us
> > HPCBench over OpenMPI -> 60us
> >
> > Any clues?
> >
> > Thanks a lot!
> >
> > 2012/1/10 Roberto Rey 
> > Hi,
> >
> > I'm running some tests on EC2 cluster instances with 10 Gigabit Ethernet
> hardware and I'm getting strange latency results with Netpipe and OpenMPI.
> >
> > If I run Netpipe over OpenMPI (NPmpi) I get a network latency around 60
> microseconds for small messages (less than 2kbytes). However, when I run
> Netpipe over TCP (NPtcp) I always get around 100 microseconds. For bigger
> messages everything seems to be OK.
> >
> > I'm using the BTL TCP in OpenMPI, so I can't understand why OpenMPI
> outperforms raw TCP performance for small messages (40us of difference). I
> also have run the PingPong test from the Intel MPI Benchmarks and the
> latency results for OpenMPI are very similar (60us) to those obtained with
> NPmpi
> >
> > Can OpenMPI outperform Netpipe over TCP? Why? Is OpenMPI  doing any
> optimization in BTL TCP?
> >
> > The results for OpenMPI aren't so good but we must take into account the
> network virtualization overhead under Xen
> >
> > Thanks for your reply
> >
> >
> >
> > --
> > Roberto Rey Expósito
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>



-- 
Roberto Rey Expósito


Re: [OMPI users] Strange TCP latency results on Amazon EC2

2012-01-12 Thread teng ma
Is it possible your EC2 cluster has another "unknown" crappy Ethernet
card (e.g. a 1Gb Ethernet card)?  For small messages, they may go through
different paths in NPtcp and in MPI over NPmpi.

Teng Ma

On Thu, Jan 12, 2012 at 10:28 AM, Roberto Rey  wrote:

> Thanks for your reply!
>
> I'm using TCP BTL because I don't have any other option in Amazon with 10
> Gbit Ethernet.
>
> I also tried with MPICH2 1.4 and I got 60 microseconds...so I am very
> confused about it...
>
> Regarding hyperthreading and process binding settings: I am using only
> one MPI process on each node (2 nodes for a classical ping-pong latency
> benchmark). I don't know how it could affect this test, but I could try
> anything that anyone suggests.
>
> 2012/1/12 Jeff Squyres 
>
>> Hi Roberto.
>>
>> We've had strange reports of performance from EC2 before; it's actually
>> been on my to-do list to go check this out in detail.  I made contact with
>> the EC2 folks at Supercomputing late last year.  They've hooked me up with
>> some credits on EC2 to go check out what's happening, but the pent-up email
>> deluge from the Christmas vacation and my travel to the MPI Forum this week
>> prevented me from testing yet.
>>
>> I hope to be able to get time to test Open MPI on EC2 next week and see
>> what's going on.
>>
>> It's very strange to me that Open MPI is getting *better* than raw TCP
>> performance.  I don't have an immediate explanation for that -- if you're
>> using the TCP BTL, then OMPI should be using TCP sockets, just like netpipe
>> and the others.
>>
>> You *might* want to check hyperthreading and process binding settings in
>> all your tests.
>>
>>
>> On Jan 12, 2012, at 7:04 AM, Roberto Rey wrote:
>>
>> > Hi again,
>> >
>> > Today I was trying with another TCP benchmark included in the hpcbench
>> suite, and with a ping-pong test I'm also getting 100us of latency. Then, I
>> tried with netperf and the same result
>> >
>> > So, in summary, I'm measuring TCP latency with message sizes between
>> 1 and 32 bytes:
>> >
>> > Netperf over TCP -> 100us
>> > Netpipe over TCP (NPtcp)-> 100us
>> > HPCbench over TCP-> 100us
>> > Netpipe over OpenMPI (NPmpi) -> 60us
>> > HPCBench over OpenMPI -> 60us
>> >
>> > Any clues?
>> >
>> > Thanks a lot!
>> >
>> > 2012/1/10 Roberto Rey 
>> > Hi,
>> >
>> > I'm running some tests on EC2 cluster instances with 10 Gigabit
>> Ethernet hardware and I'm getting strange latency results with Netpipe and
>> OpenMPI.
>> >
>> > If I run Netpipe over OpenMPI (NPmpi) I get a network latency around 60
>> microseconds for small messages (less than 2kbytes). However, when I run
>> Netpipe over TCP (NPtcp) I always get around 100 microseconds. For bigger
>> messages everything seems to be OK.
>> >
>> > I'm using the BTL TCP in OpenMPI, so I can't understand why OpenMPI
>> outperforms raw TCP performance for small messages (40us of difference). I
>> also have run the PingPong test from the Intel MPI Benchmarks and the
>> latency results for OpenMPI are very similar (60us) to those obtained with
>> NPmpi
>> >
>> > Can OpenMPI outperform Netpipe over TCP? Why? Is OpenMPI  doing any
>> optimization in BTL TCP?
>> >
>> > The results for OpenMPI aren't so good but we must take into account
>> the network virtualization overhead under Xen
>> >
>> > Thanks for your reply
>> >
>> >
>> >
>> > --
>> > Roberto Rey Expósito
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>>
>
>
>
> --
> Roberto Rey Expósito
>
>



-- 
| Teng Ma  Univ. of Tennessee |
| t...@cs.utk.eduKnoxville, TN |
| http://web.eecs.utk.edu/~tma/   |


Re: [OMPI users] Strange TCP latency results on Amazon EC2

2012-01-12 Thread Roberto Rey
With ifconfig I can only see one Ethernet card (eth0), as well as the
loopback interface.
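
(In case it helps, two quick checks along those lines -- ethtool may not report
a speed on a virtualized Xen NIC, and btl_tcp_if_include simply pins Open MPI to
one interface; the hostfile and benchmark names are placeholders:)

$ ethtool eth0 | grep Speed
$ mpirun -np 2 --hostfile myhosts --mca btl tcp,self --mca btl_tcp_if_include eth0 ./NPmpi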

2012/1/12 teng ma 

> Is it possible your EC2 cluster has another "unknown" crappy Ethernet
> card (e.g. a 1Gb Ethernet card)?  For small messages, they may go through
> different paths in NPtcp and in MPI over NPmpi.
>
> Teng Ma
>
>
> On Thu, Jan 12, 2012 at 10:28 AM, Roberto Rey  wrote:
>
>> Thanks for your reply!
>>
>> I'm using TCP BTL because I don't have any other option in Amazon with 10
>> Gbit Ethernet.
>>
>> I also tried with MPICH2 1.4 and I got 60 microseconds...so I am very
>> confused about it...
>>
>> Regarding hyperthreading and process binding settings: I am using only
>> one MPI process on each node (2 nodes for a classical ping-pong latency
>> benchmark). I don't know how it could affect this test, but I could try
>> anything that anyone suggests.
>>
>> 2012/1/12 Jeff Squyres 
>>
>>> Hi Roberto.
>>>
>>> We've had strange reports of performance from EC2 before; it's actually
>>> been on my to-do list to go check this out in detail.  I made contact with
>>> the EC2 folks at Supercomputing late last year.  They've hooked me up with
>>> some credits on EC2 to go check out what's happening, but the pent-up email
>>> deluge from the Christmas vacation and my travel to the MPI Forum this week
>>> prevented me from testing yet.
>>>
>>> I hope to be able to get time to test Open MPI on EC2 next week and see
>>> what's going on.
>>>
>>> It's very strange to me that Open MPI is getting *better* than raw TCP
>>> performance.  I don't have an immediate explanation for that -- if you're
>>> using the TCP BTL, then OMPI should be using TCP sockets, just like netpipe
>>> and the others.
>>>
>>> You *might* want to check hyperthreading and process binding settings in
>>> all your tests.
>>>
>>>
>>> On Jan 12, 2012, at 7:04 AM, Roberto Rey wrote:
>>>
>>> > Hi again,
>>> >
>>> > Today I was trying with another TCP benchmark included in the hpcbench
>>> suite, and with a ping-pong test I'm also getting 100us of latency. Then, I
>>> tried with netperf and the same result
>>> >
>>> > So, in summary, I'm measuring TCP latency with message sizes between
>>> 1 and 32 bytes:
>>> >
>>> > Netperf over TCP -> 100us
>>> > Netpipe over TCP (NPtcp)-> 100us
>>> > HPCbench over TCP-> 100us
>>> > Netpipe over OpenMPI (NPmpi) -> 60us
>>> > HPCBench over OpenMPI -> 60us
>>> >
>>> > Any clues?
>>> >
>>> > Thanks a lot!
>>> >
>>> > 2012/1/10 Roberto Rey 
>>> > Hi,
>>> >
>>> > I'm running some tests on EC2 cluster instances with 10 Gigabit
>>> Ethernet hardware and I'm getting strange latency results with Netpipe and
>>> OpenMPI.
>>> >
>>> > If I run Netpipe over OpenMPI (NPmpi) I get a network latency around
>>> 60 microseconds for small messages (less than 2kbytes). However, when I run
>>> Netpipe over TCP (NPtcp) I always get around 100 microseconds. For bigger
>>> messages everything seems to be OK.
>>> >
>>> > I'm using the BTL TCP in OpenMPI, so I can't understand why OpenMPI
>>> outperforms raw TCP performance for small messages (40us of difference). I
>>> also have run the PingPong test from the Intel MPI Benchmarks and the
>>> latency results for OpenMPI are very similar (60us) to those obtained with
>>> NPmpi
>>> >
>>> > Can OpenMPI outperform Netpipe over TCP? Why? Is OpenMPI  doing any
>>> optimization in BTL TCP?
>>> >
>>> > The results for OpenMPI aren't so good but we must take into account
>>> the network virtualization overhead under Xen
>>> >
>>> > Thanks for your reply
>>> >
>>> >
>>> >
>>> > --
>>> > Roberto Rey Expósito
>>>
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>>
>>>
>>
>>
>>
>> --
>> Roberto Rey Expósito
>>
>>
>
>
>
> --
> | Teng Ma  Univ. of Tennessee |
> | t...@cs.utk.eduKnoxville, TN |
> | http://web.eecs.utk.edu/~tma/   |
>
>



-- 
Roberto Rey Expósito


[OMPI users] SIGSEGV on MPI_Test

2012-01-12 Thread devendra rai
Hello Community:

I am running into a strange problem. I get a SIGSEGV when I try to execute 
MPI_Test:

==21076== Process terminating with default action of signal 11 (SIGSEGV)
==21076==  Bad permissions for mapped region at address 0x43AEE1
==21076==    at 0x509B957: ompi_request_default_test (req_test.c:68)
==21076==    by 0x50EDEBB: PMPI_Test (ptest.c:59)
==21076==    by 0x44210D: InterProcessorTransmit::StartTransmission() 
(InterProcessorTransmit.cpp:111)


Here is the relevant piece of code:

for (this->dbIterator = localdb.begin(); this->dbIterator != localdb.end();
     this->dbIterator++)
{
    this->TransmissionDetails = (this->dbIterator)->second;
    SendComplete = 0;
    UniqueIDtoSendto = std::get<0>(this->TransmissionDetails);
    RecepientNode = (this->dbIterator)->first;

    Isend_request = MPI::COMM_WORLD.Issend(this->transmitbuffer,
                                           this->transmissionsize, MPI_BYTE,
                                           (this->dbIterator)->first,
                                           std::get<0>(this->TransmissionDetails));

    /* This is line 111 */
    MPI_Test(&(this->Isend_request), &(this->SendComplete), &(this->ISend_status));

    while (!this->SendComplete)
    {
        /* Test whether the transmission was okay */
        MPI_Test(&(this->Isend_request), &(this->SendComplete), &(this->ISend_status));

        /* see if we need to pause or stop */
        {
            /* The mutex is released after exiting this block.
               (std::mutex is assumed here; the original template argument was lost) */
            std::unique_lock<std::mutex> pr_dblock(this->mutexforPauseResume);

            while (this->pause == 1)
            {
                /* pause till resume signal is received */
                this->WaitingforResume.wait(pr_dblock);
            }
            if (this->stop == 1)
            {
                /* stop this transmission */
                return (0);
            }

            /* mutex is released here */
        }
        /* End of pause/stop check */


Am I missing something here? The piece of code shown here runs in a thread. 


Thanks a lot for any pointers.

Best

Devendra 
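
One thing that stands out -- though it may or may not be related to the crash --
is that the snippet mixes the C++ bindings (MPI::COMM_WORLD.Issend() returns an
MPI::Request) with the C API (MPI_Test() expects an MPI_Request*).  Purely as an
illustration, not a diagnosis, a version that stays within the C++ bindings
would look roughly like this (member names are taken from the snippet above; the
surrounding class and loop body are assumed):

  // Sketch only: keep the request and status in the C++ bindings throughout.
  MPI::Status status;
  MPI::Request req = MPI::COMM_WORLD.Issend(this->transmitbuffer,
                                            this->transmissionsize, MPI::BYTE,
                                            (this->dbIterator)->first,
                                            std::get<0>(this->TransmissionDetails));

  // Request::Test() returns true once the send has completed.
  while (!req.Test(status))
  {
      // ... pause/stop handling as in the original loop ...
  }

Also, since the snippet runs in a thread, it is worth checking that MPI was
initialized with a sufficient threading level (MPI_THREAD_MULTIPLE via
MPI_Init_thread / MPI::Init_thread), or that the MPI calls are otherwise
serialized across threads.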


Re: [OMPI users] SIGSEGV on MPI_Test

2012-01-12 Thread devendra rai
Hello All,

Continuing my previous mail, I thought attaching this debugger screenshot may 
help anyone come up with an explanation. The exact location where the segfault 
happens is also highlighted.

Thanks a lot for any help.

Best,

Devendra




 From: devendra rai 
To: Open MPI Users  
Sent: Thursday, 12 January 2012, 17:05
Subject: [OMPI users] SIGSEGV on MPI_Test
 

Hello Community:

I am running into a strange problem. I get a SIGSEGV when I try to execute 
MPI_Test:

==21076== Process terminating with default action of signal 11 (SIGSEGV)
==21076==  Bad permissions for mapped region at address 0x43AEE1
==21076==    at 0x509B957: ompi_request_default_test (req_test.c:68)
==21076==    by 0x50EDEBB: PMPI_Test (ptest.c:59)
==21076==    by 0x44210D: InterProcessorTransmit::StartTransmission() 
(InterProcessorTransmit.cpp:111)


Here is the relevant piece of code:

for (this->dbIterator = localdb.begin(); this->dbIterator != localdb.end();
     this->dbIterator++)
{
    this->TransmissionDetails = (this->dbIterator)->second;
    SendComplete = 0;
    UniqueIDtoSendto = std::get<0>(this->TransmissionDetails);
    RecepientNode = (this->dbIterator)->first;

    Isend_request = MPI::COMM_WORLD.Issend(this->transmitbuffer,
                                           this->transmissionsize, MPI_BYTE,
                                           (this->dbIterator)->first,
                                           std::get<0>(this->TransmissionDetails));

    /* This is line 111 */
    MPI_Test(&(this->Isend_request), &(this->SendComplete), &(this->ISend_status));

    while (!this->SendComplete)
    {
        /* Test whether the transmission was okay */
        MPI_Test(&(this->Isend_request), &(this->SendComplete), &(this->ISend_status));

        /* see if we need to pause or stop */
        {
            /* The mutex is released after exiting this block.
               (std::mutex is assumed here; the original template argument was lost) */
            std::unique_lock<std::mutex> pr_dblock(this->mutexforPauseResume);

            while (this->pause == 1)
            {
                /* pause till resume signal is received */
                this->WaitingforResume.wait(pr_dblock);
            }
            if (this->stop == 1)
            {
                /* stop this transmission */
                return (0);
            }

            /* mutex is released here */
        }
        /* End of pause/stop check */



Am I missing something here? The piece of code shown here runs in a thread. 


Thanks a lot for any pointers.

Best

Devendra 


[OMPI users] IB Memory Requirements, adjusting for reduced memory consumption

2012-01-12 Thread V. Ram
Open MPI IB Gurus,

I have some slightly older InfiniBand-equipped nodes which have
less RAM than we'd like, and on which we tend to run jobs that can span
16-32 nodes of this type.  The jobs themselves tend to run on the heavy
side in terms of their own memory requirements.

When we used to run on an older Intel MPI, these jobs managed to run
within the available RAM without paging out to disk.  Now using Open MPI
1.5.3, we can end up paging to disk or even running out of memory for
the same codes and exact same jobs and node distributions.

I'm suspecting that I can reduce overall memory consumption by tuning
the IB-related memory that Open MPI consumes.  I've looked at the FAQ:
http://www.open-mpi.org/faq/?category=openfabrics#limiting-registered-memory-usage
, but I'm still not certain about where I should start.  Again, this is
all for 1.5.3 (we are willing to update to 1.5.4 or 1.5.5 when released,
if it would help).

1. It looks like there are several independent IB BTL MCA parameters to
try adjusting: i. mpool_rdma_rcache_size_limit, ii.
btl_openib_free_list_max , iii. btl_openib_max_send_size , iv.
btl_openib_eager_rdma_num, v. btl_openib_max_eager_rdma, vi.
btl_openib_eager_limit.  Have I missed any other parameters that
impact InfiniBand-related memory usage?  These parameters are listed as
affecting registered memory.  Are there parameters that affect
unregistered IB-related memory consumption on the part of Open MPI
itself?

2. Where should I start with this?  For example, is it worth trying to
adjust any of the eager parameters, or are the bulk of the memory
requirements coming from the mpool_rdma_rcache_size_limit?

3. Are there any gross/overall "master" parameters that will set limits,
but keep the various buffers in intelligent proportion to one another,
or will I need to manually adjust each set of buffers independently?  If
the latter, are there any guidelines on the relative proportions between
buffers, or overall recommendations?
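
(For reference, one way to enumerate every openib BTL parameter and its current
default on a given build -- the exact list varies between Open MPI versions:)

$ ompi_info --param btl openib
$ ompi_info --param mpool rdma     # shows mpool_rdma_rcache_size_limit and friends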

Thank you very much.

-- 
http://www.fastmail.fm - A fast, anti-spam email service.



Re: [OMPI users] IB Memory Requirements, adjusting for reduced memory consumption

2012-01-12 Thread Nathan Hjelm

I would start by adjusting btl_openib_receive_queues. The default uses a 
per-peer QP, which can eat up a lot of memory. I recommend using no per-peer 
QPs and several shared receive queues. We use S,4096,1024:S,12288,512:S,65536,512
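
For example, a sketch of how that could be passed (the process count, hostfile,
and application name are placeholders; the same value can instead be set in
etc/openmpi-mca-params.conf as "btl_openib_receive_queues = ..."):

$ mpirun -np 512 --hostfile myhosts \
    --mca btl_openib_receive_queues S,4096,1024:S,12288,512:S,65536,512 ./your_app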

-Nathan

On Thu, 12 Jan 2012, V. Ram wrote:


Open MPI IB Gurus,

I have some slightly older InfiniBand-equipped nodes which have
less RAM than we'd like, and on which we tend to run jobs that can span
16-32 nodes of this type.  The jobs themselves tend to run on the heavy
side in terms of their own memory requirements.

When we used to run on an older Intel MPI, these jobs managed to run
within the available RAM without paging out to disk.  Now using Open MPI
1.5.3, we can end up paging to disk or even running out of memory for
the same codes and exact same jobs and node distributions.

I'm suspecting that I can reduce overall memory consumption by tuning
the IB-related memory that Open MPI consumes.  I've looked at the FAQ:
http://www.open-mpi.org/faq/?category=openfabrics#limiting-registered-memory-usage
, but I'm still not certain about where I should start.  Again, this is
all for 1.5.3 (we are willing to update to 1.5.4 or 1.5.5 when released,
if it would help).

1. It looks like there are several independent IB BTL MCA parameters to
try adjusting: i. mpool_rdma_rcache_size_limit, ii.
btl_openib_free_list_max , iii. btl_openib_max_send_size , iv.
btl_openib_eager_rdma_num, v. btl_openib_max_eager_rdma, vi.
btl_openib_eager_limit.  Have I missed any other parameters that
impact InfiniBand-related memory usage?  These parameters are listed as
affecting registered memory.  Are there parameters that affect
unregistered IB-related memory consumption on the part of Open MPI
itself?

2. Where should I start with this?  For example, is it worth trying to
adjust any of the eager parameters, or are the bulk of the memory
requirements coming from the mpool_rdma_rcache_size_limit?

3. Are there any gross/overall "master" parameters that will set limits,
but keep the various buffers in intelligent proportion to one another,
or will I need to manually adjust each set of buffers independently?  If
the latter, are there any guidelines on the relative proportions between
buffers, or overall recommendations?

Thank you very much.

--
http://www.fastmail.fm - A fast, anti-spam email service.
