Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-25 Thread Rayson Ho
Srinivas,

There's also Kernel-Level Checkpointing vs. User-Level Checkpointing -
if you can checkpoint an MPI task and restart it on a new node, then
this is also "process migration".

Of course, doing a checkpoint & restart can be slower than pure
in-kernel process migration, but the advantage is that you don't need
any kernel support, and can in fact do all of it in user-space.

Rayson


On Thu, Aug 25, 2011 at 10:26 AM, Ralph Castain  wrote:
> It also depends on what part of migration interests you - are you wanting to 
> look at the MPI part of the problem (reconnecting MPI transports, ensuring 
> messages are not lost, etc.) or the RTE part of the problem (where to restart 
> processes, detecting failures, etc.)?
>
>
> On Aug 24, 2011, at 7:04 AM, Jeff Squyres wrote:
>
>> Be aware that process migration is a pretty complex issue.
>>
>> Josh is probably the best one to answer your question directly, but he's out 
>> today.
>>
>>
>> On Aug 24, 2011, at 5:45 AM, srinivas kundaram wrote:
>>
>>> I am a final-year grad student looking for my final year project in
>>> OpenMPI. We are a group of 4 students.
>>> I wanted to know about the "Process Migration" process of MPI processes in
>>> OpenMPI.
>>> Can anyone suggest any ideas for a project related to process migration in
>>> OpenMPI or other topics in Systems.
>>>
>>>
>>>
>>> regards,
>>> Srinivas Kundaram
>>> srinu1...@gmail.com
>>> +91-8149399160
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/



Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-25 Thread Rayson Ho
Don't know which SSI project you are referring to... I only know the
OpenSSI project, and I was one of the first who subscribed to its
mailing list (since 2001).

http://openssi.org/cgi-bin/view?page=openssi.html

I don't think those OpenSSI clusters are designed for tens of
thousands of nodes, and I am not sure they scale well to even a
thousand nodes -- so IMO they have limited use for HPC clusters.

Rayson



On Thu, Aug 25, 2011 at 11:45 AM, Durga Choudhury  wrote:
> Also, in 2005 there was an attempt to implement SSI (Single System
> Image) functionality to the then-current 2.6.10 kernel. The proposal
> was very detailed and covered most of the bases of task creation, PID
> allocation etc across a loosely tied cluster (without using fancy
> hardware such as RDMA fabric). Anybody knows if it was ever
> implemented? Any pointers in this direction?
>
> Thanks and regards
> Durga
>
>
> On Thu, Aug 25, 2011 at 11:08 AM, Rayson Ho  wrote:
>> Srinivas,
>>
>> There's also Kernel-Level Checkpointing vs. User-Level Checkpointing -
>> if you can checkpoint an MPI task and restart it on a new node, then
>> this is also "process migration".
>>
>> Of course, doing a checkpoint & restart can be slower than pure
>> in-kernel process migration, but the advantage is that you don't need
>> any kernel support, and can in fact do all of it in user-space.
>>
>> Rayson
>>
>>
>> On Thu, Aug 25, 2011 at 10:26 AM, Ralph Castain  wrote:
>>> It also depends on what part of migration interests you - are you wanting 
>>> to look at the MPI part of the problem (reconnecting MPI transports, 
>>> ensuring messages are not lost, etc.) or the RTE part of the problem (where 
>>> to restart processes, detecting failures, etc.)?
>>>
>>>
>>> On Aug 24, 2011, at 7:04 AM, Jeff Squyres wrote:
>>>
>>>> Be aware that process migration is a pretty complex issue.
>>>>
>>>> Josh is probably the best one to answer your question directly, but he's 
>>>> out today.
>>>>
>>>>
>>>> On Aug 24, 2011, at 5:45 AM, srinivas kundaram wrote:
>>>>
>>>>> I am a final-year grad student looking for my final year project in
>>>>> OpenMPI. We are a group of 4 students.
>>>>> I wanted to know about the "Process Migration" process of MPI processes
>>>>> in OpenMPI.
>>>>> Can anyone suggest any ideas for a project related to process migration
>>>>> in OpenMPI or other topics in Systems.
>>>>>
>>>>>
>>>>>
>>>>> regards,
>>>>> Srinivas Kundaram
>>>>> srinu1...@gmail.com
>>>>> +91-8149399160
>>>>> ___
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>
>>>>
>>>> ___
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>>
>>
>> --
>> Rayson
>>
>> ==
>> Open Grid Scheduler - The Official Open Source Grid Engine
>> http://gridscheduler.sourceforge.net/
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/



Re: [OMPI users] How to add nodes while running job

2011-08-27 Thread Rayson Ho
On Sat, Aug 27, 2011 at 9:12 AM, Ralph Castain  wrote:
> OMPI has no way of knowing that you will turn the node on at some future
> point. All it can do is try to launch the job on the provided node, which
> fails because the node doesn't respond.
> You'll have to come up with some scheme for telling the node to turn on in
> anticipation of starting a job - a resource manager is typically used for
> that purpose.

Hi Ralph,

Are you referring to a specific resource manager/batch system?? AFAIK,
no common batch systems support MPI_Spawn properly...

Rayson




> On Aug 27, 2011, at 6:58 AM, Rafael Braga wrote:
>
> I would like to know how to add nodes during a job execution.
> Now my hostfile has the node 10.0.0.23 that is off,
> I would start this node during the execution so that the job can use it
> When I run the command:
>
> mpirun -np 2 -hostfile /tmp/hosts application
>
> the following message appears:
>
> ssh: connect to host 10.0.0.23 port 22: No route to host
> --
> A daemon (pid 10773) died unexpectedly with status 255 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --
> --
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --
> mpirun: clean termination accomplished
>
> thanks a lot,
>
> --
> Rafael Braga
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/


Re: [OMPI users] OpenMPI Nonblocking Send/Recv

2011-09-13 Thread Rayson Ho
Hi Xin,

Since it is not Open MPI specific, you might want to try to work with
the SciNet guys first. The "SciNet Research Computing Consulting
Clinic" is specifically formed to help U of T students & researchers
develop and design compute-intensive programs.

http://www.scinet.utoronto.ca/
http://www.scinet.utoronto.ca/support/Research_Computing_Consulting_Clinic.htm

The service is free, so just send them an email... Of course, they
can't help you with your coursework! :-D
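
As a general sanity check before diving into the VDE-specific logic, here is a
minimal, self-contained sketch of the non-blocking polling idiom
(MPI_Isend/MPI_Irecv plus MPI_Test between two ranks). It only illustrates the
pattern; the buffer size and tag are arbitrary:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, peer, flag = 0;
    char sendbuf[64], recvbuf[64];
    MPI_Request sreq, rreq;
    MPI_Status  status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;                      /* assumes exactly 2 ranks */

    snprintf(sendbuf, sizeof(sendbuf), "hello from rank %d", rank);

    /* post both operations up front, then poll the receive */
    MPI_Isend(sendbuf, (int)sizeof(sendbuf), MPI_CHAR, peer, 0,
              MPI_COMM_WORLD, &sreq);
    MPI_Irecv(recvbuf, (int)sizeof(recvbuf), MPI_CHAR, peer, 0,
              MPI_COMM_WORLD, &rreq);

    while (!flag) {
        MPI_Test(&rreq, &flag, &status);  /* do other work here while waiting */
    }
    printf("rank %d received: %s\n", rank, recvbuf);

    MPI_Wait(&sreq, MPI_STATUS_IGNORE);   /* complete the send before exiting */
    MPI_Finalize();
    return 0;
}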

Rayson

=
Grid Engine / Open Grid Scheduler
http://gridscheduler.sourceforge.net


On Tue, Sep 13, 2011 at 12:49 PM, Xin Tong Utoronto  wrote:
> I am new to openmpi. I am not sure whether my logic below will work or not.
> Can someone please confirm for me on that ? Basically, what this does is
> trying to check whether there are anything to send, if there are, send it
> right away and set sentinit to true. Then check whether there are anything
> to receive, if there are receive it. I am running this on a client-server
> model (2 nodes sending and receiving data between each other)
> for (;;)  {
>                if (sendinit && MPI_Test(&sendreq, &sendcomplete,
> &sendstatus)) {
>                       if (sendcomplete) {
>                          if (pollv[1].revents & POLLIN) {
>                              printf("Trying to send in rank %d\n", rank);
>                              nx=vde_recv(conn,sendbuff,BUFSIZE-2,0);
>                              vdestream_mpisend(vdestream,sendbuff, nx,
> GET_PAIR_RANK(rank), &sendreq);
>                          } else {
>                              // no in-flight request.
>                              sendinit = false;
>                          }
>                       }
>                    } else {
>                       // no in-flight request. try to start one
>                       if (!sendinit && pollv[1].revents & POLLIN) {
>                            nx=vde_recv(conn,sendbuff,BUFSIZE-2,0);
>                            printf("Trying to send in rank %d\n", rank );
>                            vdestream_mpisend(vdestream,sendbuff, nx,
> GET_PAIR_RANK(rank), &sendreq);
>                            sendinit = true;
>                        }
>                    }
>
>                    if (recvinit && MPI_Test(&recvreq, &recvcomplete,
> &recvstatus)) {
>                       if (recvcomplete) {
>                           printf("Receive completed\n");
>                           // get the actual number of byet received.
>                           MPI_Get_count(&recvstatus, MPI_CHAR, &recvcount);
>                           vdestream_recv(vdestream, recvbuff, recvcount);
>                           // no more in-flight recv.
>                           recvinit = false;
>                       }
>                    } else {
>                       if (!recvinit) {
>                          printf("Trying to receive in rank %d\n", rank);
>                          // no in-flight recv. try to start one.
>                          vdestream_mpirecv(vdestream, recvbuff, BUFSIZE-2,
> GET_PAIR_RANK(rank), &recvreq);
>                          recvinit = true;
>                       }
>                    }
> }
>
> --
> Kind Regards
>
> Xin Tong
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



Re: [OMPI users] Problem compiling openmpi-1.4.3

2011-09-13 Thread Rayson Ho
Did you notice the error message:

  /usr/bin/install: cannot remove
`/opt/openmpi/share/openmpi/amca-param-sets/example.conf': Permission
denied

I would check the permission settings of the file first if I encountered
something like this...

Rayson

=
Grid Engine / Open Grid Scheduler
http://gridscheduler.sourceforge.net


On Tue, Sep 13, 2011 at 4:22 PM, amosl...@gmail.com  wrote:
> Dear Users,
>     I have run into a problem trying to compile openmpi-1.4.3.  I am
> running SuSE Linux 11.4 in VMware-7.0.1.  For compilers I am using
> l_fcompxe_intel64_2011.5.220 and l_ccompxe_intel64_2011.5.220 which are
> newly issued. It appears to go through the compile command:
>     ./compile
> LIBDIRS="/opt/intel/composerxe-2011.5.220/compiler/lib/intel64"
> --prefix=/opt/openmpi CC=icc CXX=icpc F77=ifort F90=ifort
> After running "make all install" the end of the output gives the error:
> test -z "/opt/openmpi/share/openmpi/amca-param-sets" || /bin/mkdir -p
> "/opt/openmpi/share/openmpi/amca-param-sets"
>  /usr/bin/install -c -m 644 'amca-param-sets/example.conf'
> '/opt/openmpi/share/openmpi/amca-param-sets/example.conf'
> /usr/bin/install: cannot remove
> `/opt/openmpi/share/openmpi/amca-param-sets/example.conf': Permission denied
> make[2]: *** [install-dist_amca_paramDATA] Error 1
> make[2]: Leaving directory `/home/amos/Downloads/openmpi-1.4.3/contrib'
> make[1]: *** [install-am] Error 2
> make[1]: Leaving directory `/home/amos/Downloads/openmpi-1.4.3/contrib'
> make: *** [install-recursive] Error 1
>     I have tried using examples trying to run one of the examples
> and it gives an error
> /Downloads/openmpi-1.4.3/examples> mpicc -np 4 connectivity_c.c
> mpicc: error while loading shared libraries: libimf.so: cannot open shared
> object file: No such file or directory
> This is the reason for the LIBDIRS in the compiling command.  I have run
> into the same error trying to set up espresso-4.3.1.  The result occurs
> whether I use root or a user login.  The file is present being the next
> entry in the string in LIBDIRS.
>    Any help would be much appreciated.
>
> Amos Leffler
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/



Re: [OMPI users] Open MPI process cannot do send-receive message correctly on a distributed memory cluster

2011-09-30 Thread Rayson Ho
You can use a debugger (just gdb will do, no TotalView needed) to find
out which MPI send & receive calls are hanging the code on the
distributed cluster, and see if the hanging send & receive pair is due
to a problem like the one described at:

Deadlock avoidance in your MPI programs:
http://www.cs.ucsb.edu/~hnielsen/cs140/mpi-deadlocks.html
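
As a rough illustration of the kind of problem that page describes (not a
claim about your code): with large messages, two ranks that both block in
MPI_Send first will deadlock, and combining the pair with MPI_Sendrecv is one
way out. A minimal sketch, assuming exactly two ranks:

#include <mpi.h>

#define N (1 << 22)   /* large enough that MPI_Send will not complete eagerly */

int main(int argc, char *argv[])
{
    static double sbuf[N], rbuf[N];
    int rank, peer;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;                      /* assumes exactly 2 ranks */

    /* Deadlock-prone version: both ranks send first, so neither reaches
     * the matching receive:
     *     MPI_Send(sbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
     *     MPI_Recv(rbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
     *              MPI_STATUS_IGNORE);
     */

    /* Safe version: let the library pair up the send and the receive. */
    MPI_Sendrecv(sbuf, N, MPI_DOUBLE, peer, 0,
                 rbuf, N, MPI_DOUBLE, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}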

Rayson

=
Grid Engine / Open Grid Scheduler
http://gridscheduler.sourceforge.net

Wikipedia Commons
http://commons.wikimedia.org/wiki/User:Raysonho


On Fri, Sep 30, 2011 at 11:06 AM, Jack Bryan  wrote:
> Hi,
>
> I have a Open MPI program, which works well on a Linux shared memory
> multicore (2 x 6 cores) machine.
>
> But, it does not work well on a distributed cluster with Linux Open MPI.
>
> I found that the the process sends out some messages to other processes,
> which can not receive them.
>
> What is the possible reason ?
>
> I do not change anything of the program.
>
> Any help is really appreciated.
>
> Thanks
>
>
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/


Re: [OMPI users] EXTERNAL: Re: Unresolved reference 'mbind' and 'get_mempolicy'

2011-12-07 Thread Rayson Ho
We are using hwloc-1.2.2 for topology binding in Open Grid
Scheduler/Grid Engine 2011.11, and a user is encountering similar
issues:

http://gridengine.org/pipermail/users/2011-December/002126.html

In Open MPI, there is the configure switch "--without-libnuma" to turn
libnuma off. But since Open MPI uses hwloc internally, I think there
still is a dependency on libnuma even if "--without-libnuma" is used
to build Open MPI. Also, as hwloc does not have a configure switch
that disables libnuma, it seems that libnuma is always used when the
hwloc configure script detects its presence.

So my question is, are there plans to add a configure switch in hwloc
to disable libnuma??

Thanks,
Rayson

=
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/



On Fri, Sep 30, 2011 at 8:03 AM, Jeff Squyres  wrote:
> On Sep 29, 2011, at 12:45 PM, Blosch, Edwin L wrote:
>
>> If I add --without-hwloc in addition to --without-libnuma, then it builds.  
>> Is that a reasonable thing to do?  Is there a better workaround?  This 
>> 'hwloc' module looks like it might be important.
>
> As a note of explanation: hwloc is effectively our replacement for libnuma.  
> You might want to check out hwloc (the standalone software package) -- it has 
> a CLI and is quite useful for administrating servers, even outside of an HPC 
> environment:
>
>    http://www.open-mpi.org/projects/hwloc/
>
> hwloc may use libnuma under the covers; that's where this issue is coming 
> from (i.e., OMPI may still use libnuma -- it's just now doing so indirectly, 
> instead of directly).
>
>> For what it's worth, if there's something wrong with my configure line, let 
>> me know what to improve. Otherwise, as weird as 
>> "--enable-mca-no-build=maffinity --disable-io-romio --enable-static 
>> --disable-shared" may look, I am not trying to build fully static binaries. 
>> I have unavoidable need to build OpenMPI on certain machines and then 
>> transfer the executables to other machines that are compatible but not 
>> identical, and over the years these are the minimal set of configure flags 
>> necessary to make that possible. I may revisit these choices at some point, 
>> but if they are supposed to work, then I'd rather just keep using them.
>
> Your configure line looks fine to me.
>
> FWIW/heads up: in the 1.7 series, we're going to be ignoring the $F77 and 
> $FFLAGS variables; we'll *only* be using $FC and $FCFLAGS.  There's still 
> plenty of time before this hits mainstream, but I figured I'd let you know 
> it's coming.  :-)
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/



Re: [OMPI users] [Beowulf] How to justify the use MPI codes on multicore systems/PCs?

2011-12-12 Thread Rayson Ho
On Sat, Dec 10, 2011 at 3:21 PM, amjad ali  wrote:
> (2) The latest MPI implementations are intelligent enough that they use some
> efficient mechanism while executing MPI based codes on shared memory
> (multicore) machines.  (please tell me any reference to quote this fact).

Not an academic paper, but from a real MPI library developer/architect:

http://blogs.cisco.com/performance/shared-memory-as-an-mpi-transport/
http://blogs.cisco.com/performance/shared-memory-as-an-mpi-transport-part-2/

Open MPI is used by Japan's K computer (current #1 TOP 500 computer)
and LANL's RoadRunner (#1 Jun 08 – Nov 09), and "10^16 Flops Can't Be
Wrong" and "10^15 Flops Can't Be Wrong":

http://www.open-mpi.org/papers/sc-2008/jsquyres-cisco-booth-talk-2up.pdf

Rayson

=
Grid Engine / Open Grid Scheduler
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/


>
>
> Please help me in formally justifying this and comment/modify above two
> justifications. Better if I you can suggent me to quote some reference of
> any suitable publication in this regard.
>
> best regards,
> Amjad Ali
>
> ___
> Beowulf mailing list, beow...@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>



-- 
Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/



Re: [OMPI users] How to justify the use MPI codes on multicore systems/PCs?

2011-12-14 Thread Rayson Ho
There is a project called "MVAPICH2-GPU", which is developed by D. K.
Panda's research group at Ohio State University. You will find lots of
references on Google... and I have just briefly gone through the slides
of "MVAPICH2-GPU: Optimized GPU to GPU Communication for InfiniBand
Clusters":

http://nowlab.cse.ohio-state.edu/publications/conf-presentations/2011/hao-isc11-slides.pdf

It takes advantage of CUDA 4.0's Unified Virtual Addressing (UVA) to
pipeline & optimize cudaMemcpyAsync() & RMDA transfers. (MVAPICH
1.8a1p1 also supports Device-Device, Device-Host, Host-Device
transfers.)

Open MPI also supports similar functionality, but as Open MPI is not an
academic project, there are fewer academic papers documenting the
internals of the latest developments (not saying that it's bad - many
products are not academic in nature and thus have fewer published
papers...)
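
To make the pipelining idea concrete, here is a rough user-level sketch (not
how MVAPICH2-GPU or Open MPI actually implement it internally): each chunk is
staged from the device with cudaMemcpyAsync() while the previous chunk is
still on the wire in an MPI_Isend. The function name, the 1 MiB chunk size,
and the lack of error checking are all simplifications:

#include <mpi.h>
#include <cuda_runtime.h>

#define CHUNK (1 << 20)                       /* 1 MiB pipeline stage (arbitrary) */

void send_device_buffer(const char *d_buf, size_t bytes, int dest, MPI_Comm comm)
{
    char *h_buf;
    cudaStream_t stream;
    MPI_Request req = MPI_REQUEST_NULL;
    size_t off;

    cudaMallocHost((void **)&h_buf, bytes);   /* pinned host staging buffer */
    cudaStreamCreate(&stream);

    for (off = 0; off < bytes; off += CHUNK) {
        size_t n = (bytes - off < CHUNK) ? (bytes - off) : CHUNK;

        /* copy chunk i to the host while chunk i-1 is still being sent */
        cudaMemcpyAsync(h_buf + off, d_buf + off, n,
                        cudaMemcpyDeviceToHost, stream);
        cudaStreamSynchronize(stream);        /* chunk i is now in host memory */

        MPI_Wait(&req, MPI_STATUS_IGNORE);    /* finish sending chunk i-1 */
        MPI_Isend(h_buf + off, (int)n, MPI_BYTE, dest, 0, comm, &req);
    }
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    cudaStreamDestroy(stream);
    cudaFreeHost(h_buf);
}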

Rayson

=
Grid Engine / Open Grid Scheduler
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/


On Mon, Dec 12, 2011 at 11:40 AM, Durga Choudhury  wrote:
> I think this is a *great* topic for discussion, so let me throw some
> fuel to the fire: the mechanism described in the blog (that makes
> perfect sense) is fine for (N)UMA shared memory architectures. But
> will it work for asymmetric architectures such as the Cell BE or
> discrete GPUs where the data between the compute nodes have to be
> explicitly DMA'd in? Is there a middleware layer that makes it
> transparent to the upper layer software?
>
> Best regards
> Durga
>
> On Mon, Dec 12, 2011 at 11:00 AM, Rayson Ho  wrote:
>> On Sat, Dec 10, 2011 at 3:21 PM, amjad ali  wrote:
>>> (2) The latest MPI implementations are intelligent enough that they use some
>>> efficient mechanism while executing MPI based codes on shared memory
>>> (multicore) machines.  (please tell me any reference to quote this fact).
>>
>> Not an academic paper, but from a real MPI library developer/architect:
>>
>> http://blogs.cisco.com/performance/shared-memory-as-an-mpi-transport/
>> http://blogs.cisco.com/performance/shared-memory-as-an-mpi-transport-part-2/
>>
>> Open MPI is used by Japan's K computer (current #1 TOP 500 computer)
>> and LANL's RoadRunner (#1 Jun 08 – Nov 09), and "10^16 Flops Can't Be
>> Wrong" and "10^15 Flops Can't Be Wrong":
>>
>> http://www.open-mpi.org/papers/sc-2008/jsquyres-cisco-booth-talk-2up.pdf
>>
>> Rayson
>>
>> =
>> Grid Engine / Open Grid Scheduler
>> http://gridscheduler.sourceforge.net/
>>
>> Scalable Grid Engine Support Program
>> http://www.scalablelogic.com/
>>
>>
>>>
>>>
>>> Please help me in formally justifying this and comment/modify above two
>>> justifications. Better if I you can suggent me to quote some reference of
>>> any suitable publication in this regard.
>>>
>>> best regards,
>>> Amjad Ali
>>>
>>> ___
>>> Beowulf mailing list, beow...@beowulf.org sponsored by Penguin Computing
>>> To change your subscription (digest mode or unsubscribe) visit
>>> http://www.beowulf.org/mailman/listinfo/beowulf
>>>
>>
>>
>>
>> --
>> Rayson
>>
>> ==
>> Open Grid Scheduler - The Official Open Source Grid Engine
>> http://gridscheduler.sourceforge.net/
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/



Re: [OMPI users] Strange TCP latency results on Amazon EC2

2012-01-13 Thread Rayson Ho
On Tue, Jan 10, 2012 at 10:02 AM, Roberto Rey  wrote:
> I'm running some tests on EC2 cluster instances with 10 Gigabit Ethernet
> hardware and I'm getting strange latency results with Netpipe and OpenMPI.

- There are 3 types of instances that can use 10 GbE. Are you using
"cc1.4xlarge", "cc2.8xlarge", or "cg1.4xlarge"??

- Did you set up a placement group??

- Also, which AMI are you using??


> I'm using the BTL TCP in OpenMPI, so I can't understand why OpenMPI
> outperforms raw TCP performance for small messages (40us of difference).
>
> Can OpenMPI outperform Netpipe over TCP? Why? Is OpenMPI  doing any
> optimization in BTL TCP?

It is indeed interesting!

If we can run strace with timing (like strace -tt) and compare the
difference between NPmpi & NPtcp, then we can get a better idea of
what's happening.

It is possible that one is doing more busy polling than the other,
and/or triggering Xen to handle things a bit differently. Also, we
should check the socket options, and also check the system call
latency to see if the network is really responsible for the extra 40us
delay.


> The results for OpenMPI aren't so good but we must take into account the
> network virtualization overhead under Xen

If you are running Cluster Compute Instances, then you are using HVM.
If things are setup properly (HVM & placement group), then you can
even get a Top500 computer on EC2... Amazon uses similar setups for
their TOP500 submission:

http://i.top500.org/site/50321

Rayson

=
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/


Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-01-31 Thread Rayson Ho
On Mon, Jan 30, 2012 at 11:33 PM, Tom Bryan  wrote:
> For our use, yes, spawn_multiple makes sense.  We won't be spawning lots and
> lots of jobs in quick succession.  We're using MPI as an robust way to get
> IPC as we spawn multiple child processes while using SGE to help us with
> load balancing our compute nodes.

Note that spawn_multiple is not going to buy you anything as SGE and
Open Grid Scheduler (and most other batch systems) do not handle
dynamic slot allocation. There is no way to change the number of slots
that are used by a job once it's running.

For this reason, I don't recall seeing any users using spawn_multiple
(and also, IIRC, the call was introduced in MPI-2)... and you might
want to make sure that normal MPI jobs work before debugging a
spawn_multiple() job.
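
For reference, the call looks like the sketch below -- "worker_a"/"worker_b"
are made-up executable names, and, per the above, the spawned processes still
have to fit into whatever slots SGE already granted to the job:

#include <mpi.h>

int main(int argc, char *argv[])
{
    char     *cmds[2]   = { "worker_a", "worker_b" };   /* hypothetical binaries */
    int       nprocs[2] = { 2, 2 };
    MPI_Info  infos[2]  = { MPI_INFO_NULL, MPI_INFO_NULL };
    int       errcodes[4];
    MPI_Comm  children;

    MPI_Init(&argc, &argv);

    /* Spawn two executables, two copies each; the result is an
     * intercommunicator connecting the parent(s) to all four children. */
    MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, nprocs, infos,
                            0, MPI_COMM_WORLD, &children, errcodes);

    MPI_Comm_disconnect(&children);
    MPI_Finalize();
    return 0;
}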

Rayson

=
Grid Engine / Open Grid Scheduler
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/


>
>> Anyway:
>> do you see on the master node of the parallel job in:
>
> Yes, I should have included that kind of output.  I'll have to run it again
> with the cols option, but I used pstree to see that I have mpitest --child
> processes as children of orted by way of sge_shepherd and sge_execd.
>
> Thanks,
> ---Tom
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/



Re: [OMPI users] ROMIO Podcast

2012-02-20 Thread Rayson Ho
Brock,

I listened to the podcast on Saturday, and I just downloaded it again
10 mins ago.

Did the interview really end at 26:34?? And if I recall correctly, you
& Jeff did not get a chance to ask them the "which source control
system do you guys use" question :-D

Rayson

=
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/


On Mon, Feb 20, 2012 at 3:05 PM, Brock Palen  wrote:
> For those interested in MPI-IO, and ROMIO Jeff and I did an interview Rajeev 
> and Rob:
>
> http://www.rce-cast.com/Podcast/rce-66-romio-mpi-io.html
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> bro...@umich.edu
> (734)936-1985
>
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] ROMIO Podcast

2012-02-20 Thread Rayson Ho
Hi Jeff,

I use wget to download the file - and I use VideoLAN to play the mp3.
VideoLAN shows that the file only has 26:34.

I just quickly tried to use Chrome to play the file, and it showed
that the file was over 33 mins. *However*, the podcast still ended at
26:34, after the ROMIO guys say "there are many MPI implementations,
SGI, and HP, and what..." - so am I the only one who gets a corrupted
file??

Rayson

=
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/



On Mon, Feb 20, 2012 at 3:45 PM, Jeffrey Squyres  wrote:
> Little known secret: we edit before these things go to air.  :-)
>
> The recordings almost always take about an hour, but we snip some things out. 
>  IIRC, we had some tech problems which wasted some time in this recording, 
> and some off-recording kibitzing.  :-)
>
> Also, it looks like Brock had a problem with the XML so that iTunes/RSS 
> readers said the episode was 26:34.  But when you download it, the MP3 is 
> actually over 33 mins.  I think Brock just updated the RSS, so we'll see when 
> iTunes updates.
>
>
>
> On Feb 20, 2012, at 3:25 PM, Rayson Ho wrote:
>
>> Brock,
>>
>> I listened to the podcast on Saturday, and I just downloaded it again
>> 10 mins ago.
>>
>> Did the interview really end at 26:34?? And if I recall correctly, you
>> & Jeff did not get a chance to ask them the "which source control
>> system do you guys use" question :-D
>>
>> Rayson
>>
>> =
>> Open Grid Scheduler / Grid Engine
>> http://gridscheduler.sourceforge.net/
>>
>> Scalable Grid Engine Support Program
>> http://www.scalablelogic.com/
>>
>>
>> On Mon, Feb 20, 2012 at 3:05 PM, Brock Palen  wrote:
>>> For those interested in MPI-IO, and ROMIO Jeff and I did an interview 
>>> Rajeev and Rob:
>>>
>>> http://www.rce-cast.com/Podcast/rce-66-romio-mpi-io.html
>>>
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> CAEN Advanced Computing
>>> bro...@umich.edu
>>> (734)936-1985
>>>
>>>
>>>
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/



Re: [OMPI users] ROMIO Podcast

2012-02-20 Thread Rayson Ho
Thanks, I just downloaded it again and it is not a corrupted file anymore!

(But what's to the "what source control system do you guys use" question? :-D )

Rayson

=
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/


On Mon, Feb 20, 2012 at 4:47 PM, Brock Palen  wrote:
> This should be fixed, there was a bad upload, the server had a different copy 
> than my machine.  The fixed version is in place.  Feel free to grab it again.
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> bro...@umich.edu
> (734)936-1985
>
>
>
> On Feb 20, 2012, at 4:43 PM, Jeffrey Squyres wrote:
>
>> Yes, something is borked here.  I just listened to what I got in iTunes and 
>> it's both longer than 33 mins (i.e., it keeps playing after the timer 
>> reaches 0:00), and then it cuts off in the middle of one of Rajeev's 
>> answers.  Doh.  :-(
>>
>> Brock is checking into it…
>>
>>
>> On Feb 20, 2012, at 4:37 PM, Rayson Ho wrote:
>>
>>> Hi Jeff,
>>>
>>> I use wget to download the file - and I use VideoLAN to play the mp3.
>>> VideoLAN shows that the file only has 26:34.
>>>
>>> I just quickly tried to use Chrome to play the file, and it showed
>>> that the file was over 33 mins. *However*, the podcast still ended at
>>> 26:34, after the ROMIO guys say "there are many MPI implementations,
>>> SGI, and HP, and what..." - so am I the only one who gets a corrupted
>>> file??
>>>
>>> Rayson
>>>
>>> =
>>> Open Grid Scheduler / Grid Engine
>>> http://gridscheduler.sourceforge.net/
>>>
>>> Scalable Grid Engine Support Program
>>> http://www.scalablelogic.com/
>>>
>>>
>>>
>>> On Mon, Feb 20, 2012 at 3:45 PM, Jeffrey Squyres  wrote:
>>>> Little known secret: we edit before these things go to air.  :-)
>>>>
>>>> The recordings almost always take about an hour, but we snip some things 
>>>> out.  IIRC, we had some tech problems which wasted some time in this 
>>>> recording, and some off-recording kibitzing.  :-)
>>>>
>>>> Also, it looks like Brock had a problem with the XML so that iTunes/RSS 
>>>> readers said the episode was 26:34.  But when you download it, the MP3 is 
>>>> actually over 33 mins.  I think Brock just updated the RSS, so we'll see 
>>>> when iTunes updates.
>>>>
>>>>
>>>>
>>>> On Feb 20, 2012, at 3:25 PM, Rayson Ho wrote:
>>>>
>>>>> Brock,
>>>>>
>>>>> I listened to the podcast on Saturday, and I just downloaded it again
>>>>> 10 mins ago.
>>>>>
>>>>> Did the interview really end at 26:34?? And if I recall correctly, you
>>>>> & Jeff did not get a chance to ask them the "which source control
>>>>> system do you guys use" question :-D
>>>>>
>>>>> Rayson
>>>>>
>>>>> =
>>>>> Open Grid Scheduler / Grid Engine
>>>>> http://gridscheduler.sourceforge.net/
>>>>>
>>>>> Scalable Grid Engine Support Program
>>>>> http://www.scalablelogic.com/
>>>>>
>>>>>
>>>>> On Mon, Feb 20, 2012 at 3:05 PM, Brock Palen  wrote:
>>>>>> For those interested in MPI-IO, and ROMIO Jeff and I did an interview 
>>>>>> Rajeev and Rob:
>>>>>>
>>>>>> http://www.rce-cast.com/Podcast/rce-66-romio-mpi-io.html
>>>>>>
>>>>>> Brock Palen
>>>>>> www.umich.edu/~brockp
>>>>>> CAEN Advanced Computing
>>>>>> bro...@umich.edu
>>>>>> (734)936-1985
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ___
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>> ___
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>> For corporate legal information go to: 
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>
>>>>
>>>> ___
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>>
>>> --
>>> ==
>>> Open Grid Scheduler - The Official Open Source Grid Engine
>>> http://gridscheduler.sourceforge.net/
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/



Re: [OMPI users] ROMIO Podcast

2012-02-20 Thread Rayson Ho
Thanks, I just downloaded it again and it is not a corrupted file anymore!

(But what's happened to the "what source control system do you guys
use" question usually asked by Jeff? :-D )

Rayson

=
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/

On Mon, Feb 20, 2012 at 4:47 PM, Brock Palen  wrote:
> This should be fixed, there was a bad upload, the server had a different copy 
> than my machine.  The fixed version is in place.  Feel free to grab it again.
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> bro...@umich.edu
> (734)936-1985
>
>
>
> On Feb 20, 2012, at 4:43 PM, Jeffrey Squyres wrote:
>
>> Yes, something is borked here.  I just listened to what I got in iTunes and 
>> it's both longer than 33 mins (i.e., it keeps playing after the timer 
>> reaches 0:00), and then it cuts off in the middle of one of Rajeev's 
>> answers.  Doh.  :-(
>>
>> Brock is checking into it…
>>
>>
>> On Feb 20, 2012, at 4:37 PM, Rayson Ho wrote:
>>
>>> Hi Jeff,
>>>
>>> I use wget to download the file - and I use VideoLAN to play the mp3.
>>> VideoLAN shows that the file only has 26:34.
>>>
>>> I just quickly tried to use Chrome to play the file, and it showed
>>> that the file was over 33 mins. *However*, the podcast still ended at
>>> 26:34, after the ROMIO guys say "there are many MPI implementations,
>>> SGI, and HP, and what..." - so am I the only one who gets a corrupted
>>> file??
>>>
>>> Rayson
>>>
>>> =
>>> Open Grid Scheduler / Grid Engine
>>> http://gridscheduler.sourceforge.net/
>>>
>>> Scalable Grid Engine Support Program
>>> http://www.scalablelogic.com/
>>>
>>>
>>>
>>> On Mon, Feb 20, 2012 at 3:45 PM, Jeffrey Squyres  wrote:
>>>> Little known secret: we edit before these things go to air.  :-)
>>>>
>>>> The recordings almost always take about an hour, but we snip some things 
>>>> out.  IIRC, we had some tech problems which wasted some time in this 
>>>> recording, and some off-recording kibitzing.  :-)
>>>>
>>>> Also, it looks like Brock had a problem with the XML so that iTunes/RSS 
>>>> readers said the episode was 26:34.  But when you download it, the MP3 is 
>>>> actually over 33 mins.  I think Brock just updated the RSS, so we'll see 
>>>> when iTunes updates.
>>>>
>>>>
>>>>
>>>> On Feb 20, 2012, at 3:25 PM, Rayson Ho wrote:
>>>>
>>>>> Brock,
>>>>>
>>>>> I listened to the podcast on Saturday, and I just downloaded it again
>>>>> 10 mins ago.
>>>>>
>>>>> Did the interview really end at 26:34?? And if I recall correctly, you
>>>>> & Jeff did not get a chance to ask them the "which source control
>>>>> system do you guys use" question :-D
>>>>>
>>>>> Rayson
>>>>>
>>>>> =
>>>>> Open Grid Scheduler / Grid Engine
>>>>> http://gridscheduler.sourceforge.net/
>>>>>
>>>>> Scalable Grid Engine Support Program
>>>>> http://www.scalablelogic.com/
>>>>>
>>>>>
>>>>> On Mon, Feb 20, 2012 at 3:05 PM, Brock Palen  wrote:
>>>>>> For those interested in MPI-IO, and ROMIO Jeff and I did an interview 
>>>>>> Rajeev and Rob:
>>>>>>
>>>>>> http://www.rce-cast.com/Podcast/rce-66-romio-mpi-io.html
>>>>>>
>>>>>> Brock Palen
>>>>>> www.umich.edu/~brockp
>>>>>> CAEN Advanced Computing
>>>>>> bro...@umich.edu
>>>>>> (734)936-1985
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ___
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>> ___
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>

Re: [OMPI users] ROMIO Podcast

2012-02-20 Thread Rayson Ho
On Mon, Feb 20, 2012 at 6:02 PM, Jeffrey Squyres  wrote:
>> (But what's happened to the "what source control system do you guys
>> use" question usually asked by Jeff? :-D )
>
>
> I need to get back to asking that one.  :-)

Skynet needs to send Jeff (and Arnold) back in time!


> It's just a personal curiosity of mine; that's really the only reason I ask.

BTW, since most of the interviewees are open-source project
maintainers, next time can you ask them how much external contribution
they get (%), who the main external contributors are (students?
HPC labs? Industry?), and how they handle external contributions
(copyright assignment needed?). And how they handle testing and
performance regressions...

Rayson

=
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/


> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/



Re: [OMPI users] ROMIO Podcast

2012-02-21 Thread Rayson Ho
On Tue, Feb 21, 2012 at 12:06 PM, Rob Latham  wrote:
> ROMIO's testing and performance regression framework is honestly a
> shambles.  Part of that is a challenge with the MPI-IO interface
> itself.  For MPI messaging you exercise the API and you have pretty
> much covered everything.  MPI-IO, though, introduces hints.  These
> hints are great for tuning but make the testing "surface area" a lot
> larger.  We are probably going to have a chance to improve things
> greatly with some recently funded proposals.

Thanks for the replies Rob.

I am interested in testing mainly because not a lot of projects have
spare clusters lying around for performance regression testing. But
then these days we can get machines from EC2 easily & relatively
cheaply, so I was wondering if other projects are migrating their test
infrastructure to EC2.

Rayson

=
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/


>
> ==rob
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/



Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE

2012-03-15 Thread Rayson Ho
Hi Joshua,

I don't think the new built-in rsh in later versions of Grid Engine is
going to make any difference - the orted is the real starter of the
MPI tasks and should have a greater influence on the task environment.

However, it would help if you could record the nice values and resource
limits of each of the MPI tasks - you can easily do so with a shell
wrapper like this one:


#!/bin/sh

# resource limit
ulimit -a > /tmp/mpijob.$$

# nice value
ps -eo pid,user,nice,command | grep $$

# run the real executable -- "your_mpi_app" is a placeholder; put the
# path to the actual MPI binary (and its arguments) here
./your_mpi_app "$@"

exit $?


Use mpirun to submit it as if it is the real MPI application - then
you can see if there are limits introduced by Grid Engine that are
causing issues...

Rayson

=
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/



On Thu, Mar 15, 2012 at 12:28 AM, Joshua Baker-LePain  wrote:
> On Thu, 15 Mar 2012 at 12:44am, Reuti wrote
>
>
>> Which version of SGE are you using? The traditional rsh startup was
>> replaced by the builtin startup some time ago (although it should still
>> work).
>
>
> We're currently running the rather ancient 6.1u4 (due to the "If it ain't
> broke..." philosophy).  The hardware for our new queue master recently
> arrived and I'll soon be upgrading to the most recent Open Grid Scheduler
> release.  Are you saying that the upgrade with the new builtin startup
> method should avoid this problem?
>
>
>> Maybe this shows already the problem: there are two `qrsh -inherit`, as
>> Open MPI thinks these are different machines (I ran only with one slot on
>> each host hence didn't get it first but can reproduce it now). But for SGE
>> both may end up in the same queue overriding the openmpi-session in $TMPDIR.
>>
>> Although it's running: you get all output? If I request 4 slots and get
>> one from each queue on both machines the mpihello outputs only 3 lines: the
>> "Hello World from Node 3" is always missing.
>
>
> I do seem to get all the output -- there are indeed 64 Hello World lines.
>
> Thanks again for all the help on this.  This is one of the most productive
> exchanges I've had on a mailing list in far too long.
>
>
> --
> Joshua Baker-LePain
> QB3 Shared Cluster Sysadmin
> UCSF
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/



Re: [OMPI users] Error while loading shared libraries

2012-04-02 Thread Rayson Ho
On Sun, Apr 1, 2012 at 11:27 PM, Rohan Deshpande  wrote:
>   error while loading shared libraries: libmpi.so.0: cannot open shared
> object file no such object file: No such file or directory.

Were you trying to run the MPI program on a remote machine?? If you
are, then make sure that each machine has the libraries installed (or
you can install Open MPI on an NFS directory).

Rayson

=
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/


>
> When I run using - mpirun -np 1 ldd hello the following libraries are not
> found
>   1. libmpi.so.0
>   2. libopen-rte.so.0
>   3. libopen.pal.so.0
>
> I am using openmpi version 1.4.5. Also PATH and LD_LIBRARY_PATH variables
> are correctly set and 'which mpicc' returns correct path
>
> Any help would be highly appreciated.
>
> Thanks
>
>
>
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Sharing (not copying) data with OpenMPI?

2012-04-17 Thread Rayson Ho
On Tue, Apr 17, 2012 at 2:26 AM, jody  wrote:
> As to OpenMP: i already make use of OpenMP in some places (for
> instance for the creation of the large data block),
> but unfortunately my main application is not well suited for OpenMP
> parallelization..

If MPI does not support this kind of programming, you can always write
the logic in your application... MPI tasks are normal & real processes
just like any other processes in the system.

Do something like:

1. open a file in /tmp exclusively - which means only one MPI task on
each machine can get the "lock".

2. the one that gets the "lock" creates a shared memory segment &
loads in the fileset.

3. communicate with the other MPI tasks on the machine (e.g. read from a
file, or whatever is easy) and let them know about the memory
segment.

It's really 20-50 lines of C or C++ code - it may not be the prettiest
architecture, but in the end the MPI library is doing something very
similar internally.
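
A minimal sketch of steps 1-3 above, using a /tmp lock file plus POSIX shared
memory (shm_open/mmap; link with -lrt on Linux). The segment name, lock path,
and data size are made up, and error handling is omitted:

#include <mpi.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define LOCK_FILE  "/tmp/dataset.lock"       /* hypothetical names and size */
#define SHM_NAME   "/dataset_shm"
#define DATA_BYTES (64UL * 1024 * 1024)

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    /* Step 1: exclusive create -- only one task per machine gets the "lock". */
    int winner = (open(LOCK_FILE, O_CREAT | O_EXCL | O_WRONLY, 0600) >= 0);

    if (winner) {
        /* Step 2: the winner creates the segment and loads the data into it. */
        int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
        ftruncate(fd, DATA_BYTES);
        char *buf = mmap(NULL, DATA_BYTES, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        memset(buf, 0, DATA_BYTES);          /* stand-in for loading the file set */
    }

    /* Step 3: a barrier is the simplest stand-in for "let the others know". */
    MPI_Barrier(MPI_COMM_WORLD);

    if (!winner) {
        int fd = shm_open(SHM_NAME, O_RDONLY, 0600);
        const char *data = mmap(NULL, DATA_BYTES, PROT_READ, MAP_SHARED, fd, 0);
        (void)data;                          /* read-only access to the block */
    }

    MPI_Barrier(MPI_COMM_WORLD);             /* keep the segment until all are done */
    if (winner) { shm_unlink(SHM_NAME); unlink(LOCK_FILE); }

    MPI_Finalize();
    return 0;
}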

Rayson

=
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/




>
> I guess i'll have to take more detailed look at my problem to see if i
> can restructure it in a good way...
>
> Thank You
>  Jody
>
>
> On Mon, Apr 16, 2012 at 11:16 PM, Brian Austin  wrote:
>> Maybe you meant to search for OpenMP instead of Open-MPI.
>> You can achieve something close to what you want by using OpenMP for on-node
>> parallelism and MPI for inter-node communication.
>> -Brian
>>
>>
>>
>> On Mon, Apr 16, 2012 at 11:02 AM, George Bosilca 
>> wrote:
>>>
>>> No currently there is no way in MPI (and subsequently in Open MPI) to
>>> achieve this. However, in the next version of the MPI standard there will be
>>> a function allowing processes to shared a memory segment
>>> (https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/284).
>>>
>>> If you like living on the bleeding edge, you can try Brian's branch
>>> implementing the MPI 3.0 RMA operations (including the shared memory
>>> segment) from http://svn.open-mpi.org/svn/ompi/tmp-public/mpi3-onesided/.
>>>
>>>  george.
>>>
>>> On Apr 16, 2012, at 09:52 , jody wrote:
>>>
>>> > Hi
>>> >
>>> > In my application i have to generate a large block of data (several
>>> > gigs) which subsequently has to be accessed by all processes (read
>>> > only),
>>> > Because of its size, it would take quite some time to serialize and
>>> > send the data to the different processes. Furthermore, i risk
>>> > running out of memory if this data is instantiated more than once on
>>> > one machine.
>>> >
>>> > Does OpenMPI offer some way of sharing data between processes (on the
>>> > same machine) without needing to send (and therefore copy) it?
>>> >
>>> > Or would i have to do this by means of creating shared memory, writing
>>> > to it, and then make it accessible for reading by the processes?
>>> >
>>> > Thank You
>>> >  Jody
>>> > ___
>>> > users mailing list
>>> > us...@open-mpi.org
>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/



Re: [OMPI users] ppe-ompi 1.2 (Open MPI on EC2)

2012-04-23 Thread Rayson Ho
Is StarCluster too complex for your use case?

http://web.mit.edu/star/cluster/

Rayson

=
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/



On Mon, Apr 23, 2012 at 6:20 PM, Barnet Wagman  wrote:
> I've released a new version of ppe-ompi, which is a system for running an
> Open MPI network on EC2 (Amazon's cloud computing service).  Download here.
>
>
> Release notes:
>
> AMIs with Open MPI 1.4.5 are available.
> EBS volumes can be attached to instances using the ec2 network manager.
> A simple ssh shell for access to an ec2 instance is built into the ec2
> network manager.
> Amazon client configuration parameters can be set in the ec2 network
> manager. Some of the parameters may be needed to access AWS if you are using
> a proxy server.
> The network specification manager shows more information about instance
> types and AMIs.
>
>
> Regards,
>
> Barnet Wagman
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/



Re: [OMPI users] HRM problem

2012-04-24 Thread Rayson Ho
Seems like there's a bug in the application. Did you or someone else
write it, or did you get it from an ISV??

You can log onto one of the nodes, attach a debugger, and see if the
MPI task is waiting for a message (looping in one of the MPI receive
functions)...

Rayson

=
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/


On Tue, Apr 24, 2012 at 12:49 AM, Syed Ahsan Ali  wrote:
> Dear All,
>
> I am having problem with running an application on Dell cluster . The model
> starts well but no further progress is shown. It just stuck. I have checked
> the systems, no apparent hardware error is there. Other open mpi
> applications are running well on the same cluster. I have tried running the
> application on cores of the same server as well but the problem is same. The
> application just don't move further. The same application is also running
> well on a backup cluster. Please help.
>
>
> Thanks and Best Regards
>
> Ahsan
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/



Re: [OMPI users] MPI books and resources

2012-05-12 Thread Rayson Ho
And before you try to understand the OMPI code, read some of the
papers & presentations first:

http://www.open-mpi.org/papers/

Rayson


Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/


On Sat, May 12, 2012 at 12:42 PM, Constantinos Makassikis
 wrote:
> You may be interested by :
>
> - the MPI Standard (http://www.mpi-forum.org/docs/docs.html)
>
> - the book chapter written by J. Squyres in (see
> http://www.open-mpi.org/community/lists/devel/2012/05/10981.php)
>
> and
>
> - a Scuba Diving Kit ... to dive into ... open source code :-D
>
>
> --
> Constantinos
>
>
>
>
> On Sat, May 12, 2012 at 12:18 PM, Rohan Deshpande 
> wrote:
>>
>> Hi,
>>
>> Can anyone point me to good resources and books to understand the detailed
>> architecture and working of MPI.
>>
>> I would like to know all the minute details.
>>
>> Thanks,
>>
>> --
>>
>>
>>
>>
>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/


[OMPI users] OT: MPI Quiz...

2012-06-01 Thread Rayson Ho
We posted an MPI quiz but so far no one on the Grid Engine list has
the answer that Jeff was expecting:

 http://blogs.scalablelogic.com/

Others have offered interesting points, and I just want to see if
people on the Open MPI list have the *exact* answer and the first one
gets a full Cisco Live Conference Pass (list price = $2195)!

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/


Re: [OMPI users] Question on ./configure error on Tru64unix (OSF1) v5.1B-6 for openmpi-1.6

2012-06-08 Thread Rayson Ho
Hi Bill,

If you *really* have time, then you can go deep into the log, and find
out why configure failed. It looks like configure failed when it tried
to compile this code:

 .text
 # .gsym_test_func
 .globl .gsym_test_func
 .gsym_test_func:
 # .gsym_test_func

 configure:26752: result: none
 configure:26756: error: Could not determine global symbol label prefix

Maybe it's a gcc thing?? Like your assembler is too old?? I tried it
in Cygwin, which has gcc 3.4.4, and it seems to work fine (just copy
the 5 lines of code above into a file with a ".s" extension, then
compile it with gcc and see if you can reproduce it).

I was involved in a TOP500 project that used AlphaServer SC ES45 nodes
(a total of 4,096 cores), and it was #2 in the TOP500 a decade ago! It
was fun back then... But I agree with Jeff, it is unlikely that Open
MPI is going to work on Tru64 - all modern processors are much faster
than Alpha and I believe even the TOP500 Alpha machines are all
powered down (even the Earth Simulator is not on the TOP500 list
anymore - that was the #1 back then!!).

Rayson



On Fri, Jun 8, 2012 at 7:07 AM, Jeff Squyres  wrote:
> To be honest, I don't think we've ever tested on Tru64, so I'm not surprised 
> that it doesn't work.  Indeed, I think that it is unlikely that we will ever 
> support Tru64.  :-(
>
> Sorry!
>
>
> On Jun 7, 2012, at 12:43 PM,   
> wrote:
>
>>
>> Hello,
>>
>> I am having trouble with the *** Assembler section of the GNU autoconf
>> step in trying to build OpenMPI version 1.6 on an HP AlphaServer GS160
>> running Tru64unix version 5.1B-6:
>>
>> # uname -a
>> OSF1 zozma.cts.cwu.edu V5.1 2650 alpha
>>
>> The output is of the ./configure run
>> zozma(bash)% ./configure --prefix=/usr/local/OpenMPI \
>> --enable-shared --enable-static :
>>
>> ...
>>
>> *** Assembler
>> checking dependency style of gcc... gcc3
>> checking for BSD- or MS-compatible name lister (nm)... /usr/local/bin/nm -B
>> checking the name lister (/usr/local/bin/nm -B) interface... BSD nm
>> checking for fgrep... /usr/local/bin/grep -F
>> checking if need to remove -g from CCASFLAGS... no
>> checking whether to enable smp locks... yes
>> checking if .proc/endp is needed... no
>> checking directive for setting text section... .text
>> checking directive for exporting symbols... .globl
>> checking for objdump... objdump
>> checking if .note.GNU-stack is needed... no
>> checking suffix for labels... :
>> checking prefix for global symbol labels... none
>> configure: error: Could not determine global symbol label prefix
>>
>> The ./config.log is appended.
>>
>> Can anyone provide some information or suggestions on how to resolve this
>> issue?
>>

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/

http://blogs.scalablelogic.com/



Re: [OMPI users] Can't read more than 2^31 bytes with MPI_File_read, regardless of type?

2012-08-07 Thread Rayson Ho
I originally thought that it was an issue related to 32-bit
executables, but it seems to affect 64-bit as well...

I found references to this problem -- it was reported back in 2007:

http://lists.mcs.anl.gov/pipermail/mpich-discuss/2007-July/002600.html


If you look at the code, you will find that MPI_File_read() calls the
special I/O driver implementation if one's available, but if not then
there's also the generic ad_ufs device (POSIX) implementation.

IIRC, SciNet is using IBM GPFS (BTW, a few years ago when Chris gave
me a tour of the machine room at MP, the cluster he was managing was
using Lustre). Since there is no specific implementation for GPFS,
ROMIO defaults back to ad_ufs and calls
ADIOI_GEN_ReadContig().

In ADIOI_GEN_ReadContig(), we have code:

ADIO_Offset len;

len  = (ADIO_Offset)datatype_size * (ADIO_Offset)count;

And ADIO_Offset is typedef'ed to MPI_Offset, which is 64-bit on 64-bit platforms.
So far so good.


However, the way len is used... hmm, can be an issue:

ADIOI_Assert(len == (unsigned int) len); /* read takes an unsigned
int parm */

...

err = read(fd->fd_sys, buf, (unsigned int)len);


So wait... read takes an unsigned int?? From the manpage:

   ssize_t read(int fd, void *buf, size_t count);

size_t is not unsigned int... it could be on a 32-bit system, but not
when we are LP64.


Other places in ompi/mca/io/romio/romio/mpi-io/read.c also need to be
updated (those are really easy as they are sanity checks). But at
least someone can try to fix the root cause by changing 2 lines of
code mentioned above, or the ROMIO guys can comment on why an unsigned
int should be passed to read(2)... (Internally, the file offset
(fp_sys_posn) is of type ADIO_Offset, so it should be fine.)
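
In the meantime, the limit can be worked around at the application
level by keeping each individual read below 2^31 bytes. Here is a
rough, untested sketch for the simple contiguous case (the helper name
and chunking scheme are made up for illustration; this is not ROMIO or
Open MPI code):

#include <mpi.h>
#include <limits.h>

/* Hypothetical helper: read "total" MPI_LONGs in chunks whose byte
   size stays below 2^31, so the internal (unsigned int) cast in
   ADIOI_GEN_ReadContig() never truncates the length. */
int read_longs_chunked(MPI_File fh, long *buf, MPI_Offset total)
{
    MPI_Offset chunk = INT_MAX / (MPI_Offset)sizeof(long);
    MPI_Offset done = 0;

    while (done < total) {
        MPI_Offset n = total - done;
        if (n > chunk) n = chunk;
        int err = MPI_File_read(fh, buf + done, (int)n, MPI_LONG,
                                MPI_STATUS_IGNORE);
        if (err != MPI_SUCCESS) return err;
        done += n;
    }
    return MPI_SUCCESS;
}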

However, I've only spent less than 2 hours on this, as I found it
interesting -- 12 years ago I was fixing 32-bit file offset issues at
a supercomputer middleware company, and there are still issues with
32-bit vs 64-bit file pointers today! :-O So I guess 30 years from now,
when we run out of 64-bit address space, we will be fixing 32-bit and
64-bit offset issues for 128-bit applications (that's when we have
quantum computers!) :-D Also, take the suggestions above at your own
risk! (And I still need to read the "An Abstract-Device Interface for
Implementing Portable Parallel-I/O Interfaces" paper to understand more
about the internal structures of ROMIO!)

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/


On Tue, Aug 7, 2012 at 6:02 PM, Richard Shaw  wrote:
> On Tuesday, 7 August, 2012 at 12:21 PM, Rob Latham wrote:
>> Hi. Known problem in the ROMIO MPI-IO implementation (which OpenMPI
>> uses). Been on my list of "things to fix" for a while.
>
> Ok, thanks. I'm glad it's not just us.
>
> Is there a timescale for this being fixed? Because if it's a long term thing, 
> I would suggest it might be worth putting a FAQ entry on it or something 
> similar? Especially as it's quite contradictory to most people's 
> interpretation of the specification. Maybe it's already listed as a known 
> problem somewhere, and I just missed it - it took quite a while before I 
> stopped thinking it was an issue with my code.
>
> Is there a better workaround than just splitting the MPI_File_read up into 
> multiple reads of  <2^31 bytes? We're actually trying to read in a 
> distributed array, and the workaround awkwardly requires the creation and 
> reading of multiple darray types, each designed to read in the correct number 
> of blocks less than 2^31 bytes. This seems like it could be a bit fragile.
>
> Thanks again,
> Richard
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

http://blogs.scalablelogic.com/



Re: [OMPI users] issue with column type in language C

2012-08-19 Thread Rayson Ho
Hi Christian,

The code you posted is very similar to another school assignment sent
to this list 2 years ago:

http://www.open-mpi.org/community/lists/users/2010/10/14619.php

At that time, the code was written in Fortran, and now it is written
in C - however, the variable names, logic, etc. are quite similar! :-D

Remember, debugging and bug fixing are part of the (home)work!

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/


On Sun, Aug 19, 2012 at 12:59 AM, Christian Perrier
 wrote:
> Hi,
>
> I have a problem with MPI_Sendrecv communication where I send columns on
> edges between processes.
> For debugging, I show you below a basic example where I initialize a 10x10
> matrix ("x0" array) with x_domain=4
> and y_domain=4. For the test, I simply initialize the 2D array values with
> x0[i][j] = i+j. After, in "updateBound.c", I'm
> using the MPI_Sendrecv functions for the North-South and East-West processes.
>
> Here's the main program "example.c" :
>
> ---
>
> #include <stdio.h>
> #include <stdlib.h>
> #include "mpi.h"
>
> int main(int argc, char *argv[])
> {
>   /* size of the discretization */
>
>   double** x;
>   double** x0;
>
>   int i,j,k,l;
>   int nproc;
>   int ndims;
>   int S=0, E=1, N=2, W=3;
>   int NeighBor[4];
>   int xcell, ycell, size_tot_x, size_tot_y;
>   int *xs,*ys,*xe,*ye;
>   int size_x = 4;
>   int size_y = 4;
>   int me;
>   int x_domains=2;
>   int y_domains=2;
>
>   MPI_Comm comm, comm2d;
>   int dims[2];
>   int periods[2];
>   int reorganisation = 0;
>   int row;
>   MPI_Datatype column_type;
>
>
>
>   size_tot_x=size_x+2*x_domains+2;
>   size_tot_y=size_y+2*y_domains+2;
>
>   xcell=(size_x/x_domains);
>   ycell=(size_y/y_domains);
>
>   MPI_Init(&argc, &argv);
>   comm = MPI_COMM_WORLD;
>   MPI_Comm_size(comm,&nproc);
>   MPI_Comm_rank(comm,&me);
>
>   x = malloc(size_tot_y*sizeof(double*));
>   x0 = malloc(size_tot_y*sizeof(double*));
>
>
>   for(j=0;j<=size_tot_y-1;j++) {
> x[j] = malloc(size_tot_x*sizeof(double));
> x0[j] = malloc(size_tot_x*sizeof(double));
>   }
>
>   xs = malloc(nproc*sizeof(int));
>   xe = malloc(nproc*sizeof(int));
>   ys = malloc(nproc*sizeof(int));
>   ye = malloc(nproc*sizeof(int));
>
>   /* Create 2D cartesian grid */
>   periods[0] = 0;
>   periods[1] = 0;
>
>   ndims = 2;
>   dims[0]=x_domains;
>   dims[1]=y_domains;
>
>   MPI_Cart_create(comm, ndims, dims, periods, reorganisation, &comm2d);
>
>   /* Identify neighbors */
>
>   NeighBor[0] = MPI_PROC_NULL;
>   NeighBor[1] = MPI_PROC_NULL;
>   NeighBor[2] = MPI_PROC_NULL;
>   NeighBor[3] = MPI_PROC_NULL;
>
>   /* Left/West and right/Est neigbors */
>
>   MPI_Cart_shift(comm2d,0,1,&NeighBor[W],&NeighBor[E]);
>
>   /* Bottom/South and Upper/North neigbors */
>
>   MPI_Cart_shift(comm2d,1,1,&NeighBor[S],&NeighBor[N]);
>
>   /* coordinates of current cell with me rank */
>
>   xcell=(size_x/x_domains);
>   ycell=(size_y/y_domains);
>
>   ys[me]=(y_domains-me%(y_domains)-1)*(ycell+2)+2;
>   ye[me]=ys[me]+ycell-1;
>
>   for(i=0;i<=y_domains-1;i++)
>   {xs[i]=2;}
>
>   for(i=0;i<=y_domains-1;i++)
>   {xe[i]=xs[i]+xcell-1;}
>
>   for(i=1;i<=(x_domains-1);i++)
>  { for(j=0;j<=(y_domains-1);j++)
>   {
>xs[i*y_domains+j]=xs[(i-1)*y_domains+j]+xcell+2;
>xe[i*y_domains+j]=xs[i*y_domains+j]+xcell-1;
>   }
>  }
>
>   for(i=0;i<=size_tot_y-1;i++)
>   { for(j=0;j<=size_tot_x-1;j++)
> { x0[i][j]= i+j;
> //  printf("%f\n",x0[i][j]);
> }
>   }
>
>   /*  Create column data type to communicate with South and North
> neighbors */
>
>   MPI_Type_vector( ycell, 1, size_tot_x, MPI_DOUBLE, &column_type);
>   MPI_Type_commit(&column_type);
>
>   updateBound(x0, NeighBor, comm2d, column_type, me, xs, ys, xe, ye,
> xcell);
>
>
>   for(i=0;i<=size_tot_y-1;i++)
>{
> free(x[i]);
> free(x0[i]);
>}
>
> free(x);
> free(x0);
>
> free(xs);
> free(xe);
> free(ys);
> free(ye);
>
> MPI_Finalize();
>
> return 0;
> }
> ---
>
> and the second file "updateBound.c" which sends the columns and rows
>
>
> ---
>
>
> #include "mpi.h"
> #include <stdio.h>
>
> /***/
> /*Update Bounds of subdomain with me process

Re: [OMPI users] 2 GB limitation of MPI_File_write_all

2012-10-20 Thread Rayson Ho
Hi Eric,

Sounds like it's also related to this problem reported by SciNet back in July:

http://www.open-mpi.org/community/lists/users/2012/07/19762.php

I think I found the issue, but I have not followed up with the ROMIO
guys yet. I was also not sure whether SciNet was waiting for the fix -
next time I visit U of Toronto, I will see if I can stop by the SciNet
office and meet with the SciNet guys!

http://www.open-mpi.org/community/lists/users/2012/08/19907.php
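
For reference, the user-defined datatype workaround that Gus describes
below would look roughly like this untested sketch (the helper name,
the block size, and the assumption that the element count is an exact
multiple of the block size are all arbitrary; whether this avoids the
truncation in practice still depends on where ROMIO computes the byte
length internally):

#include <mpi.h>

/* Hypothetical helper: write nelems MPI_LONGs using a contiguous
   block type so the count argument stays far below 2^31. */
int write_longs_blocked(MPI_File fh, long *buf, MPI_Offset nelems)
{
    const int block = 1000000;           /* 1M MPI_LONGs per block */
    MPI_Datatype blk_type;
    int err;

    MPI_Type_contiguous(block, MPI_LONG, &blk_type);
    MPI_Type_commit(&blk_type);

    int nblocks = (int)(nelems / block); /* remainder handling omitted */
    err = MPI_File_write_all(fh, buf, nblocks, blk_type,
                             MPI_STATUS_IGNORE);

    MPI_Type_free(&blk_type);
    return err;
}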

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/


On Fri, Oct 19, 2012 at 4:45 PM, Gus Correa  wrote:
> Hi Eric
>
> Have you tried to create a user-defined MPI type
> (say MPI_Type_Contiguous or MPI_Type_Vector) and pass them
> to the MPI function calls, instead of MPI_LONGs?
> Then you could use the new type and the new number
> (i.e., an integer number smaller than "size", and
> smaller than the maximum integer 2,147,483,647 )
> in the MPI function calls (e.g., MPI_File_write_all).
> Maybe the "invalid argument" error message relates to this.
> If I remember right, the 'number of elements' in MPI calls
> is a positive integer (int, 32 bits).
>
> See these threads about this workaround:
>
> http://www.open-mpi.org/community/lists/users/2009/02/8100.php
> http://www.open-mpi.org/community/lists/users/2010/11/14816.php
>
> Also, not MPI but C.
> I wonder if you need to declare "size" as 'long int',
> or maybe 'long long int', to represent/hold correctly
> the large value that you want
> (360,000,000,000 > 2,147,483,647).
>
> I hope this helps,
> Gus Correa
>
>
> On 10/19/2012 02:31 PM, Eric Chamberland wrote:
>>
>> Hi,
>>
>> I get this error when trying to write 360 000 000 000 MPI_LONG:
>>
>> with Openmpi-1.4.5:
>> ERROR Returned by MPI_File_write_all: 35
>> ERROR_string Returned by MPI_File_write_all: MPI_ERR_IO: input/output
>> error
>>
>> with Openmpi-1.6.2:
>> ERROR Returned by MPI_File_write_all: 13
>> ERROR_string Returned by MPI_File_write_all: MPI_ERR_ARG: invalid
>> argument of some other kind
>>
>> First, the error in 1.6.2 seems to be less useful to understand what
>> happened for the user...
>>
>> Second, am I wrong to try to write that much MPI_LONG? Is this
>> limitation documented or to be fixed?
>>
>> Thanks,
>>
>> Eric
>>
>> =
>> Here is the code:
>>
>> #include <stdio.h>
>> #include "mpi.h"
>>
>> int main (int argc, char *argv[])
>> {
>> MPI_Datatype filetype;
>> MPI_File fh;
>> long *local_array;
>> MPI_Status status;
>>
>> MPI_Init( &argc, &argv );
>>
>> int nb_proc = 0;
>> MPI_Comm_size( MPI_COMM_WORLD, &nb_proc );
>> if (nb_proc != 1) {
>> printf( "Test code for 1 process!\n" );
>> MPI_Abort( MPI_COMM_WORLD, 1 );
>> }
>> int size=9000*4;
>> local_array = new long[size];
>>
>> MPI_File_open(MPI_COMM_WORLD, "2.6Gb",
>> MPI_MODE_CREATE | MPI_MODE_WRONLY,
>> MPI_INFO_NULL, &fh);
>>
>> int ierr = MPI_File_write_all(fh, local_array, size, MPI_LONG, &status);
>> if (ierr != MPI_SUCCESS) {
>> printf("ERROR Returned by MPI_File_write_all: %d\n",ierr);
>> char* lCharPtr = new char[MPI_MAX_ERROR_STRING];
>> int lLongueur = 0;
>> MPI_Error_string(ierr,lCharPtr, &lLongueur);
>> printf("ERROR_string Returned by MPI_File_write_all: %s\n",lCharPtr);
>> MPI_Abort( MPI_COMM_WORLD, 1 );
>> }
>>
>> MPI_File_close(&fh);
>>
>> MPI_Finalize();
>> return 0;
>> }
>>
>> ~
>>
>> ~
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] OpenMPI on Windows when MPI_F77 is used from a C application

2012-10-29 Thread Rayson Ho
Mathieu,

Can you include the small C program you wrote??

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/


On Mon, Oct 29, 2012 at 12:08 PM, Damien  wrote:
> Mathieu,
>
> Where is the crash?  Without that info, I'd suggest turning off all the
> optimisations and just compile it without any flags other than what you need
> to compile it cleanly (so no /O flags) and see if it crashes.
>
> Damien
>
>
> On 26/10/2012 10:27 AM, Mathieu Gontier wrote:
>
> Dear all,
>
> I am willing to use OpenMPI on Windows for a CFD solver instead of MPICH2. My
> solver is developed in Fortran77 and piloted by a C++ interface; both
> levels call MPI functions.
>
> So, I installed OpenMPI-1.6.2-x64 on my system and compiled my code
> successfully. But, at runtime it crashed.
> I reproduced the problem in a small C application calling a Fortran
> function using MPI_Allreduce; when I removed some aggressive optimization
> options from the Fortran build, it worked:
>
> Optimization: Disable (/Od)
>
> Inline Function Expansion: Any Suitable (/Ob2)
>
> Favor Size or Speed: Favor Fast Code (/Ot)
>
>
> So, I removed the same options from the Fortran parts of my solver, but it
> still crashes. I tried some others, but it keeps crashing. Does
> anybody have an idea? Should I (de)activate some compilation options? Are
> there some properties needed to build and link against libmpi_f77.lib?
>
> Thanks for your help.
> Mathieu.
>
> --
> Mathieu Gontier
> - MSN: mathieu.gont...@gmail.com
> - Skype: mathieu_gontier
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] configuration openMPI problem

2012-11-23 Thread Rayson Ho
If you read the log, you will find:

./configure: line 5373: icc: command not found
configure:5382: $? = 127
configure:5371: icc -v >&5

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/


On Fri, Nov 23, 2012 at 1:45 PM, Diego Avesani  wrote:
> dear all,
> I am new in openMPI world and in general in parallelization. I have some
> problem with configuration of openMPI in my laptop.
> I have read your FAQ and I tried to google the problem but I was not able to
> solve it.
> The problem is:
>
> I have downloaded the openmpi-1.6.3, unpacked it
> Then I have installed on my pc intel icc and icpc.
> when I run:
> ./configure CC=icc CXX=icpc F77=ifort FC=ifort
>
> I get:
>
> *** Startup tests
> checking build system type... x86_64-unknown-linux-gnu
> checking host system type... x86_64-unknown-linux-gnu
> checking target system type... x86_64-unknown-linux-gnu
> checking for gcc... icc
> checking whether the C compiler works... no
> configure: error: in `/home/diedro/Downloads/openmpi-1.6.3':
> configure: error: C compiler cannot create executables
> See `config.log' for more details
> diedro@diedro-Latitude-E6420:~/Desktop/Downloads/openmpi-1.6.3$
>
> I do no understand why. I did a simple hello project with icc and it works.
> (in attachment you can fiend the config.log)
>
> Really thanks for any help.
>
>
> Diego
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html


Re: [OMPI users] EXTERNAL: Re: How is hwloc used by OpenMPI

2012-11-23 Thread Rayson Ho
On Thu, Nov 8, 2012 at 11:07 AM, Jeff Squyres  wrote:
> Correct.  PLPA was a first attempt at a generic processor affinity solution.  
> hwloc is a 2nd generation, much Much MUCH better solution than PLPA (we 
> wholly killed PLPA
> after the INRIA guys designed hwloc).

Edwin,

We ported OGS/Grid Engine to hwloc a year and a half ago (the original
core binding code in Grid Engine uses PLPA).

http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html


From an API consumer (both PLPA & hwloc) point of view, some of the
important hwloc advantages are:

1) Grid Engine can now use the same piece of code on different
platforms: Linux, Solaris, AIX, Mac OS X, FreeBSD, Tru64, HP-UX, and
Windows. Before, with PLPA, we only had support for Linux & Solaris.

2) Support for newer CPU architectures & hardware. Since PLPA
development stopped a few years ago, many of the newer architectures
did not get recognized properly. We switched over to hwloc when the
original Grid Engine core binding code stopped working on the AMD
Magny-Cours (Opteron 6100 series).

To be fair to PLPA, had development continued, it would likely have
no issues with those new architectures. But the data structures
of hwloc do seem to handle newer hardware components more
nicely!


We now use information from hwloc to optimize job placement on AMD
Bulldozers (including Piledriver). Currently hwloc just treats each
Bulldozer module as 2 cores, so we still have to code a bit of
logic in Grid Engine to do what we need.

http://blogs.scalablelogic.com/2012/07/optimizing-grid-engine-for-amd.html
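
For illustration, here is a rough, untested hwloc 1.x sketch of the
kind of query-and-bind logic described above (binding to the first
core is arbitrary, and the Bulldozer module-vs-core distinction is not
handled here):

#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    /* count the cores hwloc can see */
    int ncores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
    printf("hwloc sees %d cores\n", ncores);

    if (ncores > 0) {
        /* bind the current process to the first core */
        hwloc_obj_t core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, 0);
        hwloc_cpuset_t set = hwloc_bitmap_dup(core->cpuset);
        hwloc_bitmap_singlify(set);   /* keep a single PU */
        hwloc_set_cpubind(topo, set, HWLOC_CPUBIND_PROCESS);
        hwloc_bitmap_free(set);
    }

    hwloc_topology_destroy(topo);
    return 0;
}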

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/



>
>> Re: layering, I believe you are saying that the relationship to libnuma is 
>> not one where hwloc is adding higher-level functionalities to libnuma, but 
>> rather hwloc is a much improved alternative except for a few system calls it 
>> makes via libnuma out of necessity or convenience.
>
> Correct.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html



Re: [OMPI users] configuration openMPI problem

2012-11-24 Thread Rayson Ho
In your shell, run:

export PATH=$PATH

And then rerun configure with the original parameters - it
should find icc & ifort this time.

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/


On Fri, Nov 23, 2012 at 9:22 PM, Diego Avesani  wrote:
> hi,
> thank for your replay.
>
> I currently use ifort to compile my program. I write also a hello program
> for icc and it works.
> After that I have run
>
> ./configure --prefix=/usr/local
>
> without specified any compiler and it seem to work. Now I have a ompi-1.6
> folder in my opt folder. A question: Can I now compile a program with
> openmpi and intel fortran compiler?
> if yes do you know some good tutorial
>
> again thank for you time
>
>
> Diego
>
>
>
>
>
> On 23 November 2012 20:45, Ralph Castain  wrote:
>>
>> I believe what it is telling you is that icc is not in your PATH. Please
>> check that icc, icpc, and ifort are all in your PATH.
>>
>>
>> On Nov 23, 2012, at 11:35 AM, Diego Avesani 
>> wrote:
>>
>> dear all,
>> thanks for the replay,
>>
>>./configure: line 5373: icc: command not found
>> configure:5382: $? = 127
>> configure:5371: icc -v >&5
>>
>> I am totally new, What can I do? As I told you if I compile a simple hello
>> program with icc
>> it works.
>>
>> Thanks
>>
>> Diego
>>
>>
>>
>>
>> On 23 November 2012 15:45, Diego Avesani  wrote:
>>>
>>> dear all,
>>> I am new in openMPI world and in general in parallelization. I have some
>>> problem with configuration of openMPI in my laptop.
>>> I have read your FAQ and I tried to google the problem but I was not able
>>> to solve it.
>>> The problem is:
>>>
>>> I have downloaded the openmpi-1.6.3, unpacked it
>>> Then I have installed on my pc intel icc and icpc.
>>> when I run:
>>> ./configure CC=icc CXX=icpc F77=ifort FC=ifort
>>>
>>> I get:
>>>
>>> *** Startup tests
>>> checking build system type... x86_64-unknown-linux-gnu
>>> checking host system type... x86_64-unknown-linux-gnu
>>> checking target system type... x86_64-unknown-linux-gnu
>>> checking for gcc... icc
>>> checking whether the C compiler works... no
>>> configure: error: in `/home/diedro/Downloads/openmpi-1.6.3':
>>> configure: error: C compiler cannot create executables
>>> See `config.log' for more details
>>> diedro@diedro-Latitude-E6420:~/Desktop/Downloads/openmpi-1.6.3$
>>>
>>> I do no understand why. I did a simple hello project with icc and it
>>> works.
>>> (in attachment you can fiend the config.log)
>>>
>>> Really thanks for any help.
>>>
>>>
>>> Diego
>>>
>>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] configuration openMPI problem

2012-11-24 Thread Rayson Ho
That's what Google is for! You can very easily find lots of examples
by Google Searching: mpi+fortran+examples

Like this one: 
http://www.dartmouth.edu/~rc/classes/intro_mpi/hello_world_ex.html

Or this one, with C & Fortran examples side by side:
https://computing.llnl.gov/tutorials/mpi/

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/


On Sat, Nov 24, 2012 at 10:00 AM, Diego Avesani  wrote:
> Dear Rayson and all,
>
> I run only with iFort and the compile works, I use only ifort.
> Now I have folder with OPT. If it works now and it is ok use only iFort what
> can I do to learn?
> I mean where can I find a good tutorial or hello project in fortran. I have
> found something for c but nothing about fortran.
>
> Thanks again
>
> Diego
>
>
>
>
>
> On 24 November 2012 03:32, Rayson Ho  wrote:
>>
>> In your shell, run:
>>
>> export PATH=$PATH
>>
>> And then rerun configure with the original parameters - it
>> should find icc & ifort this time.
>>
>> Rayson
>>
>> ==
>> Open Grid Scheduler - The Official Open Source Grid Engine
>> http://gridscheduler.sourceforge.net/
>>
>>
>> On Fri, Nov 23, 2012 at 9:22 PM, Diego Avesani 
>> wrote:
>> > hi,
>> > thank for your replay.
>> >
>> > I currently use ifort to compile my program. I write also a hello
>> > program
>> > for icc and it works.
>> > After that I have run
>> >
>> > ./configure --prefix=/usr/local
>> >
>> > without specified any compiler and it seem to work. Now I have a
>> > ompi-1.6
>> > folder in my opt folder. A question: Can I now compile a program with
>> > openmpi and intel fortran compiler?
>> > if yes do you know some good tutorial
>> >
>> > again thank for you time
>> >
>> >
>> > Diego
>> >
>> >
>> >
>> >
>> >
>> > On 23 November 2012 20:45, Ralph Castain  wrote:
>> >>
>> >> I believe what it is telling you is that icc is not in your PATH.
>> >> Please
>> >> check that icc, icpc, and ifort are all in your PATH.
>> >>
>> >>
>> >> On Nov 23, 2012, at 11:35 AM, Diego Avesani 
>> >> wrote:
>> >>
>> >> dear all,
>> >> thanks for the replay,
>> >>
>> >>./configure: line 5373: icc: command not found
>> >> configure:5382: $? = 127
>> >> configure:5371: icc -v >&5
>> >>
>> >> I am totally new, What can I do? As I told you if I compile a simple
>> >> hello
>> >> program with icc
>> >> it works.
>> >>
>> >> Thanks
>> >>
>> >> Diego
>> >>
>> >>
>> >>
>> >>
>> >> On 23 November 2012 15:45, Diego Avesani 
>> >> wrote:
>> >>>
>> >>> dear all,
>> >>> I am new in openMPI world and in general in parallelization. I have
>> >>> some
>> >>> problem with configuration of openMPI in my laptop.
>> >>> I have read your FAQ and I tried to google the problem but I was not
>> >>> able
>> >>> to solve it.
>> >>> The problem is:
>> >>>
>> >>> I have downloaded the openmpi-1.6.3, unpacked it
>> >>> Then I have installed on my pc intel icc and icpc.
>> >>> when I run:
>> >>> ./configure CC=icc CXX=icpc F77=ifort FC=ifort
>> >>>
>> >>> I get:
>> >>>
>> >>> *** Startup tests
>> >>> checking build system type... x86_64-unknown-linux-gnu
>> >>> checking host system type... x86_64-unknown-linux-gnu
>> >>> checking target system type... x86_64-unknown-linux-gnu
>> >>> checking for gcc... icc
>> >>> checking whether the C compiler works... no
>> >>> configure: error: in `/home/diedro/Downloads/openmpi-1.6.3':
>> >>> configure: error: C compiler cannot create executables
>> >>> See `config.log' for more details
>> >>> diedro@diedro-Latitude-E6420:~/Desktop/Downloads/openmpi-1.6.3$
>> >>>
>> >>> I do no understand why. I did a simple hello project with icc and it
>> >>> works.
>> >>> (in attachment you can fiend the config.log)
>> >>>
>> >>> Really thanks for any help.
>> >>>
>> >>>
>> >>> Diego
>> >>>
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> http://blogs.scalablelogic.com/2012/11/running-1-node-grid-engine-cluster.html


Re: [OMPI users] [OMPI devel] processor affinity -- OpenMPI / batch system integration

2009-10-22 Thread Rayson Ho
The code for the Job to Core Binding (aka. thread binding, or CPU
binding) feature was checked into the Grid Engine project cvs. It uses
OpenMPI's Portable Linux Processor Affinity (PLPA) library, and is
topology and NUMA aware.

The presentation from HPC Software Workshop '09:
http://wikis.sun.com/download/attachments/170755116/job2core.pdf

The design doc:
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213897

Initial support is planned for 6.2 update 5 (current release is update
4, so update 5 is likely to be released in the next 2 or 3 months).
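
For anyone new to the topic, here is a minimal, untested Linux-only
sketch of what binding a process to a core boils down to at the OS
level (shown with the raw sched_setaffinity(2) call rather than PLPA's
portable wrapper; the choice of CPU 0 is arbitrary):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);                 /* pin to logical CPU 0 */

    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pid %d bound to CPU 0\n", (int)getpid());
    return 0;
}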

Rayson



On Tue, Sep 30, 2008 at 2:23 PM, Ralph Castain  wrote:
> Note that we would also have to modify OMPI to:
>
> 1. recognize these environmental variables, and
>
> 2. use them to actually set the binding, instead of using OMPI-internal
> directives
>
> Not a big deal to do, but not something currently in the system. Since we
> launch through our own daemons (something that isn't likely to change in
> your time frame), these changes would be required.
>
> Otherwise, we could come up with some method by which you could provide
> mapper information we use. While I agree with Jeff that having you tell us
> which cores to use for each rank would generally be better, it does raise
> issues when users want specific mapping algorithms that you might not
> support. For example, we are working on mappers that will take input from
> the user regarding comm topology plus system info on network wiring topology
> and generate a near-optimal mapping of ranks. As part of that, users may
> request some number of cores be reserved for that rank for threading or
> other purposes.
>
> So perhaps both  options would be best - give us the list of cores available
> to us so we can map and do affinity, and pass in your own mapping. Maybe
> with some logic so we can decide which to use based on whether OMPI or GE
> did the mapping??
>
> Not sure here - just thinking out loud.
> Ralph
>
> On Sep 30, 2008, at 12:58 PM, Jeff Squyres wrote:
>
>> On Sep 30, 2008, at 2:51 PM, Rayson Ho wrote:
>>
>>> Restarting this discussion. A new update version of Grid Engine 6.2
>>> will come out early next year [1], and I really hope that we can get
>>> at least the interface defined.
>>
>> Great!
>>
>>> At the minimum, is it enough for the batch system to tell OpenMPI via
>>> an env variable which core (or virtual core, in the SMT case) to start
>>> binding the first MPI task?? I guess an added bonus would be
>>> information about the number of processors to skip (the stride)
>>> between the sibling tasks?? Stride of one is usually the case, but
>>> something larger than one would allow the batch system to control the
>>> level of cache and memory bandwidth sharing between the MPI tasks...
>>
>> Wouldn't it be better to give us a specific list of cores to bind to?  As
>> core counts go up in servers, I think we may see a re-emergence of having
>> multiple MPI jobs on a single server.  And as core counts go even *higher*,
>> then fragmentation of available cores over time is possible/likely.
>>
>> Would you be giving us a list of *relative* cores to bind to (i.e., "bind
>> to the Nth online core on the machine" -- which may be different than the
>> OS's ID for that processor) or will you be giving us the actual OS virtual
>> processor ID(s) to bind to?
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>


Re: [OMPI users] [OMPI devel] processor affinity -- OpenMPI / batch system integration

2009-10-22 Thread Rayson Ho
Yes, on page 14 of the presentation: "Support for OpenMPI and OpenMP
Through -binding [pe|env] linear|striding" -- SGE performs no binding,
but instead it outputs the binding decision to OpenMPI.

Support for OpenMPI's binding is part of the "Job to Core Binding" project.

Rayson



On Thu, Oct 22, 2009 at 10:16 AM, Ralph Castain  wrote:
> Hi Rayson
>
> You're probably aware: starting with 1.3.4, OMPI will detect and abide by
> external bindings. So if grid engine sets a binding, we'll follow it.
>
> Ralph
>
> On Oct 22, 2009, at 9:03 AM, Rayson Ho wrote:
>
>> The code for the Job to Core Binding (aka. thread binding, or CPU
>> binding) feature was checked into the Grid Engine project cvs. It uses
>> OpenMPI's Portable Linux Processor Affinity (PLPA) library, and is
>> topology and NUMA aware.
>>
>> The presentation from HPC Software Workshop '09:
>> http://wikis.sun.com/download/attachments/170755116/job2core.pdf
>>
>> The design doc:
>>
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213897
>>
>> Initial support is planned for 6.2 update 5 (current release is update
>> 4, so update 5 is likely to be released in the next 2 or 3 months).
>>
>> Rayson
>>
>>
>>
>> On Tue, Sep 30, 2008 at 2:23 PM, Ralph Castain  wrote:
>>>
>>> Note that we would also have to modify OMPI to:
>>>
>>> 1. recognize these environmental variables, and
>>>
>>> 2. use them to actually set the binding, instead of using OMPI-internal
>>> directives
>>>
>>> Not a big deal to do, but not something currently in the system. Since we
>>> launch through our own daemons (something that isn't likely to change in
>>> your time frame), these changes would be required.
>>>
>>> Otherwise, we could come up with some method by which you could provide
>>> mapper information we use. While I agree with Jeff that having you tell
>>> us
>>> which cores to use for each rank would generally be better, it does raise
>>> issues when users want specific mapping algorithms that you might not
>>> support. For example, we are working on mappers that will take input from
>>> the user regarding comm topology plus system info on network wiring
>>> topology
>>> and generate a near-optimal mapping of ranks. As part of that, users may
>>> request some number of cores be reserved for that rank for threading or
>>> other purposes.
>>>
>>> So perhaps both  options would be best - give us the list of cores
>>> available
>>> to us so we can map and do affinity, and pass in your own mapping. Maybe
>>> with some logic so we can decide which to use based on whether OMPI or GE
>>> did the mapping??
>>>
>>> Not sure here - just thinking out loud.
>>> Ralph
>>>
>>> On Sep 30, 2008, at 12:58 PM, Jeff Squyres wrote:
>>>
>>>> On Sep 30, 2008, at 2:51 PM, Rayson Ho wrote:
>>>>
>>>>> Restarting this discussion. A new update version of Grid Engine 6.2
>>>>> will come out early next year [1], and I really hope that we can get
>>>>> at least the interface defined.
>>>>
>>>> Great!
>>>>
>>>>> At the minimum, is it enough for the batch system to tell OpenMPI via
>>>>> an env variable which core (or virtual core, in the SMT case) to start
>>>>> binding the first MPI task?? I guess an added bonus would be
>>>>> information about the number of processors to skip (the stride)
>>>>> between the sibling tasks?? Stride of one is usually the case, but
>>>>> something larger than one would allow the batch system to control the
>>>>> level of cache and memory bandwidth sharing between the MPI tasks...
>>>>
>>>> Wouldn't it be better to give us a specific list of cores to bind to?
>>>>  As
>>>> core counts go up in servers, I think we may see a re-emergence of
>>>> having
>>>> multiple MPI jobs on a single server.  And as core counts go even
>>>> *higher*,
>>>> then fragmentation of available cores over time is possible/likely.
>>>>
>>>> Would you be giving us a list of *relative* cores to bind to (i.e.,
>>>> "bind
>>>> to the Nth online core on the machine" -- which may be different than
>>>> the
>>>> OS's ID for that processor) or will you be giving us the actual OS
>>>> virtual
>>>> processor ID(s) to bind to?
>>>>
>>>> --
>>>> Jeff Squyres
>>>> Cisco Systems
>>>>
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] Poor performance on Amazon EC2 with TCP

2016-03-08 Thread Rayson Ho
If you are using instance types that support SR-IOV (aka. "enhanced
networking" in AWS), then turn it on. We saw huge differences when SR-IOV
was enabled:

http://blogs.scalablelogic.com/2013/12/enhanced-networking-in-aws-cloud.html
http://blogs.scalablelogic.com/2014/01/enhanced-networking-in-aws-cloud-part-2.html

Make sure you start your instances with a placement group -- otherwise, the
instances can be data centers apart!

And check that jumbo frames are enabled properly:

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html

But still, it is interesting that Intel MPI is getting a 2X speedup with
the same setup! Can you post the raw numbers so that we can take a deeper
look??
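
If it helps, here is a bare-bones, untested MPI ping-pong sketch that
can be used to collect a quick small-message latency number to compare
against the NetPIPE runs (run it with 2 ranks; the iteration count is
arbitrary):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    const int iters = 1000;
    char byte = 0;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("one-way latency: %g us\n",
               (t1 - t0) / (2.0 * iters) * 1e6);

    MPI_Finalize();
    return 0;
}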

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html




On Tue, Mar 8, 2016 at 9:08 AM, Jackson, Gary L. 
wrote:

>
> I've built OpenMPI 1.10.1 on Amazon EC2. Using NetPIPE, I'm seeing about
> half the performance for MPI over TCP as I do with raw TCP. Before I start
> digging in to this more deeply, does anyone know what might cause that?
>
> For what it's worth, I see the same issues with MPICH, but I do not see it
> with Intel MPI.
>
> --
> Gary Jackson
>
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/03/28659.php
>


Re: [OMPI users] Why do I need a C++ linker while linking in MPI C code with CUDA?

2016-03-21 Thread Rayson Ho
On Sun, Mar 20, 2016 at 10:37 PM, dpchoudh .  wrote:

> I'd tend to agree with Gilles. I have written CUDA programs in pure C
> (i.e. neither involving MPI nor C++) and a pure C based tool chain builds
> the code successfully. So I don't see why CUDA should be intrinsically C++.
>

nvcc calls the C++ compiler for the non-CUDA compilation steps:

http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/

So even if you don't have any C++ code, the host code is still compiled by
g++, which is *usually* free to insert calls to the C++ runtime.

* A few years ago I worked on a C++ project where we were not allowed to link
against the C++ library... And there is actually a way to tell g++ not to
generate calls to the C++ runtime. However, it is not as easy as
flipping a switch, so you will just have to link against the C++ standard
library (libstdc++). :-)

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html





>
> From the Makefile (that I had attached in my previous mail) the only CUDA
> library being linked against is this:
>
> /usr/local/cuda/lib64/libcudart.so
> and ldd on that shows this:
>
> [durga@smallMPI lib64]$ ldd libcudart.so
> linux-vdso.so.1 =>  (0x7ffe1e7f1000)
> libc.so.6 => /lib64/libc.so.6 (0x7ff7e4493000)
> libdl.so.2 => /lib64/libdl.so.2 (0x7ff7e428f000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x7ff7e4072000)
> librt.so.1 => /lib64/librt.so.1 (0x7ff7e3e6a000)
> /lib64/ld-linux-x86-64.so.2 (0x7ff7e4af3000)
>
> I don't see any C++ dependency here either.
>
> And finally, I don't think there is any version issue. This is a clean
> CUDA 7.5 install directly from NVIDIA CUDA repo (for Redhat) and all
> provided examples run fine with this installation.
>
> I believe there are NVIDIA employees in this list; hopefully one of them
> will clarify.
>
> Thanks
> Durga
>
> Life is complex. It has real and imaginary parts.
>
> On Sun, Mar 20, 2016 at 10:23 PM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
>> I am a bit puzzled...
>>
>> if only cuda uses the c++ std libraries, then it should depend on them
>> (ldd libcudaxyz.so can be used to confirm that)
>> and then linking with cuda lib should pull the c++ libs
>>
>> could there be a version issue ?
>> e.g. the missing symbol is not provided by the version of the c++
>> lib that is pulled.
>> that might occur if you are using cuda built for distro X on distro Y
>>
>> could you please double check this ?
>> if everything should work, then i recommend you report this to nvidia
>>
>> Cheers,
>>
>> Gilles
>>
>> On Monday, March 21, 2016, Damien Hocking  wrote:
>>
>>> Durga,
>>>
>>> The Cuda libraries use the C++ std libraries.  That's the std::ios_base
>>> errors.. You need the C++ linker to bring those in.
>>>
>>> Damien
>>>
>>> On March 20, 2016 9:15:47 AM "dpchoudh ."  wrote:
>>>
 Hello all

 I downloaded some code samples from here:

 https://github.com/parallel-forall/code-samples/

 and tried to build the subdirectory

 posts/cuda-aware-mpi-example/src

 in my CentOS 7 machine.

 I had to make several changes to the Makefile before it would build.
 The modified Makefile is attached (the make targets I am talking about are
 the 3rd and 4th from the bottom). Most of the modifications can be
 explained as possible platform specific variations (such as path
 differences between Ubuntu and CentOS), except the following:

 I had to use a C++ linker (mpic++) to link in the object files that
 were produced with C host compiler (mpicc) and CUDA compiler (nvcc). If I
 did not do this, (i.e. I stuck to mpicc for linking), I got the following
 link error:

 mpicc -L/usr/local/cuda/lib64 -lcudart -lm -o
 ../bin/jacobi_cuda_normal_mpi jacobi.o input.o host.o device.o
 cuda_normal_mpi.o
 device.o: In function `__static_initialization_and_destruction_0(int,
 int)':
 tmpxft_4651_-4_Device.cudafe1.cpp:(.text+0xd1e): undefined
 reference to `std::ios_base::Init::Init()'
 tmpxft_4651_-4_Device.cudafe1.cpp:(.text+0xd2d): undefined
 reference to `std::ios_base::Init::~Init()'
 collect2: error: ld returned 1 exit status

 Can someone please explain why I would need a C++ linker for object
 files that were generated using a C compiler? Note that if I use mpic++ both
 for compiling and linking, there are no errors either.

 Thanks in advance
 Durga

 Life is complex. It has real and imaginary parts.
 ___
 users mailing list
 us...@open-mpi.org
 Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
 Link to this post:
 http://www.open-mpi.org/community/lists/users/2016/03/28760.php