Srinivas,
There's also Kernel-Level Checkpointing vs. User-Level Checkpointing -
if you can checkpoint an MPI task and restart it on a new node, then
this is also "process migration".
Of course, doing a checkpoint & restart can be slower than pure
in-kernel process migration, but the advantage is
> Thanks and regards
> Durga
>
>
> On Thu, Aug 25, 2011 at 11:08 AM, Rayson Ho wrote:
>> Srinivas,
>>
>> There's also Kernel-Level Checkpointing vs. User-Level Checkpointing -
>> if you can checkpoint an MPI task and restart it on a new node, then
>> th
On Sat, Aug 27, 2011 at 9:12 AM, Ralph Castain wrote:
> OMPI has no way of knowing that you will turn the node on at some future
> point. All it can do is try to launch the job on the provided node, which
> fails because the node doesn't respond.
> You'll have to come up with some scheme for telli
Hi Xin,
Since it is not Open MPI specific, you might want to try to work with
the SciNet guys first. The "SciNet Research Computing Consulting
Clinic" is specifically formed to help U of T students & researchers
develop and design compute-intensive programs.
http://www.scinet.utoronto.ca/
http://
Did you notice the error message:
/usr/bin/install: cannot remove
`/opt/openmpi/share/openmpi/amca-param-sets/example.conf': Permission
denied
I would check the permission settings of the file first if I encountered
something like this...
Rayson
=
Grid Engine / O
You can use a debugger (just gdb will do, no TotalView needed) to find
out which MPI send & receive calls are hanging the code on the
distributed cluster, and see whether the hanging send & receive pair is
caused by a problem described at:
Deadlock avoidance in your MPI programs:
http://www.cs.ucsb.edu/~hnielsen/
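For reference, here is a minimal sketch (not code from this thread) of the
classic pattern behind many of these hangs: both ranks call a blocking
MPI_Send before posting the matching MPI_Recv. For messages too large for the
eager protocol, both ranks block inside MPI_Send, and attaching gdb to the
stuck processes shows both stacks sitting in the send call.

/* Sketch only: two ranks exchanging a large buffer. The commented-out
 * Send-before-Recv ordering deadlocks; MPI_Sendrecv is one safe fix. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, peer;
    const int N = 1 << 20;            /* large message: rendezvous, not eager */
    double *sendbuf, *recvbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    peer = 1 - rank;

    sendbuf = calloc(N, sizeof(double));
    recvbuf = calloc(N, sizeof(double));

    /* Deadlock-prone ordering (don't do this):
     *   MPI_Send(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
     *   MPI_Recv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
     * Both ranks wait in MPI_Send for the other side to post a receive. */

    /* Safe alternative: let the library progress the send and receive together. */
    MPI_Sendrecv(sendbuf, N, MPI_DOUBLE, peer, 0,
                 recvbuf, N, MPI_DOUBLE, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    if (rank == 0) printf("exchange completed without deadlock\n");
    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}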
We are using hwloc-1.2.2 for topology binding in Open Grid
Scheduler/Grid Engine 2011.11, and a user is encountering similar
issues:
http://gridengine.org/pipermail/users/2011-December/002126.html
In Open MPI, there is the configure switch "--without-libnuma" to turn
libnuma off. But since Open M
On Sat, Dec 10, 2011 at 3:21 PM, amjad ali wrote:
> (2) The latest MPI implementations are intelligent enough that they use some
> efficient mechanism while executing MPI based codes on shared memory
> (multicore) machines. (please tell me any reference to quote this fact).
Not an academic paper
> the compute nodes have to be
> explicitly DMA'd in? Is there a middleware layer that makes it
> transparent to the upper layer software?
>
> Best regards
> Durga
>
> On Mon, Dec 12, 2011 at 11:00 AM, Rayson Ho wrote:
>> On Sat, Dec 10, 2011 at 3:21 PM, amjad ali wrote:
On Tue, Jan 10, 2012 at 10:02 AM, Roberto Rey wrote:
> I'm running some tests on EC2 cluster instances with 10 Gigabit Ethernet
> hardware and I'm getting strange latency results with Netpipe and OpenMPI.
- There are 3 types of instances that can use 10 GbE. Are you using
"cc1.4xlarge", "cc2.8xla
On Mon, Jan 30, 2012 at 11:33 PM, Tom Bryan wrote:
> For our use, yes, spawn_multiple makes sense. We won't be spawning lots and
> lots of jobs in quick succession. We're using MPI as a robust way to get
> IPC as we spawn multiple child processes while using SGE to help us with
> load balancing
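For reference, a minimal sketch of what a spawn_multiple call looks like -
this is not Tom's code, and "worker_a" / "worker_b" are hypothetical
executable names:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm children;
    char *cmds[2]     = { "worker_a", "worker_b" };   /* hypothetical binaries */
    int maxprocs[2]   = { 2, 2 };                     /* two copies of each */
    MPI_Info infos[2] = { MPI_INFO_NULL, MPI_INFO_NULL };
    int errcodes[4];                                  /* sum of maxprocs */

    MPI_Init(&argc, &argv);

    /* One call launches both sets of children; parent and children end up
     * connected through the "children" intercommunicator. */
    MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, maxprocs, infos,
                            0, MPI_COMM_WORLD, &children, errcodes);

    /* ... exchange messages with the children over "children" here ... */

    MPI_Comm_disconnect(&children);
    MPI_Finalize();
    return 0;
}

The children see the parent through MPI_Comm_get_parent(), so the
intercommunicator gives you the IPC channel without managing sockets
yourself.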
Brock,
I listened to the podcast on Saturday, and I just downloaded it again
10 mins ago.
Did the interview really end at 26:34?? And if I recall correctly, you
& Jeff did not get a chance to ask them the "which source control
system do you guys use" question :-D
Rayson
> iTunes updates.
>
>
>
> On Feb 20, 2012, at 3:25 PM, Rayson Ho wrote:
>
>> Brock,
>>
>> I listened to the podcast on Saturday, and I just downloaded it again
>> 10 mins ago.
>>
>> Did the interview really end at 26:34?? And if I recall corr
>> it's both longer than 33 mins (i.e., it keeps playing after the timer
>> reaches 0:00), and then it cuts off in the middle of one of Rajeev's
>> answers. Doh. :-(
>>
>> Brock is checking into it…
>>
>>
>> On Feb 20, 2012, at 4:37 PM, Rayson Ho wrote:
>>
>>> H
>> it's both longer than 33 mins (i.e., it keeps playing after the timer
>> reaches 0:00), and then it cuts off in the middle of one of Rajeev's
>> answers. Doh. :-(
>>
>> Brock is checking into it…
>>
>>
>> On Feb 20, 2012, at 4:37 PM, Rayson Ho w
On Mon, Feb 20, 2012 at 6:02 PM, Jeffrey Squyres wrote:
>> (But what's happened to the "what source control system do you guys
>> use" question usually asked by Jeff? :-D )
>
>
> I need to get back to asking that one. :-)
Skynet needs to send Jeff (and Arnold) back in time!
> It's just a perso
On Tue, Feb 21, 2012 at 12:06 PM, Rob Latham wrote:
> ROMIO's testing and performance regression framework is honestly a
> shambles. Part of that is a challenge with the MPI-IO interface
> itself. For MPI messaging you exercise the API and you have pretty
> much covered everything. MPI-IO, thou
Hi Joshua,
I don't think the new built-in rsh in later versions of Grid Engine is
going to make any difference - the orted is the real starter of the
MPI tasks and should have a greater influence on the task environment.
However, it would help if you can record the nice values and resource
limits
On Sun, Apr 1, 2012 at 11:27 PM, Rohan Deshpande wrote:
> error while loading shared libraries: libmpi.so.0: cannot open shared
> object file no such object file: No such file or directory.
Were you trying to run the MPI program on a remote machine?? If you
are, then make sure that each machine
On Tue, Apr 17, 2012 at 2:26 AM, jody wrote:
> As to OpenMP: i already make use of OpenMP in some places (for
> instance for the creation of the large data block),
> but unfortunately my main application is not well suited for OpenMP
> parallelization..
If MPI does not support this kind of progra
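If it helps, here is a minimal hybrid MPI + OpenMP sketch (not jody's
application, just the usual split): MPI ranks across nodes, OpenMP threads
inside each rank, requested via MPI_Init_thread with MPI_THREAD_FUNNELED so
that only the main thread makes MPI calls. Compile with something like
"mpicc -fopenmp".

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;
    double local_sum = 0.0, global_sum = 0.0;

    /* Ask for FUNNELED support: OpenMP threads exist, but only the thread
     * that called MPI_Init_thread touches MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (provided < MPI_THREAD_FUNNELED && rank == 0)
        fprintf(stderr, "warning: MPI library lacks MPI_THREAD_FUNNELED\n");

    /* OpenMP covers the shared-memory part, e.g. building/reducing a large block. */
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < 1000000; i++)
        local_sum += (double)i;

    /* MPI covers the distributed-memory part, called outside the parallel region. */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f, threads per rank = %d\n",
               global_sum, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}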
Is StarCluster too complex for your use case?
http://web.mit.edu/star/cluster/
Rayson
=
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/
Scalable Grid Engine Support Program
http://www.scalablelogic.com/
On Mon, Apr 23, 2012 at 6:20 PM,
Seems like there's a bug in the application. Did you or someone else
write it, or did you get it from an ISV??
You can log onto one of the nodes, attach a debugger, and see if the
MPI task is waiting for a message (looping in one of the MPI receive
functions)...
Rayson
==
And before you try to understand the OMPI code, read some of the
papers & presentations first:
http://www.open-mpi.org/papers/
Rayson
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/
Scalable Grid Engine Support Program
http://www.scalable
We posted an MPI quiz but so far no one on the Grid Engine list has
the answer that Jeff was expecting:
http://blogs.scalablelogic.com/
Others have offered interesting points, and I just want to see if
people on the Open MPI list have the *exact* answer and the first one
gets a full Cisco Live C
Hi Bill,
If you *really* have time, then you can go deep into the log, and find
out why configure failed. It looks like configure failed when it tried
to compile this code:
.text
# .gsym_test_func
.globl .gsym_test_func
.gsym_test_func:
# .gsym_test_func
configure:26752: result: none
conf
I originally thought that it was an issue related to 32-bit
executables, but it seems to affect 64-bit as well...
I found references to this problem -- it was reported back in 2007:
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2007-July/002600.html
If you look at the code, you will find tha
Hi Christian,
The code you posted is very similar to another school assignment sent
to this list 2 years ago:
http://www.open-mpi.org/community/lists/users/2010/10/14619.php
At that time, the code was written in Fortran, and now it is written
in C - however, the variable names, logic, etc. are qu
Hi Eric,
Sounds like it's also related to this problem reported by Scinet back in July:
http://www.open-mpi.org/community/lists/users/2012/07/19762.php
And I think I found the issue, but I still have not followed up with
the ROMIO guys yet. And I was not sure if Scinet was waiting for the
fix or
Mathieu,
Can you include the small C program you wrote??
Rayson
==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
On Mon, Oct 29, 2012 at 12:08 PM, Damien wrote:
> Mathieu,
>
> Where is the crash
If you read the log, you will find:
./configure: line 5373: icc: command not found
configure:5382: $? = 127
configure:5371: icc -v >&5
Rayson
==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge
On Thu, Nov 8, 2012 at 11:07 AM, Jeff Squyres wrote:
> Correct. PLPA was a first attempt at a generic processor affinity solution.
> hwloc is a 2nd generation, much Much MUCH better solution than PLPA (we
> wholly killed PLPA
> after the INRIA guys designed hwloc).
Edwin,
We ported OGS/Grid
In your shell, run:
export PATH=$PATH
And then rerun configure with the original parameters - it
should find icc & ifort this time.
Rayson
==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.n
, I use only ifort.
> Now I have a folder with OPT. If it works now and it is OK to use only ifort, what
> can I do to learn?
> I mean, where can I find a good tutorial or "hello world" project in Fortran? I have
> found something for C but nothing for Fortran.
>
> Thanks again
>
> Diego
>
- give us the list of cores available
> to us so we can map and do affinity, and pass in your own mapping. Maybe
> with some logic so we can decide which to use based on whether OMPI or GE
> did the mapping??
>
> Not sure here - just thinking out loud.
> Ralph
>
> On Sep 30
Rayson
On Thu, Oct 22, 2009 at 10:16 AM, Ralph Castain wrote:
> Hi Rayson
>
> You're probably aware: starting with 1.3.4, OMPI will detect and abide by
> external bindings. So if grid engine sets a binding, we'll follow it.
>
> Ralph
>
> On Oct 22, 2009, at
If you are using instance types that support SR-IOV (a.k.a. "enhanced
networking" in AWS), then turn it on. We saw huge differences with SR-IOV
enabled:
http://blogs.scalablelogic.com/2013/12/enhanced-networking-in-aws-cloud.html
http://blogs.scalablelogic.com/2014/01/enhanced-networking-in-aws-cl
On Sun, Mar 20, 2016 at 10:37 PM, dpchoudh . wrote:
> I'd tend to agree with Gilles. I have written CUDA programs in pure C
> (i.e. neither involving MPI nor C++) and a pure C based tool chain builds
> the code successfully. So I don't see why CUDA should be intrinsically C++.
>
nvcc calls the C