On Tue, Mar 23, 2010 at 1:25 PM, fengguang tian wrote:
> Now I have set $HOME as the shared directory, but when running ompi-checkpoint, it
> shows the following (nimbus1 is the remote machine in
> my cluster):
>
> [nimbus1:12630] opal_os_dirpath_create: Error: Unable to create the
> sub-directory (/home/mpiu/ompi_global_
I ran into the same problem described at this link:
http://www.open-mpi.org/community/lists/users/2009/12/11374.php
The solution given there is to use Open MPI v1.4 instead of v1.3,
but I am using Open MPI v1.7a1r22794 and still hit the same problem.
Here is what I have done:
my cluster composed o
Nicolas Niclausse wrote:
Fernando Lemos wrote on 23/03/2010 16:28:
I'm trying to run openmpi (1.4.1) on two clusters; on each cluster, several
interfaces are private;
on cluster1, nodes have 3 interfaces, and only 192.168.159.0/24 is visible
from cluster2.
chicon-3
eth0 inet addr:192
On Tue, Mar 23, 2010 at 12:55 PM, fengguang tian wrote:
>
> I use mpirun -np 50 -am ft-enable-cr --mca snapc_base_global_snapshot_dir
> --hostfile .mpihostfile
> to store the global checkpoint snapshot in the shared
> directory /mirror, but the problems are still there:
> when ompi-checkpoin
Fernando Lemos wrote on 23/03/2010 16:28:
>> I'm trying to run openmpi (1.4.1) on two clusters; on each cluster, several
>> interfaces are private;
>>
>> on cluster1, nodes have 3 interfaces, and only 192.168.159.0/24 is visible
>> from cluster2.
>>
>> chicon-3
>> eth0 inet addr:192.168.160.
Now I have set $HOME as the shared directory, but when running ompi-checkpoint, it
shows the following (nimbus1 is the remote machine in
my cluster):
[nimbus1:12630] opal_os_dirpath_create: Error: Unable to create the
sub-directory (/home/mpiu/ompi_global_snapshot_1662.ckpt/0) of
(/home/mpiu/ompi_global_snapshot_1662.ckpt
Hello,
I am still using LAM/MPI on an old cluster and wonder if I can get
some help from this mailing list. Here is the problem: I am using an
18-node cluster; each node has 2 CPUs and each CPU supports up to 2
threads, so I assume I can use 18*4 processors. When running
the following code, an e
I use mpirun -np 50 -am ft-enable-cr --mca snapc_base_global_snapshot_dir
--hostfile .mpihostfile
to store the global checkpoint snapshot in the shared
directory /mirror, but the problems are still there:
when running ompi-checkpoint, mpirun is still not killed; it just hangs
there. When doing ompi
OK, thank you. I will try to move the checkpoint file into the shared
directory.
Regards
fengguang
On Tue, Mar 23, 2010 at 10:34 AM, Fernando Lemos wrote:
> On Tue, Mar 23, 2010 at 12:27 PM, fengguang tian
> wrote:
> > I have created the shared file system, but I created /mirror in the root
> > dir
On Tue, Mar 23, 2010 at 12:24 PM, fengguang tian wrote:
> Hi
>
> I am using Open MPI and BLCR in a cluster of 3 machines. Checkpoint
> and restart work fine on a single machine, but when checkpointing in
> the cluster environment, ompi-checkpoint hangs
Besides what has been said in anoth
On Tue, Mar 23, 2010 at 12:27 PM, fengguang tian wrote:
> I have created the shared file system, but I created /mirror in the root
> directory, not in the $HOME directory. Is that the
> problem? Thank you.
Others might be able to give you a more accurate explanation. The way
I understood it, in OpenMP
On Tue, Mar 23, 2010 at 10:25 AM, Nicolas Niclausse
wrote:
> Hello,
>
>
> I'm trying to run openmpi (1.4.1) on two clusters; on each cluster, several
> interfaces are private;
>
> on cluster1, nodes have 3 interfaces, and only 192.168.159.0/24 is visible
> from cluster2.
>
> chicon-3
> eth0 in
I have created the shared file system, but I created /mirror in the root
directory, not in the $HOME directory. Is that the
problem? Thank you.
cheers
fengguang
On Tue, Mar 23, 2010 at 10:23 AM, Fernando Lemos wrote:
> On Mon, Mar 22, 2010 at 8:20 PM, fengguang tian
> wrote:
> > I set up a cluster o
Hi
I am using Open MPI and BLCR in a cluster of 3 machines. Checkpoint
and restart work fine on a single machine, but when checkpointing in
the cluster environment, ompi-checkpoint hangs.
For example:
My cluster is composed of 3 machines and, using NFS, has a shared directory.
in master node
On Mon, Mar 22, 2010 at 8:20 PM, fengguang tian wrote:
> I set up a cluster of 18 nodes using Open MPI and the BLCR library, and the MPI
> program runs well on the cluster,
> but how do I checkpoint the MPI program on this cluster?
> for example:
> here is what I do for a test:
> mpiu@nimbus: /mirror$
Hello,
I'm trying to run openmpi (1.4.1) on two clusters; on each cluster, several
interfaces are private;
on cluster1, nodes have 3 interfaces, and only 192.168.159.0/24 is visible
from cluster2.
chicon-3
eth0 inet addr:192.168.160.76 Bcast:192.168.160.255 Mask:255.255.255.0
eth1 ine
Hi Gilles,
It has been fixed. Could you update your source and do a clean build?
Actually, mpiexec is the same as mpirun, and mpic++ is the wrapper for
C++ applications. You can find more details in the OMPI documentation
here: http://www.open-mpi.org/doc/v1.4/
Regards,
Shiqing
Bloom
Hi All,
I am writing to you on behalf of Packt Publishing, the publishers of computer-related
books.
We are planning to extend our catalogue of books based on Scientific Computing
Tools and are currently inviting authors interested in writing for Packt. This
doesn't need any previous writing experien