I cannot interpret the raw core files, since they are specific to your
system and setup. Can you run the core through gdb and get a backtrace?
Run "gdb hello core.1234", then use the 'bt' command from inside gdb.
That will help me start to focus in on the problem.
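A minimal sketch of the same step done non-interactively, assuming the
binary is named hello and the core files sit in the current directory
under the core.<pid> pattern (the actual names depend on how your system
is set up). The helper name backtrace_cores is hypothetical; -batch and
-ex are standard gdb options.

```shell
# Hypothetical helper: print a backtrace for every core file in the
# current directory. Assumes core files are named core.<pid>.
backtrace_cores() {
    binary="$1"
    for core in core.*; do
        # Skip the unexpanded pattern when no core file exists.
        [ -e "$core" ] || continue
        echo "== $core =="
        # -batch exits after running the commands; -ex runs one gdb command.
        gdb -batch -ex "bt" "$binary" "$core"
    done
}
```

Something like "backtrace_cores ./hello > backtrace.txt" would then
collect the output into a file that can be pasted into a reply.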
Cheers,
Josh
On Oct 8, 2008, at 10:22 PM, arun dhakne wrote:
I have configured with the additional flags (--enable-ft-thread
--enable-mpi-threads), but there is no change in behaviour; it still
gives a seg fault.
open mpi version:
Open MPI: 1.3a1r19685
blcr version:
version 0.7.3
The core file is attached.
hello.c, the sample MPI program whose core was dumped, is also attached.
~]$ ompi-restart ompi_global_snapshot_11219.ckpt
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 11288 on node
acl-cadi-pentd-1.cse.buffalo.edu exited on signal 11 (Segmentation
fault).
--------------------------------------------------------------------------
2 total processes killed (some possibly by mpirun during cleanup)
Best,
On Mon, Oct 6, 2008 at 6:44 PM, Josh Hursey <jjhur...@open-mpi.org> wrote:
The installation looks OK, though I'm not sure what is causing the
segfault of the restarted process. Two things to try. First, can you
send me a backtrace from the core file that is generated by the
segmentation fault? That will provide insight into what is causing it.
Second, you may try to enable the C/R thread, which allows a checkpoint
to progress when an application is in a computation loop instead of
only when it is in the MPI library. To do so, configure with these
additional flags:
--enable-ft-thread --enable-mpi-threads
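Combined with the configure line from the original report, the rebuild
might look like the sketch below. The prefix and BLCR path are the ones
quoted in this thread and will differ on other systems; the wrapper
function rebuild_openmpi is hypothetical and only groups the steps so
they can be re-run from the top of the source tree.

```shell
# Hypothetical wrapper: reconfigure and reinstall Open MPI with the C/R
# thread enabled. Run from the top of the Open MPI source tree.
rebuild_openmpi() {
    ./configure --prefix=/home/csgrad/audhakne/.openmpi --with-ft=cr \
        --with-blcr=/usr/lib64 --enable-debug \
        --enable-ft-thread --enable-mpi-threads &&
    make &&
    make install
}
```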
What version of Open MPI are you using? What version of BLCR?
Best,
Josh
On Oct 6, 2008, at 3:55 PM, arun dhakne wrote:
Hi all,
This is the procedure I have followed to install Open MPI. Is there
some installation or environment setting problem in here?
An Open MPI program with 4 processes is run across 2 dual-core Intel
machines, with 2 processes running on each machine.
ompi-checkpoint is successful, but ompi-restart fails with the
following error:
$:> ompi-restart ompi_global_snapshot_6045.ckpt
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 6372 on node
acl-cadi-pentd-1.cse.buffalo.edu exited on signal 11 (Segmentation
fault).
--------------------------------------------------------------------------
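For reference, the sequence that leads to this error can be sketched as
below. Several details are assumptions not shown in this thread: the
hostfile name "hosts", the binary name ./hello, the "-am ft-enable-cr"
launch option, the "--term" option to ompi-checkpoint, and the idea that
the number in the snapshot name is the PID of mpirun. The function names
are hypothetical.

```shell
# Hypothetical helper: name of the global snapshot for a given mpirun PID
# (assumed naming, matching the snapshot names quoted in this thread).
snapshot_name() {
    printf 'ompi_global_snapshot_%s.ckpt\n' "$1"
}

# Hypothetical driver for one checkpoint/restart cycle.
# Assumes a hostfile named "hosts" that lists both machines.
checkpoint_restart_cycle() {
    # Launch with checkpoint/restart support enabled, in the background.
    mpirun -np 4 --hostfile hosts -am ft-enable-cr ./hello &
    mpirun_pid=$!
    sleep 10    # give the job time to start
    # Checkpoint the running job and terminate it afterwards.
    ompi-checkpoint --term "$mpirun_pid"
    wait "$mpirun_pid"
    # Restart from the global snapshot; this is the step that segfaults.
    ompi-restart "$(snapshot_name "$mpirun_pid")"
}
```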
Open MPI installation steps:
./configure --prefix=/home/csgrad/audhakne/.openmpi --with-ft=cr \
    --with-blcr=/usr/lib64 --enable-debug
make
make install
export LD_LIBRARY_PATH=$HOME/.openmpi/lib/:$HOME/.openmpi/lib/openmpi:/usr/lib64
export PATH=$HOME/.openmpi/bin:$PATH
NOTE: blcr is installed as a module
$:> lsmod | grep blcr
blcr 117892 0
blcr_vmadump 58264 1 blcr
blcr_imports 46080 2 blcr,blcr_vmadump
Please let me know if there is a problem with the above procedure.
Thanks a lot for your time.
Best.
---------- Forwarded message ----------
From: arun dhakne <arundha...@gmail.com>
Date: Tue, Sep 30, 2008 at 12:52 AM
Subject: ompi-restart issue : ompi-restart doesn't work across nodes
To: Open MPI Users <us...@open-mpi.org>
Hi all,
I have gone through some previous ompi-restart issues, but I couldn't
find anything similar to this problem.
I have installed BLCR and configured Open MPI 'openmpi-1.3a1r19645'.
i) If the sample MPI program (say, np 4 on a single machine, that is,
without any hostfile) is run and I try to checkpoint it, the checkpoint
succeeds and even ompi-restart works in this case.
ii) If the sample MPI program is run across, say, 2 different nodes,
the checkpoint succeeds BUT ompi-restart throws the following
error:
$ ompi-restart ompi_global_snapshot_7604.ckpt
--------------------------------------------------------------------------
mpirun noticed that process rank 3 with PID 9590 on node
acl-cadi-pentd-1.cse.buffalo.edu exited on signal 11 (Segmentation
fault).
--------------------------------------------------------------------------
Please let me know if more information is needed.
--
Thanks and Regards,
Arun U. Dhakne
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Thanks and Regards,
Arun U. Dhakne
Graduate Student
Computer Science and Engineering Dept.
State University of New York at Buffalo
<core.tar.gz><hello.c>