Josh,
At the moment I'm working on the uncoordinated checkpoint, and
I'll probably have some tools to collect data from the process and
environment, and probably from the application.
For the application, I'm considering the possibility of doing something
like this (MPI_Checkpoint??).
Leonardo
mca_base_component_distill_checkpoint_ready=0
ft_cr_enabled=1
crs=
rml_wrapper=ftrm
snapc=single (similar to full, but checkpoints only one process)
filem=rsh
pml_wrapper=crcpw
crcp=uncoord (similar to coord, but needs to checkpoint only one
process)
btl=tcp,self
Thanks,
Leonardo Fialho
--
Leonardo Fialho
Computer Architecture
Sure! :) Thank you so much!
Leonardo
Aurélien Bouteiller wrote:
Hi,
To enable the vprotocol pessimist, you have to specify -mca vprotocol
pessimist. This parameter takes precedence over the priority. Let me
know if you succeed :]
Aurelien
On 5 March 08 at 13:55, Leonardo Fialho wrote
, like a child of the original process.
When I run an application with this version and take a checkpoint
manually, I have no problem...
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edifcio Q, QC/3088
http://www.caos
Phone
file descriptor
Thanks,
Leonardo Fialho
Leonardo Fialho wrote:
Hi All,
Has anybody experienced this error?
[aogrdini:09070] Global) Receive a command message from [[13242,0],0].
...
[aogrd02:07642] Local) Receive a command message.
...
[aogrd01:07938] Local) Receive a command me
returns the "bad file descriptor" (EBADF) error,
and the blcr module doesn't catch this error; it only returns (-1) "child failed".
Thanks,
Leonardo Fialho
Josh Hursey wrote:
I don't think I have ever seen this one before. :(
So you are trying to checkpoint the MPI pro
ze growing all the time.
I think that it is something in the CRCP module/component...
Thanks,
--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edifcio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-9
simple-ping 20 1
-- Josh
On May 29, 2008, at 7:54 AM, Leonardo Fialho wrote:
Hi All,
I made some tests with a dummy "ping" application. Some memory
problems occurred. In these tests I obtained the following results:
1) OpenMPI (without FT):
- delaying 1 second to send token to
1.3)
[lfialho@aoclsp ~]$
Thanks,
--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edifcio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478
ons are:
A) Shouldn't the execution time in case "1" be smaller than in cases "2"
and "3", since it uses only sm communication? Cache problems?
B) Why is there "sys" time when communicating between nodes? Is it the NIC driver?
Why does this time increase when I balance the load a
Jed Brown wrote:
On Mon 2008-09-29 20:30, Leonardo Fialho wrote:
1) If I use one node (8 cores) the "user" % is around 100% per core. The
execution time is around 430 seconds.
2) If I use 2 nodes (4 cores in each node) the "user" % is around 95%
per core and th
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users