___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edifcio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478
Jed Brown wrote:
On Mon 2008-09-29 20:30, Leonardo Fialho wrote:
1) If I use one node (8 cores) the "user" % is around 100% per core. The
execution time is around 430 seconds.
2) If I use 2 nodes (4 cores in each node) the "user" % is around 95%
per core and th
ons are:
A) The execution time in case "1" should be smaller (only sm
communication, no?) than in cases "2" and "3", no? Cache problems?
B) Why is there "sys" time during inter-node communication? NIC driver?
Why does this time increase when I balance the load a
1.3)
[lfialho@aoclsp ~]$
Thanks,
--
Leonardo Fialho
simple-ping 20 1
-- Josh
On May 29, 2008, at 7:54 AM, Leonardo Fialho wrote:
Hi All,
I ran some tests with a dummy "ping" application and hit some memory
problems. In these tests I obtained the following results:
1) OpenMPI (without FT):
- delaying 1 second to send token to
ze growing all the time.
I think that it is something in the CRCP module/component...
Thanks,
--
Leonardo Fialho
returns the "bad file descriptor" (EBADF) error,
and the blcr module doesn't catch this error; it only returns (-1) "child failed".
Thanks,
Leonardo Fialho
Josh Hursey wrote:
I don't think I have ever seen this one before. :(
So you are trying to checkpoint the MPI pro
file descriptor
Thanks,
Leonardo Fialho
Leonardo Fialho wrote:
Hi All,
Has anybody experienced this error?
[aogrdini:09070] Global) Receive a command message from [[13242,0],0].
...
[aogrd02:07642] Local) Receive a command message.
...
[aogrd01:07938] Local) Receive a command me
, like a child of the original process.
When I run an application with this version and take a checkpoint
manually, I have no problem...
Leonardo Fialho
Sure! :) Thank you so much!
Leonardo
Aurélien Bouteiller wrote:
Hi,
To enable the vprotocol pessimist, you have to specify -mca vprotocol
pessimist. This parameter takes precedence over the priority. Let me
know if you hit success :]
Aurelien
On March 5, 2008, at 1:55 PM, Leonardo Fialho wrote:
mca_base_component_distill_checkpoint_ready=0
ft_cr_enabled=1
crs=
rml_wrapper=ftrm
snapc=single (similar to full, but checkpoints only one process)
filem=rsh
pml_wrapper=crcpw
crcp=uncoord (similar to coord, but needs to checkpoint only one
process)
btl=tcp,self
Thanks,
Leonardo Fialho
Josh,
At the moment I'm working on uncoordinated checkpointing, and I'll
probably have some tools to collect data from the process and the
environment, and probably from the application too.
As for the application, I'm considering the possibility of doing something
like this (MPI_Checkpoint??).
Leonardo