Hi,
To enable the pessimist vprotocol, you have to specify -mca vprotocol
pessimist on the mpirun command line. This parameter takes precedence
over the priority setting. Let me know if it works :]
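
For reference, here is a minimal sketch of what the adjusted command line could look like. It reuses the options from the original report below and only adds the vprotocol selection; placing the option before the executable is an assumption, and the script only prints the command rather than launching anything.

```shell
#!/bin/sh
# Dry run: assemble the mpirun command line from the original report,
# adding the vprotocol selection suggested above. Nothing is executed.
VPROTOCOL="pessimist"   # vprotocol component discussed in this thread
CMD="mpirun -np 2 -hostfile ../hostfile -am ../ft-enable-cr -mca vprotocol $VPROTOCOL -v -d ./ping 10 1"
echo "$CMD"
# To actually launch, run the printed command on a machine with an
# Open MPI build that includes pml_v and the pessimist vprotocol.
```

With the verbose options kept, the log should show whether the pessimist component stays loaded after PML selection.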
Aurelien
On 5 March 08, at 13:55, Leonardo Fialho wrote:
Hi All,
I'm trying to use pml_v (pessimist) with the FT components, but during
loading pml_v closes and closes vprotocol_pessimist too, as shown in
the following log (from only one process):
$ mpirun -np 2 -hostfile ../hostfile -am ../ft-enable-cr -v -d ./ping 10 1
opal_cr: init: Verbose Level: 128
opal_cr: init: FT Enabled: 1
opal_cr: init: OPAL CR Allow OPAL Only: 0
opal_cr: init: Is a tool program: 0
opal_cr: init: Checkpoint Signal: 10
opal_cr: init: Temp Directory: /tmp
proc_info: hnp_uri 1251737600.0;tcp://172.20.5.128:46169;tcp://158.109.65.178:46169;tcp://10.8.0.1:46169
daemon uri 1251737600.1;tcp://172.20.5.1:39991
App) Named Pipes (/tmp/opal_cr_prog_read.17352) (/tmp/opal_cr_prog_write.17352)
orte_cr: init: orte_cr_init()
mca: base: components_open: Looking for pml components
mca: base: components_open: opening pml components
mca: base: components_open: found loaded component cm
mca: base: components_open: component cm open function successful
mca: base: components_open: found loaded component crcpw
pml:crcpw: open()
pml:crcpw: open: priority = -128
pml:crcpw: open: verbosity = 128
mca: base: components_open: component crcpw open function successful
mca: base: components_open: found loaded component dr
mca: base: components_open: component dr open function successful
mca: base: components_open: found loaded component ob1
mca: base: components_open: component ob1 open function successful
mca: base: components_open: found loaded component v
pml_v: loaded
pml_v: vprotocol_pessimist: component_open: read priority 120
mca: base: components_open: component v open function successful
select: initializing pml component cm
select: init returned failure for component cm
select: initializing pml component crcpw
pml:crcpw: component_init: Priority -128
select: init returned priority -128
pml:select: Wrapper Component: Component crcpw was determined to be a Wrapper PML with priority -128
select: component dr not in the include list
select: initializing pml component ob1
select: init returned priority 20
select: component v not in the include list
selected ob1 best priority 20
select: component ob1 selected
mca: base: close: component cm closed
mca: base: close: unloading component cm
mca: base: close: component dr closed
mca: base: close: unloading component dr
pml_v: parasite_close: Ok, I accept to die and let ob1 component finish
pml_v: vprotocol_pessimist: component_close
pml_v: mca: base: close: component pessimist closed
pml_v: mca: base: close: unloading component pessimist
mca: base: close: component v closed
mca: base: close: unloading component v
pml:select: Wrapping: Component ob1 [20] is being wrapped by component crcpw [-128]
pml:crcpw: component_init: Wrap the selected component ob1
pml:crcpw: component_init: Initalize Wrapper
ompi_cr: init: ompi_cr_init()
ompi_cr: finalize: ompi_cr_finalize()
pml:crcpw: component_finalize: Finalize
mca: base: close: component ob1 closed
mca: base: close: unloading component ob1
orte_cr: finalize: orte_cr_finalize()
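
The selection lines in the log above are the ones that show which PML components were skipped and which one was finally chosen. As a rough sketch, they can be pulled out of a saved log with grep; the sample lines below are copied from the log above, and the variable names are only illustrative.

```shell
#!/bin/sh
# Sample lines copied from the log above. In practice, redirect the
# mpirun output to a file and grep that file instead.
LOG='select: init returned priority -128
select: component dr not in the include list
select: initializing pml component ob1
select: init returned priority 20
select: component v not in the include list
selected ob1 best priority 20'

# Components that were skipped during PML selection
SKIPPED=$(printf '%s\n' "$LOG" | grep 'not in the include list')
# Component that was finally selected
SELECTED=$(printf '%s\n' "$LOG" | grep '^selected ')

printf '%s\n' "$SKIPPED"
printf '%s\n' "$SELECTED"
```

Here both dr and v are reported as not being in the include list, and ob1 is selected with priority 20.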
The MCA parameters are (except the verbose parameters):
vprotocol_pessimist_priority=120 (very, very big...?)
snapc_base_global_snapshot_dir=/tmp/checkpoints
snapc_base_store_in_place=0
opal_cr_allow_opal_only=0
mca_base_component_distill_checkpoint_ready=0
ft_cr_enabled=1
crs=
rml_wrapper=ftrm
snapc=single (similar to full, but checkpoints only one process)
filem=rsh
pml_wrapper=crcpw
crcp=uncoord (similar to coord, but only needs to checkpoint one
process)
btl=tcp,self
Thanks,
Leonardo Fialho
--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edifcio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478
--
* Dr. Aurélien Bouteiller
* Sr. Research Associate at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 350
* Knoxville, TN 37996
* 865 974 6321