gaoyunhaii commented on pull request #16655:
URL: https://github.com/apache/flink/pull/16655#issuecomment-896129057


   Hi @StephanEwen, many thanks for the review and suggestions! I have updated 
the PR:
   1. Based on the discussion above, I now encapsulate the logic regarding 
finished state in `CheckpointPlan`. Currently we might not be able to pass only 
the `FinishStateProvider` to the `PendingCheckpointFinishedTaskStateProvider` 
directly, since it also relies on `CheckpointPlan` to obtain the tasks to 
trigger / commit / ack. So I also extracted `CheckpointPlan` into an interface 
that extends `PendingCheckpointFinishedTaskStateProvider`, and pass the 
`CheckpointPlan` into the `PendingCheckpoint` for decoupling and easier 
testing. Another option is to pass separate trigger / commit / ack lists 
together with the `PendingCheckpointFinishedTaskStateProvider` to the 
`PendingCheckpoint`, but that seems a bit verbose to me?
   2. Currently `CheckpointPlan` still relies on `ExecutionVertex` and 
`ExecutionJobVertex`, since the implementation needs information such as 
operator lists and parallelism. I also agree with having dedicated interfaces 
like `CheckpointableVertex` and `CheckpointableJobVertex` that expose only the 
required information, but this seems non-trivial, so we might do it in a 
dedicated issue?
   3. I refactored the `MetadataV3Serializer` and added a flag so that the 
illegal modification check is done only when restoring on startup.
   4. I tested the performance of the validation with 15 JobVertex instances, 
each with 10 operators and parallelism 1000, i.e. 15000 tasks and 150 operators 
in total 
(https://github.com/gaoyunhaii/flink/commit/d90076599acf3d2895950ff76914d900f196951b).
 The running time is roughly 5~7 ms on my laptop (the average over 20 runs). 
Some other parameter sets, also with 15000 tasks and 150 operators, seem to be 
faster than this case. Thus I think the performance might be acceptable?
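
   To make the shape described in point 1 concrete, here is a rough, 
self-contained sketch. It is only an illustration: the method names, the use 
of plain string task ids, and the `FixedPlan` stub are simplified placeholders 
I made up for this example, not the actual signatures in the PR.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CheckpointPlanSketch {

    // Hypothetical, reduced form of the provider named in point 1.
    interface PendingCheckpointFinishedTaskStateProvider {
        void reportTaskFinishedOnRestore(String taskId);

        Set<String> finishedOnRestore();
    }

    // The plan extends the provider, so a single object carries both the
    // task lists and the finished-state bookkeeping.
    interface CheckpointPlan extends PendingCheckpointFinishedTaskStateProvider {
        List<String> tasksToTrigger();

        List<String> tasksToWaitFor();

        List<String> tasksToCommitTo();
    }

    // Simplified stand-in for the pending checkpoint: it only sees the interface.
    static final class PendingCheckpoint {
        private final CheckpointPlan plan;
        private final Set<String> notYetAcknowledged;

        PendingCheckpoint(CheckpointPlan plan) {
            this.plan = plan;
            this.notYetAcknowledged = new HashSet<>(plan.tasksToWaitFor());
        }

        // Returns false for unknown or duplicate acknowledgements.
        boolean acknowledge(String taskId, boolean finishedOnRestore) {
            if (!notYetAcknowledged.remove(taskId)) {
                return false;
            }
            if (finishedOnRestore) {
                plan.reportTaskFinishedOnRestore(taskId);
            }
            return true;
        }

        boolean isFullyAcknowledged() {
            return notYetAcknowledged.isEmpty();
        }
    }

    // Trivial plan for tests: every task is triggered, awaited and committed to.
    static final class FixedPlan implements CheckpointPlan {
        private final List<String> tasks;
        private final Set<String> finished = new HashSet<>();

        FixedPlan(List<String> tasks) {
            this.tasks = tasks;
        }

        public List<String> tasksToTrigger() { return tasks; }

        public List<String> tasksToWaitFor() { return tasks; }

        public List<String> tasksToCommitTo() { return tasks; }

        public void reportTaskFinishedOnRestore(String taskId) { finished.add(taskId); }

        public Set<String> finishedOnRestore() { return finished; }
    }

    public static void main(String[] args) {
        FixedPlan plan = new FixedPlan(List.of("source-0", "map-0"));
        PendingCheckpoint pending = new PendingCheckpoint(plan);
        pending.acknowledge("source-0", true);
        pending.acknowledge("map-0", false);
        System.out.println("fully acknowledged: " + pending.isFullyAcknowledged());
        System.out.println("finished on restore: " + plan.finishedOnRestore());
    }
}
```

   With this shape, a test can hand the `PendingCheckpoint` a small stub plan 
instead of building execution vertices, which is the decoupling benefit 
mentioned in point 1. The alternative of passing three separate lists plus the 
provider would add three more constructor arguments for the same effect.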



