[ 
https://issues.apache.org/jira/browse/FLINK-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907156#comment-15907156
 ] 

ASF GitHub Bot commented on FLINK-5823:
---------------------------------------

GitHub user StephanEwen opened a pull request:

    https://github.com/apache/flink/pull/3522

    [FLINK-5823] [checkpoints] State Backends also handle Checkpoint Metadata 
and introduce Global Cleanup Hooks

    ## Core Changes
    
    ### Store Metadata in State Backend
    
    The state backend is now responsible for storing the checkpoint metadata. 
There is no implicit assumption that the checkpoint metadata is stored in a 
file system any more.
    
      - All checkpoint directory / savepoint directory specific config settings 
are now part of the state backends. The Checkpoint Coordinator simply calls the 
relevant methods on the state backends to store metadata.
      
      - Similar to the `CheckpointStreamFactory` for storing checkpoint state, 
there is now a `CheckpointMetadataStreamFactory` for the metadata.
    
      - State backends are not required to be able to persist metadata - only 
for HA setups and when externalized checkpoints are requested. 
    
      - All checkpoints with persisted metadata are addressable via a 
"pointer", which is state-backend specific. For file-system-based state 
backends (all state backends in Flink currently), this pointer is the file path.
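    
    To illustrate the pointer idea, here is a minimal sketch of a file-based 
    metadata stream factory. The type names, the method names 
    (`createMetadataStream`, `closeAndGetPointer`), and the `chk-<id>/_metadata` 
    layout are assumptions made for illustration; they are not the interfaces 
    added by this pull request.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stream that yields a backend-specific pointer when closed. */
abstract class MetadataOutputStream extends OutputStream {
    abstract String closeAndGetPointer() throws IOException;
}

/** Hypothetical counterpart to a checkpoint stream factory, for metadata only. */
interface MetadataStreamFactory {
    MetadataOutputStream createMetadataStream(long checkpointId) throws IOException;
}

/** File-based variant: the pointer is the metadata file path, rendered as a URI. */
final class FileMetadataStreamFactory implements MetadataStreamFactory {
    private final Path baseDir;

    FileMetadataStreamFactory(Path baseDir) {
        this.baseDir = baseDir;
    }

    @Override
    public MetadataOutputStream createMetadataStream(long checkpointId) throws IOException {
        // Directory and file names are illustrative assumptions.
        Path dir = Files.createDirectories(baseDir.resolve("chk-" + checkpointId));
        Path file = dir.resolve("_metadata");
        OutputStream out = Files.newOutputStream(file);

        return new MetadataOutputStream() {
            @Override
            public void write(int b) throws IOException {
                out.write(b);
            }

            @Override
            public void write(byte[] b, int off, int len) throws IOException {
                out.write(b, off, len);
            }

            @Override
            String closeAndGetPointer() throws IOException {
                out.close();
                return file.toUri().toString();
            }
        };
    }
}
```

    The caller never needs to know that the pointer is a path; a backend that 
    stores metadata elsewhere could return any string it can later resolve.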
      
    ### Global cleanup hooks
    
    State backends can implement an extended interface to create global cleanup 
hooks. For a file system, for example, a global cleanup hook simply deletes 
the checkpoint directory recursively, which on most file systems is much 
faster than issuing a delete call per file.
    
    The `MemoryStateBackend` and `FsStateBackend` use fast cleanup hooks; the 
`RocksDBStateBackend` should get them in a follow-up (see below).
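    
    The difference between per-file deletion and a global hook can be sketched 
    as follows. `RemoteFs`, `CountingFs`, and `GlobalCleanupHook` are 
    hypothetical names, and the in-memory file system only counts delete 
    requests; this is not the interface from the pull request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical minimal file system API; each call stands for one remote request. */
interface RemoteFs {
    List<String> listFiles(String dir);

    void delete(String path, boolean recursive);
}

/** In-memory stand-in that counts delete requests. */
final class CountingFs implements RemoteFs {
    private final TreeSet<String> files = new TreeSet<>();
    int deleteCalls = 0;

    void create(String path) {
        files.add(path);
    }

    boolean isEmpty() {
        return files.isEmpty();
    }

    @Override
    public List<String> listFiles(String dir) {
        List<String> result = new ArrayList<>();
        for (String f : files) {
            if (f.startsWith(dir + "/")) {
                result.add(f);
            }
        }
        return result;
    }

    @Override
    public void delete(String path, boolean recursive) {
        deleteCalls++;
        files.remove(path);
        if (recursive) {
            files.removeIf(f -> f.startsWith(path + "/"));
        }
    }
}

/** Hypothetical hook type: disposes everything belonging to one checkpoint at once. */
interface GlobalCleanupHook {
    void cleanup();
}

final class CleanupDemo {
    /** Old style: one delete request per state file. */
    static void deletePerFile(RemoteFs fs, String checkpointDir) {
        for (String f : fs.listFiles(checkpointDir)) {
            fs.delete(f, false);
        }
    }

    /** Hook style: a single recursive delete of the checkpoint directory. */
    static GlobalCleanupHook hookFor(RemoteFs fs, String checkpointDir) {
        return () -> fs.delete(checkpointDir, true);
    }
}
```

    With five state files under one checkpoint directory, the per-file path 
    issues five delete requests, while the hook issues one.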
    
    ### Application-defined State Backends pick up additional values from the 
configuration
    
    We need to keep supporting the scenario of setting a state backend in the 
user program, but configuring parameters like the checkpoint directory in the 
cluster config. To support that, state backends may implement an additional 
interface which lets them pick up configuration values from the cluster 
configuration.
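    
    A minimal sketch of that idea, assuming a hypothetical 
    `ConfigurableBackend` interface and a toy backend. The interface shape, the 
    use of a plain `Map` as the cluster configuration, the key name, and the 
    rule that application-set values take precedence are all assumptions for 
    illustration.

```java
import java.util.Map;

/** Hypothetical interface: a backend that can fill unset options from the cluster config. */
interface ConfigurableBackend<T> {
    T configure(Map<String, String> clusterConfig);
}

/** Toy backend; the application may leave the checkpoint directory unset (null). */
final class ToyFsBackend implements ConfigurableBackend<ToyFsBackend> {
    // Key name assumed for illustration.
    static final String CHECKPOINT_DIR_KEY = "state.checkpoints.dir";

    final String checkpointDir;

    ToyFsBackend(String checkpointDir) {
        this.checkpointDir = checkpointDir;
    }

    /** Values set in the user program win; the cluster config only fills gaps. */
    @Override
    public ToyFsBackend configure(Map<String, String> clusterConfig) {
        if (checkpointDir != null) {
            return this;
        }
        return new ToyFsBackend(clusterConfig.get(CHECKPOINT_DIR_KEY));
    }
}
```

    Returning a new instance instead of mutating the backend keeps the object 
    created in the user program unchanged.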
    
    
    ## Altered user-facing Behavior
    
      - Externalized checkpoints store all files (state and metadata) strictly 
in the same directory now. (Savepoints were contained in a single directory 
already before)
      
      - Both savepoint commands and configuration parameters now require 
qualified URIs as well (e.g., `file:///path/to/savepoint`), whereas before the 
configs and command line also accepted `/path/to/savepoint`. 
      Because not having qualified URIs is error-prone anyway (auto fallback 
to local file system), I am actually in favor of making this change.
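    
    A check of this kind could look roughly like the sketch below. 
    `SavepointPaths.requireQualified` is a hypothetical helper, not code from 
    this pull request; it only relies on `java.net.URI` reporting a null scheme 
    for unqualified paths.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class SavepointPaths {
    /** Returns the parsed URI, or throws if the path has no scheme such as file or hdfs. */
    static URI requireQualified(String path) {
        final URI uri;
        try {
            uri = new URI(path);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + path, e);
        }
        if (uri.getScheme() == null) {
            // No silent fallback to the local file system.
            throw new IllegalArgumentException(
                    "Path must be a qualified URI (for example file:///...): " + path);
        }
        return uri;
    }
}
```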
    
    ## Tests
    
    This adds a lot of tests which, thanks to the changed design, can be run 
completely against the state backends,
    without instantiating a CheckpointCoordinator.
    
      - Checkpoint / Savepoint delete with global hook works as expected
      - Interaction of old cleanup logic and optional global cleanup hook
      - State backends are properly loaded from configuration
      - Application-configured State backends properly pick up additional 
configuration values
    
    ## Followups
    
      - Implement the global hooks for the RocksDB state backend. Because the 
RocksDB state backend internally delegates to a "storage backend", this needs a 
few extra tricks and was not done in this already large pull request.
      
      - The HA checkpoint store no longer needs to store checkpoint metadata 
itself; it can simply store the pointer.
    
      - This abstraction allows state backends to offer periodic cleanup hooks 
that can search for lost state (files that are not referenced any more), which 
can happen when TaskManager / JobManager processes fail while finalizing a 
checkpoint or when handing over state between JobManager / TaskManager.
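    
    At its core, such a search is a set difference between what is in storage 
    and what retained checkpoints still reference. The sketch below uses a 
    hypothetical `LostStateFinder` helper; a real hook would also have to skip 
    files belonging to checkpoints that are still in progress.

```java
import java.util.Set;
import java.util.TreeSet;

final class LostStateFinder {
    /** Files present in storage that no retained checkpoint references. */
    static Set<String> findLost(Set<String> filesInStorage, Set<String> referencedFiles) {
        Set<String> lost = new TreeSet<>(filesInStorage);
        lost.removeAll(referencedFiles);
        return lost;
    }
}
```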

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StephanEwen/incubator-flink filestatebackend

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3522.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3522
    
----
commit 2bd817286f89693af4bb98504091afc1f45749ad
Author: Stephan Ewen <se...@apache.org>
Date:   2017-03-01T22:00:55Z

    [FLINK-5823] [checkpoints] State Backends also handle Checkpoint Metadata

----


> Store Checkpoint Root Metadata in StateBackend (not in HA custom store)
> -----------------------------------------------------------------------
>
>                 Key: FLINK-5823
>                 URL: https://issues.apache.org/jira/browse/FLINK-5823
>             Project: Flink
>          Issue Type: Sub-task
>          Components: State Backends, Checkpointing
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>             Fix For: 1.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
