Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5582#discussion_r192337341

    --- Diff: flink-state-backends/flink-statebackend-rocksdb/src/test/java/org/apache/flink/contrib/streaming/state/RocksDBStateBackendTest.java ---
    @@ -547,4 +549,30 @@ public boolean accept(File file, String s) {
     			return true;
     		}
     	}
    +
    +	private static class TestRocksDBStateBackend extends RocksDBStateBackend {
    +
    +		public TestRocksDBStateBackend(AbstractStateBackend checkpointStreamBackend, boolean enableIncrementalCheckpointing) {
    +			super(checkpointStreamBackend, enableIncrementalCheckpointing);
    +		}
    +
    +		@Override
    +		public <K> AbstractKeyedStateBackend<K> createKeyedStateBackend(
    +			Environment env,
    +			JobID jobID,
    +			String operatorIdentifier,
    +			TypeSerializer<K> keySerializer,
    +			int numberOfKeyGroups,
    +			KeyGroupRange keyGroupRange,
    +			TaskKvStateRegistry kvStateRegistry) throws IOException {
    +
    +			AbstractKeyedStateBackend<K> keyedStateBackend = super.createKeyedStateBackend(
    +				env, jobID, operatorIdentifier, keySerializer, numberOfKeyGroups, keyGroupRange, kvStateRegistry);
    +
    +			// We ignore the range deletions on production, but when we are running the tests we shouldn't ignore it.
    --- End diff --

    As far as I can see, this only happens in the case where there is only one handle and we are only interested in a subset of the key-groups. Unfortunately, that should be the common case when scaling out. I am wondering if we should not prefer normal deletes over range deletes, because what happens if we later take a snapshot from a database that used range deletes? Will the keys all be gone in both the full and the incremental snapshot case? If the performance of normal deletes is not terrible, that might be the cleaner approach for as long as range deletes do not work properly or have potential negative side effects. What is your opinion on this?
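
    For illustration only, a minimal sketch of the two alternatives discussed above against the plain RocksDB Java API; the names `db`, `columnFamily`, `beginKeyGroupPrefix` and `endKeyGroupPrefix` are hypothetical and not identifiers from this PR, and this is not the code under review:

    import org.rocksdb.ColumnFamilyHandle;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;
    import org.rocksdb.RocksIterator;

    // Sketch: dropping all entries outside the key-group range assigned to this backend.
    final class KeyGroupClippingSketch {

    	// Alternative 1: a single range delete with [begin, end) semantics.
    	static void clipWithRangeDelete(
    			RocksDB db,
    			ColumnFamilyHandle columnFamily,
    			byte[] beginKeyGroupPrefix,
    			byte[] endKeyGroupPrefix) throws RocksDBException {

    		db.deleteRange(columnFamily, beginKeyGroupPrefix, endKeyGroupPrefix);
    	}

    	// Alternative 2: iterate the same range and issue normal (point) deletes.
    	// The iterator reads from a consistent view taken at creation time, so the
    	// concurrent point deletes do not affect the iteration itself.
    	static void clipWithPointDeletes(
    			RocksDB db,
    			ColumnFamilyHandle columnFamily,
    			byte[] beginKeyGroupPrefix,
    			byte[] endKeyGroupPrefix) throws RocksDBException {

    		try (RocksIterator iterator = db.newIterator(columnFamily)) {
    			iterator.seek(beginKeyGroupPrefix);
    			while (iterator.isValid()
    					&& compareUnsigned(iterator.key(), endKeyGroupPrefix) < 0) {
    				db.delete(columnFamily, iterator.key());
    				iterator.next();
    			}
    		}
    	}

    	// Lexicographic comparison of unsigned bytes, matching RocksDB's default key order.
    	private static int compareUnsigned(byte[] a, byte[] b) {
    		int len = Math.min(a.length, b.length);
    		for (int i = 0; i < len; i++) {
    			int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
    			if (cmp != 0) {
    				return cmp;
    			}
    		}
    		return Integer.compare(a.length, b.length);
    	}
    }

    Whether the range tombstone written by the first alternative is honored when the database is later snapshotted, fully or incrementally, is exactly the open question raised above; the second alternative sidesteps it at the cost of one delete per key.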
---