Github user wenlong88 commented on the issue:

    https://github.com/apache/flink/pull/2345
  
    @StephanEwen you are right. But in specific situations we may need a 
temporary compromise to make the system work well, and then remove the 
compromised points again as soon as possible later.
    I think both approaches have shortcomings. When the state is large, such as 
millions of KVs per db, the fully async approach can take the backup 
asynchronously, but restoring it will take a lot of time, which may be 
intolerable during fail-over in production. So I think it is necessary to have 
both, and fully async can be the default option. 
    
    Considering that there is no really perfect solution yet, I think it is OK 
to remove the semi-async way right now to avoid blocking the job of a key group, 
but a better solution needs to be reintroduced soon if you agree that rocksdb 
is quite a good choice of state backend in large-state situations. 
    
    Regarding the memory overhead of using different dbs: Rocksdb can share the 
same block cache across different db instances, but I don't know how to reduce 
the cost of the memtables. This is also a problem in the current solution, which 
allows storing different states in a single db using column families, since the 
memtables of the column families are also separate.
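    
    For reference, a minimal sketch of what sharing one block cache across 
several db instances could look like with the RocksDB Java API (the paths and 
sizes are made up, and it assumes a rocksdbjni version that exposes `LRUCache` 
and `BlockBasedTableConfig.setBlockCache`); the memtable write buffers still 
stay per instance, which is the part that remains hard to bound:
    
    ```java
    import org.rocksdb.BlockBasedTableConfig;
    import org.rocksdb.LRUCache;
    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    
    public class SharedBlockCacheSketch {
    
        public static void main(String[] args) throws Exception {
            RocksDB.loadLibrary();
    
            // One cache object shared by every db, so block cache memory is bounded globally.
            LRUCache sharedBlockCache = new LRUCache(256 * 1024 * 1024); // 256 MB, hypothetical size
    
            RocksDB db1 = openWithSharedCache("/tmp/kv-group-1", sharedBlockCache);
            RocksDB db2 = openWithSharedCache("/tmp/kv-group-2", sharedBlockCache);
    
            db1.put("key".getBytes(), "value".getBytes());
            db2.put("key".getBytes(), "value".getBytes());
    
            db1.close();
            db2.close();
        }
    
        private static RocksDB openWithSharedCache(String path, LRUCache cache) throws Exception {
            BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
                    .setBlockCache(cache); // all instances read blocks through the same cache
    
            Options options = new Options()
                    .setCreateIfMissing(true)
                    .setTableFormatConfig(tableConfig)
                    // Memtables cannot be shared this way; every instance keeps its own
                    // write buffers, so their size has to be tuned per db.
                    .setWriteBufferSize(32 * 1024 * 1024);
    
            return RocksDB.open(options, path);
        }
    }
    ```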

