[ https://issues.apache.org/jira/browse/FLINK-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439372#comment-15439372 ]

ASF GitHub Bot commented on FLINK-3755:
---------------------------------------

Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/2376
  
    Very good work and very nice code!
    
    Some comments after a joint review:
    
      - The most critical issue is that task shutdown must not block on
        async threads. Blocking there unnecessarily delays the response to
        cancellation and redeployment.
    
      - At this point, the `KeyGroupAssigner` interface seems unnecessary,
        especially if it is not parameterized with variable key group
        mappings. To keep this simpler and more efficient, a static method
        could do the assignment instead.
    
      - I would suggest assuming that key groups are always used (they
        should be, even if their number is equal to the parallelism) and
        dropping the checks for `numberOfKeyGroups > 0`, for example in the
        `KeyGroupHashPartitioner`.
    
      - A bit more difficult is what to assume as the default number of key
        groups. We thought about assuming a default of `128`. That has no
        overhead in state backends like RocksDB, and it gives initial job
        deployments that did not configure this properly some freedom to
        scale out. If the parallelism is >= 128, this should probably round
        up to the next power of two.
    
      - Some log statements cause log flooding, such as an INFO log
        statement for every checkpoint stream (factory) created.
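
To make the static-method and default-count suggestions concrete, here is a
minimal sketch. It is not code from the pull request: the class name, method
names, and constant are hypothetical, plain `hashCode()` stands in for
whatever hash function the PR uses, and "round to the next highest
power-of-two" is read as "the smallest power of two that is at least the
parallelism" (so a parallelism of exactly 128 stays at 128), which is an
assumption.

```java
// Hypothetical sketch; none of these names are taken from the pull request.
final class KeyGroupSketch {

    /** Default suggested in the review; the constant name is an assumption. */
    static final int DEFAULT_KEY_GROUPS = 128;

    private KeyGroupSketch() {}

    /** Static replacement for an assigner interface: key -> [0, numberOfKeyGroups). */
    static int assignToKeyGroup(Object key, int numberOfKeyGroups) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numberOfKeyGroups);
    }

    /** 128 for small jobs, otherwise the smallest power of two >= parallelism. */
    static int defaultNumberOfKeyGroups(int parallelism) {
        if (parallelism <= 0 || parallelism > (1 << 30)) {
            throw new IllegalArgumentException("unsupported parallelism: " + parallelism);
        }
        if (parallelism <= DEFAULT_KEY_GROUPS) {
            return DEFAULT_KEY_GROUPS;
        }
        int highest = Integer.highestOneBit(parallelism);
        // Already a power of two: keep it. Otherwise move up to the next one.
        return highest == parallelism ? parallelism : highest << 1;
    }
}
```

Because the number of key groups is always positive under this scheme, the
partitioner would not need a `numberOfKeyGroups > 0` branch.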



> Introduce key groups for key-value state to support dynamic scaling
> -------------------------------------------------------------------
>
>                 Key: FLINK-3755
>                 URL: https://issues.apache.org/jira/browse/FLINK-3755
>             Project: Flink
>          Issue Type: New Feature
>    Affects Versions: 1.1.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>
> In order to support dynamic scaling, it is necessary to sub-partition the 
> key-value states of each operator. This sub-partitioning, which produces a 
> set of key groups, makes it easy to scale Flink jobs in and out by simply 
> reassigning the key groups to the new set of subtasks. The idea of 
> key groups is described in this design document [1]. 
> [1] 
> https://docs.google.com/document/d/1G1OS1z3xEBOrYD4wSu-LuBCyPUWyFd9l3T9WyssQ63w/edit?usp=sharing
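
The reassignment described in the issue can be illustrated with a small
sketch. The contiguous-range mapping below is an assumption chosen for
illustration, not the scheme from the design document, and the class and
method names are hypothetical. The point is that a key's key group never
changes; only the mapping from key groups to subtasks depends on the
parallelism.

```java
// Hypothetical sketch; the mapping formula is an illustrative assumption.
final class KeyGroupRescaleSketch {

    private KeyGroupRescaleSketch() {}

    /** Maps a key group to a subtask so each subtask owns a contiguous range. */
    static int subtaskForKeyGroup(int keyGroup, int numberOfKeyGroups, int parallelism) {
        if (keyGroup < 0 || keyGroup >= numberOfKeyGroups) {
            throw new IllegalArgumentException("key group out of range: " + keyGroup);
        }
        if (parallelism <= 0 || parallelism > numberOfKeyGroups) {
            throw new IllegalArgumentException("parallelism must be in [1, numberOfKeyGroups]");
        }
        // long arithmetic avoids int overflow in the multiplication.
        return (int) ((long) keyGroup * parallelism / numberOfKeyGroups);
    }

    public static void main(String[] args) {
        int numberOfKeyGroups = 8;
        // Scaling out from 2 to 4 subtasks only moves whole key groups.
        for (int kg = 0; kg < numberOfKeyGroups; kg++) {
            System.out.println("key group " + kg
                    + ": subtask " + subtaskForKeyGroup(kg, numberOfKeyGroups, 2)
                    + " -> subtask " + subtaskForKeyGroup(kg, numberOfKeyGroups, 4));
        }
    }
}
```

With 8 key groups, key group 5 would belong to subtask 1 at parallelism 2 and
to subtask 2 at parallelism 4, so its state can be handed over as a unit.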



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
