Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/1069#issuecomment-136290092
  
    @anisnasir, good to know that most real-world data sets can be handled by 
just splitting keys into two components. But what about the rest? Wouldn't it 
be nice to have a partitioner that works for all of them? How hard would it be 
to generalize your approach? We could set the default number of channels a key 
is distributed over to 2 to mimic your initial implementation.
    
    Concerning the test, you could for example create a `DataStream` which 
contains only a single key. Group on this key and then apply some other 
operation where you use the `PartialPartitioner`. In this latter operation you 
can record the subtask index of the task that processes each element. With 
these indices, you should be able to calculate the distribution of the data. If 
you execute this test on 2 TMs with a single slot each, or on a single TM with 
2 slots, then you should get a 50/50 distribution if I'm not mistaken.
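
    To make the idea concrete, here is a minimal standalone sketch of what a generalized partitioner and the single-key check could look like. It does not use Flink and is not the `PartialPartitioner` from this pull request: the class `PartialKeySketch`, its methods, and the `numChoices` parameter are hypothetical names. It also assumes that a key's candidate channels are consecutive channel indices starting at the key's hash, and that each sender tracks load with a local counter.

```java
/** Hypothetical sketch only; NOT the PartialPartitioner from the pull request. */
class PartialKeySketch {
    private final int numChoices; // candidate channels per key; 2 mimics the initial implementation
    private final long[] load;    // elements this sender has routed to each channel so far

    PartialKeySketch(int numChannels, int numChoices) {
        if (numChoices < 1 || numChoices > numChannels) {
            throw new IllegalArgumentException("numChoices must be in [1, numChannels]");
        }
        this.numChoices = numChoices;
        this.load = new long[numChannels];
    }

    /** Picks the least loaded of the key's candidate channels (ties go to the first). */
    int selectChannel(Object key) {
        // Assumption: candidates are base, base + 1, ..., wrapping around, so they are distinct.
        int base = Math.floorMod(key.hashCode(), load.length);
        int best = base;
        for (int i = 1; i < numChoices; i++) {
            int candidate = (base + i) % load.length;
            if (load[candidate] < load[best]) {
                best = candidate;
            }
        }
        load[best]++;
        return best;
    }

    /** Sends `elements` copies of one key and counts how many each subtask index receives. */
    static long[] distribute(Object key, int elements, int numChannels, int numChoices) {
        PartialKeySketch partitioner = new PartialKeySketch(numChannels, numChoices);
        long[] perSubtask = new long[numChannels];
        for (int i = 0; i < elements; i++) {
            perSubtask[partitioner.selectChannel(key)]++;
        }
        return perSubtask;
    }
}
```

    Calling `PartialKeySketch.distribute("only-key", 1000, 2, 2)` in this sketch yields 500 elements per subtask, which is the 50/50 split the test would assert on. With `numChoices` set to 1 it degenerates to plain hash partitioning, and the single key lands entirely on one subtask.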


