FWIW, if you want exactly one record per operator, then this code <https://github.com/ScaleUnlimited/flink-crawler/blob/ba06aa87226b4c44e30aba6df68984d53cc15519/src/main/java/com/scaleunlimited/flinkcrawler/utils/FlinkUtils.java#L153> should generate key values that will be partitioned properly.
— Ken > On Jan 2, 2019, at 12:16 AM, Jozef Vilcek <jozo.vil...@gmail.com> wrote: > > Hello, > > I am facing a problem where KeyedStream is purely parallelised on workers > for case where number of keys is close to parallelism. > > Some workers process zero keys, some more than one. This is because of > `KeyGroupRangeAssignment.assignKeyToParallelOperator()` in > `KeyGroupStreamPartitioner` as I found out in this post: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Use-keyBy-to-deterministically-hash-each-record-to-a-processor-task-slot-td16483.html > > I would like to find out what are my options here. > * is there a reason why custom partitioner can not be used in keyed stream? > * can there be an API support for creating keys correct KeyedStream > compatible keys? It would also make > `DataStreamUtils.reinterpretAsKeyedStream()` more useable for certain > scenarios. > * any other option I have? > > Many thanks in advance. > > Best, > Jozef -------------------------- Ken Krugler +1 530-210-6378 http://www.scaleunlimited.com Custom big data solutions & training Flink, Solr, Hadoop, Cascading & Cassandra