Thanks Ken. Yes, a similar approach is suggested in the post I shared in my
question. But to me it feels a bit hack-ish.
I would like to know if this is the only solution with Flink, or am I missing
something?
Could there be more API-level support for this use case in Flink? Is there a
reason why there is none? Or does it already exist?

On Wed, Jan 2, 2019 at 5:29 PM Ken Krugler <kkrugler_li...@transpac.com>
wrote:

> FWIW, if you want exactly one record per operator, then this code <
> https://github.com/ScaleUnlimited/flink-crawler/blob/ba06aa87226b4c44e30aba6df68984d53cc15519/src/main/java/com/scaleunlimited/flinkcrawler/utils/FlinkUtils.java#L153>
> should generate key values that will be partitioned properly.
>
> — Ken
>
> > On Jan 2, 2019, at 12:16 AM, Jozef Vilcek <jozo.vil...@gmail.com> wrote:
> >
> > Hello,
> >
> > I am facing a problem where a KeyedStream is poorly parallelised across
> > workers when the number of keys is close to the parallelism.
> >
> > Some workers process zero keys, some more than one. This is because of
> > `KeyGroupRangeAssignment.assignKeyToParallelOperator()` in
> > `KeyGroupStreamPartitioner` as I found out in this post:
> >
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Use-keyBy-to-deterministically-hash-each-record-to-a-processor-task-slot-td16483.html
> >
> > I would like to find out what are my options here.
> > * is there a reason why custom partitioner can not be used in keyed
> stream?
> > * could there be API support for generating correct KeyedStream-compatible
> > keys? It would also make
> > `DataStreamUtils.reinterpretAsKeyedStream()` more usable for certain
> > scenarios.
> > * any other option I have?
> >
> > Many thanks in advance.
> >
> > Best,
> > Jozef
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> Custom big data solutions & training
> Flink, Solr, Hadoop, Cascading & Cassandra
>
>
