Re: keyby() issue

Ufuk Celebi Thu, 28 Dec 2017 01:34:38 -0800

Hey Jinhua,

On Thu, Dec 28, 2017 at 9:57 AM, Jinhua Luo <luajit...@gmail.com> wrote:
> The keyby() upon the field would generate unique key as the field
> value, so if the number of the uniqueness is huge, flink would have
> trouble both on cpu and memory. Is it considered in the design of
> flink?


Yes, keyBy hash partitions the data across the nodes of your Flink
application and thus you can easily scale your application up if you
need more processing power.

I'm not sure that this is the problem in your case though. Can you
provide some more details what you are doing exactly? Are you
aggregating by time (for the keyBy you mention no windowing, but then
you mention windowAll)? What kind of aggregation are you doing? If
possible, feel free to share some code.

> Since windowsAll() could be set parallelism, so I try to use key
> selector to use field hash but not value, that I hope it would
> decrease the number of the keys, but the flink throws key out-of-range
> exception. How to use key selector in correct way?

Can you paste the exact Exception you use? I think this might indicate
that you don't correctly extract the key from your record, e.g. you
extract a different key on sender and receiver.

I'm sure we can figure this out after you provide more context. :-)

– Ufuk

Re: keyby() issue

Reply via email to