Does keyBy() on a field generate the same number of keys as the number of
distinct values of that field?
For example, if the field takes values in the range {"a", "b", "c"}, then the
number of keys is 3, correct?
In my case the field (an IP address) has about half a million distinct
values, so keyBy() on that field followed by timeWindow() would generate
half a million partitions?
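
For context, here is a minimal sketch of the kind of job I mean. This is
not my real code: MyEvent, its getIp() accessor, the MyEventSource source
and the one-minute window are placeholders for illustration only.

  // imports (Flink 1.x):
  //   org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
  //   org.apache.flink.streaming.api.datastream.DataStream
  //   org.apache.flink.api.java.functions.KeySelector
  //   org.apache.flink.api.common.functions.ReduceFunction
  //   org.apache.flink.streaming.api.windowing.time.Time

  StreamExecutionEnvironment env =
      StreamExecutionEnvironment.getExecutionEnvironment();

  DataStream<MyEvent> events = env.addSource(new MyEventSource());

  events
      // one key (and one window state) per distinct IP address
      .keyBy(new KeySelector<MyEvent, String>() {
          @Override
          public String getKey(MyEvent ev) {
              return ev.getIp();
          }
      })
      .timeWindow(Time.minutes(1))
      .reduce(new ReduceFunction<MyEvent>() {
          @Override
          public MyEvent reduce(MyEvent a, MyEvent b) {
              return a;  // placeholder aggregation
          }
      });

  env.execute("per-ip windowed aggregation");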

If I use a key selector instead, e.g.

  .keyBy(new KeySelector<MyEvent, Long>() {
      @Override
      public Long getKey(MyEvent ev) {
          return ev.hashCode() % 137L;  // bucket events into at most 137 keys
      }
  })

Then the number of keys (and thus partitions) would be limited to at most
137, correct?
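
For reference, here is the same idea written out as a standalone selector.
This is only my sketch: the Math.floorMod call is there because Java's %
can return a negative value when hashCode() is negative, and the whole
thing assumes MyEvent.hashCode() is deterministic for a given event (as far
as I understand, Flink requires the selector to return the same key for
the same record every time).

  // Folds arbitrary events into at most 137 key buckets.
  // Math.floorMod keeps the result in [0, 136] even for negative hash codes.
  // Note: assumes MyEvent.hashCode() is stable/deterministic per event.
  public class BucketKeySelector implements KeySelector<MyEvent, Long> {
      private static final int BUCKETS = 137;

      @Override
      public Long getKey(MyEvent ev) {
          return (long) Math.floorMod(ev.hashCode(), BUCKETS);
      }
  }

and then .keyBy(new BucketKeySelector()).timeWindow(...) as before.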


2017-12-28 17:33 GMT+08:00 Ufuk Celebi <u...@apache.org>:
> Hey Jinhua,
>
> On Thu, Dec 28, 2017 at 9:57 AM, Jinhua Luo <luajit...@gmail.com> wrote:
>> keyBy() on the field would generate a unique key per field value, so if
>> the number of distinct values is huge, Flink would have trouble with
>> both CPU and memory. Is this considered in the design of Flink?
>
> Yes, keyBy hash partitions the data across the nodes of your Flink
> application and thus you can easily scale your application up if you
> need more processing power.
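
(Just to check my understanding of the hash partitioning you describe: I
picture it roughly like the toy code below, i.e. records with the same key
always go to the same parallel instance, while distinct keys are spread
over the instances. I know this is not Flink's actual internal key-group
logic, only the general idea.)

  // Toy illustration of hash partitioning, not Flink's real scheme:
  // equal keys always map to the same partition index.
  static int partitionFor(Object key, int numPartitions) {
      return Math.floorMod(key.hashCode(), numPartitions);
  }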
>
> I'm not sure that this is the problem in your case, though. Can you
> provide some more details about what you are doing exactly? Are you
> aggregating by time (for keyBy you mention no windowing, but then
> you mention windowAll)? What kind of aggregation are you doing? If
> possible, feel free to share some code.
>
>> Since windowAll() could have its parallelism set, I tried to use a key
>> selector that keys on the field's hash rather than its value, hoping it
>> would decrease the number of keys, but Flink throws a key out-of-range
>> exception. How do I use a key selector correctly?
>
> Can you paste the exact exception you see? I think this might indicate
> that you don't extract the key from your record consistently, e.g. you
> extract a different key on the sender and the receiver.
>
> I'm sure we can figure this out after you provide more context. :-)
>
> – Ufuk
