Hey Jinhua, On Thu, Dec 28, 2017 at 9:57 AM, Jinhua Luo <luajit...@gmail.com> wrote: > The keyby() upon the field would generate unique key as the field > value, so if the number of the uniqueness is huge, flink would have > trouble both on cpu and memory. Is it considered in the design of > flink?
Yes, keyBy hash partitions the data across the nodes of your Flink application and thus you can easily scale your application up if you need more processing power. I'm not sure that this is the problem in your case though. Can you provide some more details what you are doing exactly? Are you aggregating by time (for the keyBy you mention no windowing, but then you mention windowAll)? What kind of aggregation are you doing? If possible, feel free to share some code. > Since windowsAll() could be set parallelism, so I try to use key > selector to use field hash but not value, that I hope it would > decrease the number of the keys, but the flink throws key out-of-range > exception. How to use key selector in correct way? Can you paste the exact Exception you use? I think this might indicate that you don't correctly extract the key from your record, e.g. you extract a different key on sender and receiver. I'm sure we can figure this out after you provide more context. :-) – Ufuk