Hi Jason,
> In our case, the input/output ratios of these Flink operators are all
> 1 to 1, so I guess it doesn't matter that much.
Yes
> But I think the keys we are using in general are pretty uniform.
Cool. You could run it for a period of time to see if there is data skew. If
there is indeed a data skew…
Thanks for the help guys!
Yea we can potentially append random strings to the keys and duplicate data
across them to avoid skewness, if necessary. But I think the keys we are
using in general are pretty uniform.
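A rough sketch of what that salting could look like, assuming a simple
windowed count per key (the SALT_BUCKETS constant, the tuple layout and the
window size are all hypothetical, not our actual job):

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

import java.util.concurrent.ThreadLocalRandom;

public class SaltedCountSketch {

    // Hypothetical: number of sub-keys each original key is spread over.
    private static final int SALT_BUCKETS = 8;

    // Two-stage count: pre-aggregate per salted key, then merge per original key.
    // Processing-time windows are used only to keep the sketch short.
    public static DataStream<Tuple2<String, Long>> saltedCount(DataStream<String> keys) {
        return keys
            // Attach a random salt so a single hot key fans out to several sub-tasks.
            .map(k -> Tuple3.of(k, ThreadLocalRandom.current().nextInt(SALT_BUCKETS), 1L))
            .returns(Types.TUPLE(Types.STRING, Types.INT, Types.LONG))
            .keyBy(t -> t.f0 + "#" + t.f1)                               // stage 1: salted key
            .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
            .sum(2)                                                      // partial count per salted key
            .map(t -> Tuple2.of(t.f0, t.f2))
            .returns(Types.TUPLE(Types.STRING, Types.LONG))
            .keyBy(t -> t.f0)                                            // stage 2: original key
            .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
            .sum(1);                                                     // merge the partials
    }
}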
The idea of putting the lowest-selectivity function up front is really
interesting though. In our case, the input/output ratios of these Flink
operators are all 1 to 1, so I guess it doesn't matter that much.
Hi Jason,
A job with multiple data reshuffles can be scalable under normal
circumstances, but we should carefully avoid data skew: if the input stream
has data skew, adding more resources will not help.
Besides that, if we could adjust the order of the functions, we could put the
keyed process function with the lowest selectivity up front, so the later
stages have less data to reshuffle.
Hi!
Key-by operations can scale with parallelism. Flink will shuffle your records
to different sub-tasks according to the hash value of the key modulo the
parallelism, so the more parallelism you have, the faster Flink can process
data, unless there is data skew.
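For example, a minimal sketch (the sample records and the parallelism of 8
are arbitrary):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeyByParallelismSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Long>> events = env.fromElements(
                Tuple2.of("user-a", 1L), Tuple2.of("user-b", 1L), Tuple2.of("user-a", 1L));

        // keyBy hash-partitions records by key, so all records with the same key
        // go to the same sub-task; raising the parallelism spreads distinct keys
        // over more sub-tasks, but it cannot split a single hot key.
        events.keyBy(e -> e.f0)
              .sum(1)
              .setParallelism(8)   // hypothetical; scales as long as keys are uniform
              .print();

        env.execute("keyBy parallelism sketch");
    }
}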
Jason Liu wrote on Tue, Aug 31, 2021: