Re: CoProcess() VS union.Process() & Timers in them

Xingcan Cui Tue, 13 Feb 2018 04:44:37 -0800

Hi Max,

Currently, timers can only be used with keyed streams. As @Fabian 
suggested, you can “forge” a keyed stream with a special KeySelector that 
maps all records to the same key.
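
To make the effect of such a constant key concrete, here is a minimal plain-Java sketch. It does not use any Flink classes: `constantKey` and `partition` are hypothetical helpers that only mimic hash partitioning by key. In a real job, the equivalent would be a KeySelector passed to `keyBy` that returns the same value for every record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration only; no Flink classes are used here.
final class ConstantKeySketch {

    // Hypothetical stand-in for a key selector that ignores the record
    // and always returns the same key.
    static <T> Function<T, Integer> constantKey() {
        return record -> 0;
    }

    // Route each record to one of `parallelism` partitions by the hash of its key.
    static <T> List<List<T>> partition(List<T> records,
                                       Function<T, Integer> keyOf,
                                       int parallelism) {
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new ArrayList<>());
        }
        for (T record : records) {
            int index = Math.floorMod(keyOf.apply(record).hashCode(), parallelism);
            partitions.get(index).add(record);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c", "d");
        List<List<String>> partitions =
                partition(records, ConstantKeySketch.<String>constantKey(), 4);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + ": " + partitions.get(i));
        }
    }
}
```

Since every record gets the same key, all of them end up in a single partition. The trade-off of the trick is that the "keyed" operator is effectively processed by one parallel instance.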


IMO, Flink uses keyed streams/states because keying is a deterministic 
distribution mechanism. Here, “the parallelism changes” may also refer to a 
parallelism change after the job restarts (e.g., when a node crashes). Because 
each key always maps to the same partition, Flink can make sure that all the 
processing tasks and states are safely re-distributed across the new cluster.
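
As a rough illustration of why a deterministic key mapping makes re-distribution safe, here is a simplified plain-Java sketch loosely modelled on the key-group idea. It is an assumption-laden toy, not Flink's actual implementation: the class name, the `MAX_PARALLELISM` value, and the use of plain `hashCode()` are all chosen for illustration only.

```java
import java.util.Arrays;
import java.util.List;

// Simplified illustration; names and constants are hypothetical.
final class KeyGroupSketch {

    // Assumed fixed upper bound on parallelism; it never changes for a job.
    static final int MAX_PARALLELISM = 128;

    // A key always maps to the same key group, whatever the current parallelism is.
    static int keyGroupOf(Object key) {
        return Math.floorMod(key.hashCode(), MAX_PARALLELISM);
    }

    // Key groups are split into contiguous ranges, one range per parallel subtask.
    static int subtaskOf(int keyGroup, int parallelism) {
        return keyGroup * parallelism / MAX_PARALLELISM;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("user-1", "user-2", "user-3");
        for (String key : keys) {
            int group = keyGroupOf(key);
            System.out.println(key + " -> key group " + group
                    + ", subtask at parallelism 2: " + subtaskOf(group, 2)
                    + ", subtask at parallelism 3: " + subtaskOf(group, 3));
        }
    }
}
```

The point of the sketch is that only the second step depends on the parallelism. When the parallelism changes, whole key groups (and the state stored under them) move to a different subtask, but no key ever changes its group.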

Hope that helps.

Best,
Xingcan

> On 13 Feb 2018, at 5:18 PM, m@xi <makisnt...@gmail.com> wrote:
> 
> OK Great!
> 
> Thanks a lot for the super ultra fast answer Fabian!
> 
> One intuitive follow-up question.
> 
> So, keyed state is the preferred kind of state, as it makes it easy for
> Flink to perform the re-distribution when the parallelism changes, i.e. on
> a scale-up or scale-down. Also, keying is useful in general for
> hash-partitioning a stream across different nodes/processors/PUs
> (Processing Units).
> 
> Any other reasons for making Keyed State a must?
> 
> Last but not least, can you elaborate further on the "when the parallelism
> changes" part? I have read this in many topics in this forum, but I cannot
> understand its essence. For example, I define the parallelism of each
> operator in my Flink Job program based on the number of available PUs. Maybe
> the essence lies in the fact that the number of PUs might change from time to
> time, e.g. you add more servers to the cluster where Flink runs and perform
> the rescaling without stopping the running Flink Job.
> 
> Thanks in advance.
> 
> Best,
> Max
> 
> 
> --
> Sent from: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>
