Re: CoProcess() VS union.Process() & Timers in them

2018-02-20 Thread m@xi
OK man! Thanks a lot. To tell you the truth the documentation did not explain it in a convincing way to consider it an important/potential operator to use in my applications. Thanks for mentioning. Best, Max -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: CoProcess() VS union.Process() & Timers in them

2018-02-20 Thread Fabian Hueske
I don't think that the mapping is that sophisticated. I'd assume it is a bit simpler and just keeps one local pipeline (the one with the same subtask index) which will run in the same slot (unless explicitly configured differently). TBH, I would not rely on this behavior. rescale() is rather an ar

Re: CoProcess() VS union.Process() & Timers in them

2018-02-20 Thread m@xi
Hey Fabian! Thanks for the comprehensive replies. Now I understand those concepts properly. Regarding .rescale() , it does not receive any arguments. Thus, I assume that the way it does the shuffling from operator A to operator B instances is a black box for the programmer and probably has to do

Re: CoProcess() VS union.Process() & Timers in them

2018-02-19 Thread Fabian Hueske
Changing the parallelism works in Flink by taking a savepoint, shutting down the job, and restarting it from the savepoint with another parallelism. The rescale() operator defines how records are exchanged between two operators with different parallelism. Rescale prefers local data exchange over u

Re: CoProcess() VS union.Process() & Timers in them

2018-02-13 Thread m@xi
Thanks a lot Fabian and Xingcan! @ Fabian : Nice answer. It enhanced my intuition on the topic. So, how one may change the parallelism while the Flink job is running, e.g. lower the parallelism during the weekend? Also, it is not clear to me how to use the rescale() operator. If you may provide

Re: CoProcess() VS union.Process() & Timers in them

2018-02-13 Thread Fabian Hueske
You might also want to change the parallelism if the rate of your input streams varies, e.g., you scale an application down over night or the weekend. 2018-02-13 13:43 GMT+01:00 Xingcan Cui : > Hi Max, > > Currently, the timers can only be used with keyed streams. As @Fabian > suggested, you can

Re: CoProcess() VS union.Process() & Timers in them

2018-02-13 Thread Xingcan Cui
Hi Max, Currently, the timers can only be used with keyed streams. As @Fabian suggested, you can “forge” a keyed stream with the special KeySelector, which maps all the records to the same key. IMO, Flink uses keyed streams/states as it’s a deterministic distribution mechanism. Here, “the para

Re: CoProcess() VS union.Process() & Timers in them

2018-02-13 Thread m@xi
OK Great! Thanks a lot for the super ultra fast answer Fabian! One intuitive follow-up question. So, keyed state is the most preferable one, as it is easy for the Flink System to perform the re-distribution in case of change in parallelism, if we have a scale-up or scale-down. Also, it is useful

Re: CoProcess() VS union.Process() & Timers in them

2018-02-13 Thread Fabian Hueske
Hi Max, you can use keyed state on an operator with parallelism 1 if you assign a default key with a KeySelector: stream.keyBy(new NullByteKeySelector) with NullByteKeySelector defined as public class NullByteKeySelector implements KeySelector { private static final long serialVersionUID =

Re: CoProcess() VS union.Process() & Timers in them

2018-02-13 Thread m@xi
Hello XingCan, Finally, I did it with union. Now inside the processElement() function of my CoProcessFunction I am setting a timer and periodically I want to print out some data through the onTimer() function. Below I attach the image stating the following: "Caused by: java.lang.UnsupportedOpera

Re: CoProcess() VS union.Process()

2018-02-09 Thread Xingcan Cui
Hi Max, if I understood correctly, instead of joining three streams, you actually performed two separate joins, say S1 JOIN S3 and S2 JOIN S3, right? Your plan "(S1 UNION S2) JOIN S3” seems to be identical with “(S1 JOIN S3) UNION (S2 JOIN S3)” and if that’s what you need, your pipeline should

CoProcess() VS union.Process()

2018-02-09 Thread m@xi
Hello Flinkers, I would like to discuss with you about something that bothers me. So, I have two streams that I want to join along with a third stream which I want to consult its data from time to time and triggers decisions. Essentially, this boils down to coProcessing 3 streams together instead