OK man! Thanks a lot.
To tell you the truth, the documentation did not explain it convincingly
enough for me to consider it an important operator to use in my
applications.
Thanks for mentioning.
Best,
Max
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
I don't think that the mapping is that sophisticated.
I'd assume it is a bit simpler and just keeps one local pipeline (the one
with the same subtask index) which will run in the same slot (unless
explicitly configured differently).
TBH, I would not rely on this behavior. rescale() is rather an ar
Hey Fabian!
Thanks for the comprehensive replies. Now I understand those concepts
properly.
Regarding rescale(), it does not take any arguments. Thus, I assume
that the way it does the shuffling from operator A to operator B instances
is a black box for the programmer and probably has to do
Changing the parallelism works in Flink by taking a savepoint, shutting
down the job, and restarting it from the savepoint with another parallelism.
The rescale() operator defines how records are exchanged between two
operators with different parallelism.
Rescale prefers local data exchange over u
Thanks a lot Fabian and Xingcan!
@Fabian: Nice answer. It improved my intuition on the topic. So, how can
one change the parallelism while the Flink job is running, e.g., lower the
parallelism during the weekend?
Also, it is not clear to me how to use the rescale() operator. If you may
provide
You might also want to change the parallelism if the rate of your input
streams varies, e.g., you scale an application down overnight or over the
weekend.
2018-02-13 13:43 GMT+01:00 Xingcan Cui :
> Hi Max,
>
> Currently, the timers can only be used with keyed streams. As @Fabian
> suggested, you can
Hi Max,
Currently, the timers can only be used with keyed streams. As @Fabian
suggested, you can “forge” a keyed stream with the special KeySelector, which
maps all the records to the same key.
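A minimal sketch of the constant-key idea, written so that it compiles without Flink on the classpath. The KeySelector interface below is only a stand-in that mirrors the single-method shape of Flink's KeySelector; ConstantKeySelector, groupByKey and ConstantKeySketch are hypothetical names used for illustration, and groupByKey merely imitates what keyBy() does logically (it is not Flink code).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ConstantKeySketch {

    // Stand-in for Flink's KeySelector (assumption: same single-method shape).
    interface KeySelector<IN, KEY> {
        KEY getKey(IN value) throws Exception;
    }

    // Hypothetical selector: ignores the record and always returns key 0.
    static final class ConstantKeySelector<T> implements KeySelector<T, Byte> {
        @Override
        public Byte getKey(T value) {
            return (byte) 0;
        }
    }

    // Imitates the logical effect of keyBy(): records are grouped by selected key.
    static <T> Map<Byte, List<T>> groupByKey(List<T> records, KeySelector<T, Byte> selector) {
        Map<Byte, List<T>> groups = new TreeMap<>();
        for (T record : records) {
            Byte key;
            try {
                key = selector.getKey(record);
            } catch (Exception e) {
                throw new RuntimeException("key extraction failed", e);
            }
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(record);
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<Byte, List<String>> groups =
                groupByKey(Arrays.asList("a", "b", "c"), new ConstantKeySelector<>());
        // Every record lands in the single group for key 0.
        System.out.println(groups);
    }
}
```

Because every record maps to the same key, the whole stream ends up in one logical partition, so this only makes sense for operators that run with parallelism 1 anyway.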
IMO, Flink uses keyed streams/states as it’s a deterministic distribution
mechanism. Here, “the para
OK Great!
Thanks a lot for the super ultra fast answer Fabian!
One intuitive follow-up question.
So, keyed state is the most preferable one, as it is easy for the Flink
System to perform the re-distribution in case of change in parallelism, if
we have a scale-up or scale-down. Also, it is useful
Hi Max,
you can use keyed state on an operator with parallelism 1 if you assign a
default key with a KeySelector:
stream.keyBy(new NullByteKeySelector)
with NullByteKeySelector defined as
public class NullByteKeySelector<T> implements KeySelector<T, Byte> {
private static final long serialVersionUID =
Hello XingCan,
Finally, I did it with union.
Now inside the processElement() function of my CoProcessFunction I am
setting a timer and periodically I want to print out some data through the
onTimer() function.
Below I attach the image stating the following: "Caused by:
java.lang.UnsupportedOpera
Hi Max,
if I understood correctly, instead of joining three streams, you actually
performed two separate joins, say S1 JOIN S3 and S2 JOIN S3, right? Your plan
“(S1 UNION S2) JOIN S3” seems to be identical to “(S1 JOIN S3) UNION (S2
JOIN S3)” and if that’s what you need, your pipeline should
Hello Flinkers,
I would like to discuss something that bothers me. I have two streams
that I want to join, along with a third stream whose data I want to
consult from time to time in order to trigger decisions.
Essentially, this boils down to coProcessing 3 streams together instead