I have a job using the keyBy function. The job parallelism is 40, and my key is
based on a field in the records that has 2000+ possible values.
My question: for the records with a given key, will they all be sent to
one subtask, or be distributed evenly among all 40 downstream
operator subtasks?
Hi David,
keyBy() is implemented with hash partitioning. If you use keyBy, all
records with the same key are sent to the same downstream operator
subtask; the different keys are spread across the 40 subtasks by hash.
See more in [1].
[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/overview/#keyby
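For intuition, here is a plain-Java sketch (not Flink's actual code) of how hash partitioning maps a key to a subtask. It mirrors Flink's key-group scheme in simplified form: Flink really applies a murmur hash to key.hashCode() inside KeyGroupRangeAssignment and uses the configured maximum parallelism (128 by default); the class and method names, and the use of plain hashCode, are illustrative assumptions.

```java
// Simplified sketch of hash-based key-to-subtask assignment.
// Flink's real logic lives in KeyGroupRangeAssignment and applies a
// murmur hash to key.hashCode(); this sketch uses hashCode directly.
public class KeyGroupSketch {
    static int subtaskFor(Object key, int maxParallelism, int parallelism) {
        // 1. Hash the key into one of maxParallelism "key groups".
        int keyGroup = Math.floorMod(key.hashCode(), maxParallelism);
        // 2. Map that key group onto one of the parallel subtasks.
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128, parallelism = 40;
        // The same key always lands on the same subtask...
        System.out.println(subtaskFor("key-17", maxParallelism, parallelism));
        System.out.println(subtaskFor("key-17", maxParallelism, parallelism));
        // ...while many distinct keys spread across the 40 subtasks.
        System.out.println(subtaskFor("key-18", maxParallelism, parallelism));
    }
}
```

So with 2000+ key values and parallelism 40, each subtask handles many keys, but all records for any single key stay on one subtask; the spread across subtasks is only approximately even, because it is determined by the hash.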
Regards,
Hi, we recently went live with a job on a shared cluster, which is managed
with YARN.
The job was started using:
flink run -s hdfs://PATH_TO_A_CHECKPOINT_FROM_A_PREVIOUS_RUN_HERE
Everything worked fine for a few days, but then the job needed to be
restored for whatever reason:
2023-08-03 16:34:44
There's already a built-in concept of WindowStagger that provides an
interface for accomplishing this.
It's not as well integrated (or documented) as it might be, but the
basic mechanism exists. To use it, I believe you would do something
like this:
assigner = TumblingEventTimeWindows.of(
    Time.seconds(60), Time.seconds(0), WindowStagger.RANDOM);
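To make concrete what staggering changes, here is a self-contained sketch (plain Java, not Flink's implementation; the class and method names are hypothetical) of computing a tumbling window's start with and without a stagger offset. Without staggering, all windows of the same size share aligned boundaries (0, size, 2*size, ...), so they all fire at the same instants; a stagger offset in [0, size) shifts the boundaries and spreads out the firing load.

```java
// Sketch: how a stagger offset shifts tumbling window boundaries.
public class WindowStaggerSketch {
    static long windowStart(long timestamp, long size, long staggerOffset) {
        // Start of the window containing `timestamp`, with boundaries
        // shifted by staggerOffset (same arithmetic as aligned tumbling
        // windows with a non-zero offset).
        return timestamp - Math.floorMod(timestamp - staggerOffset, size);
    }

    public static void main(String[] args) {
        long size = 60_000; // 60-second tumbling windows
        long ts = 125_000;  // an event at t = 125s
        // Aligned: the event falls in window [120000, 180000).
        System.out.println(windowStart(ts, size, 0));
        // Staggered by 10s: window [70000, 130000) -- same size,
        // shifted boundaries, so it fires at a different time.
        System.out.println(windowStart(ts, size, 10_000));
    }
}
```

WindowStagger.RANDOM picks such an offset randomly per window operator instance, so the parallel instances do not all fire simultaneously.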
Hello,
I hope the use case is clear now, as described in the mail below and
again here:
The Flink application will open a server socket on a fixed port as part of a
custom source, and clients will open client sockets that connect to that
server socket. To read data from each client
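As a baseline for the custom source described above, here is a minimal plain-Java sketch (no Flink APIs; the ephemeral port, newline framing, and all names are assumptions) of a server socket accepting a client connection and reading one newline-delimited record. A real Flink custom source would wrap this accept/read loop in a source implementation and emit each line downstream.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Minimal sketch of the server-socket side of the custom source:
// accept one client and read a newline-delimited record from it.
public class SocketSourceSketch {
    public static String acceptOneLine(ServerSocket server) throws Exception {
        try (Socket client = server.accept();
             BufferedReader in = new BufferedReader(new InputStreamReader(
                     client.getInputStream(), StandardCharsets.UTF_8))) {
            return in.readLine(); // one record; a real source would loop here
        }
    }

    public static void main(String[] args) throws Exception {
        // Port 0 = let the OS pick a free port for this demo; the real
        // source would bind the fixed, agreed-upon port instead.
        try (ServerSocket server = new ServerSocket(0)) {
            int port = server.getLocalPort();
            // Simulate a client connecting and sending one record.
            Thread client = new Thread(() -> {
                try (Socket s = new Socket("localhost", port);
                     PrintWriter out = new PrintWriter(
                             s.getOutputStream(), true)) {
                    out.println("record-1");
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            client.start();
            System.out.println(acceptOneLine(server));
            client.join();
        }
    }
}
```

Note that to serve many clients concurrently, the accept loop would hand each accepted socket to its own reader (thread or split), since accept() and readLine() both block.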