Will all records grouped using keyBy be allocated to a single subtask?

2023-08-03 Thread David Corley
I have a job using the keyBy function. The job parallelism is 40. My key is based on a field in the records that has 2000+ possible values. My question is: for the records for a given key, will they all be sent to the one subtask, or be distributed evenly amongst all 40 downstream operator subtasks?
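
For illustration only, a minimal self-contained sketch of the kind of pipeline the question describes (the key values, the aggregation, and the class name are hypothetical, not taken from the original job):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeyByQuestionSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(
                Tuple2.of("category-17", 1L),
                Tuple2.of("category-42", 1L),
                Tuple2.of("category-17", 1L))
            // keyBy hash-partitions the stream on the chosen field,
            // so both "category-17" records end up in the same subtask.
            .keyBy(e -> e.f0)
            .sum(1)
            .setParallelism(40)   // the keyed operator runs with 40 parallel subtasks
            .print();

        env.execute("keyBy question sketch");
    }
}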

Re: Will all records grouped using keyBy be allocated to a single subtask?

2023-08-03 Thread xiangyu feng
Hi David, keyBy() is implemented with hash partitioning. If you use the keyBy function, all records for a given key will be shuffled to the same downstream operator subtask. See more in [1]. [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/overview/#keyby Regards, Xiangyu
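
To make the answer concrete, here is a small self-contained sketch that mimics, in simplified form, how Flink assigns a key to a subtask (key -> key group -> operator index). The real logic lives in Flink's KeyGroupRangeAssignment and applies a murmur hash to key.hashCode(); plain hashCode() and a default max parallelism of 128 are used here only for illustration:

public class KeyToSubtaskSketch {
    public static void main(String[] args) {
        int parallelism = 40;      // downstream operator parallelism from the question
        int maxParallelism = 128;  // Flink's default max parallelism for jobs of this size

        int[] keysPerSubtask = new int[parallelism];
        for (int i = 0; i < 2000; i++) {
            String key = "key-" + i;  // stands in for the 2000+ possible field values
            // Same key -> same key group -> same subtask, every time.
            int keyGroup = Math.floorMod(key.hashCode(), maxParallelism);
            int subtask = keyGroup * parallelism / maxParallelism;
            keysPerSubtask[subtask]++;
        }

        // Keys spread across all 40 subtasks, but any single key is pinned to one of them.
        for (int s = 0; s < parallelism; s++) {
            System.out.println("subtask " + s + " owns " + keysPerSubtask[s] + " keys");
        }
    }
}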

Flink restored from an initially-specified checkpoint

2023-08-03 Thread Filip Karnicki
Hi, we recently went live with a job on a shared cluster, which is managed with Yarn. The job was started using flink run -s hdfs://PATH_TO_A_CHECKPOINT_FROM_A_PREVIOUS_RUN_HERE. Everything worked fine for a few days, but then the job needed to be restored for whatever reason. 2023-08-03 16:34:44 …
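
For context, the -s flag maps to the execution.savepoint.path setting. Below is a hedged sketch of the programmatic equivalent; whether a configuration passed this way is honoured depends on the deployment mode, and the path placeholder is kept verbatim from the mail:

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RestoreFromCheckpointSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Equivalent of `flink run -s <path>`: restore state from this checkpoint/savepoint.
        conf.setString("execution.savepoint.path",
                "hdfs://PATH_TO_A_CHECKPOINT_FROM_A_PREVIOUS_RUN_HERE");
        // Equivalent of the -n / --allowNonRestoredState CLI flag.
        conf.setBoolean("execution.savepoint.ignore-unclaimed-state", false);

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);

        // ... define the actual job here; a trivial pipeline stands in for it ...
        env.fromElements(1, 2, 3).print();
        env.execute("restore sketch");
    }
}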

Re: Investigating use of Custom Trigger to smooth CPU usage

2023-08-03 Thread David Anderson
There's already a built-in concept of WindowStagger that provides an interface for accomplishing this. It's not as well integrated (or documented) as it might be, but the basic mechanism exists. To use it, I believe you would do something like this: assigner = new TumblingEventTimeWindows(Time.se…
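
The preview above is cut off; for reference, here is a hedged, self-contained sketch of what staggered tumbling windows can look like via the public factory TumblingEventTimeWindows.of(size, offset, stagger). The data, timestamps, and watermark setup are made up for illustration:

import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.assigners.WindowStagger;
import org.apache.flink.streaming.api.windowing.time.Time;

public class StaggeredWindowsSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(
                Tuple2.of("a", 1_000L), Tuple2.of("b", 2_000L), Tuple2.of("a", 62_000L))
            .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withTimestampAssigner((event, ts) -> event.f1))
            .keyBy(event -> event.f0)
            // WindowStagger.RANDOM offsets each parallel window operator's window start
            // by a random amount within the window size, spreading out window firings
            // instead of having every subtask fire at the same instant.
            .window(TumblingEventTimeWindows.of(Time.minutes(1), Time.seconds(0), WindowStagger.RANDOM))
            .sum(1)
            .print();

        env.execute("staggered tumbling windows sketch");
    }
}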

RE: Flink operator task opens threads internally

2023-08-03 Thread Kamal Mittal via user
Hello, I hope you have the context of the use case clear now, as described in the mail below as well as here again – the Flink application will be opening a server socket as part of a custom source over a fixed port, and clients will be opening client sockets to the Flink server socket. To read data from each client …
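
For reference, a minimal sketch (using the legacy SourceFunction API) of the kind of source described here; the class name, line-based framing, and thread-pool handling are assumptions, not the actual application code. Each parallel source subtask opens a ServerSocket on the fixed port, hands every accepted client connection to its own handler thread, and emits records under the checkpoint lock:

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SocketServerSource extends RichParallelSourceFunction<String> {

    private final int port;
    private volatile boolean running = true;
    private transient ServerSocket serverSocket;
    private transient ExecutorService clientHandlers;

    public SocketServerSource(int port) {
        this.port = port;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        clientHandlers = Executors.newCachedThreadPool();
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        serverSocket = new ServerSocket(port);
        while (running) {
            final Socket client;
            try {
                client = serverSocket.accept();   // blocks until a client connects
            } catch (Exception e) {
                if (!running) {
                    break;                        // socket was closed by cancel()
                }
                throw e;
            }
            // One handler thread per connected client.
            clientHandlers.submit(() -> {
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(client.getInputStream()))) {
                    String line;
                    while (running && (line = in.readLine()) != null) {
                        // Guard emission with the checkpoint lock, because this
                        // thread is not the main run() thread.
                        synchronized (ctx.getCheckpointLock()) {
                            ctx.collect(line);
                        }
                    }
                } catch (Exception ignored) {
                    // client disconnected or the source is shutting down
                }
            });
        }
    }

    @Override
    public void cancel() {
        running = false;
        try {
            if (serverSocket != null) {
                serverSocket.close();   // unblocks accept()
            }
        } catch (Exception ignored) {
        }
        if (clientHandlers != null) {
            clientHandlers.shutdownNow();
        }
    }
}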