Re: Flink Checkpoint Duration/Job Throughput

2021-12-22 Thread Caizhi Weng
Hi! I see that there is no keyBy in your user code. Is it the case that some Kafka partitions contain a lot more data than others? If so, you can try datastream.rebalance() [1] to redistribute the data evenly across the parallel subtasks and reduce the impact of data skew. [1] https://nightlies.apache.org/flin
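
For illustration only, a minimal Java sketch of inserting rebalance() between a Kafka source and the downstream operators; the topic name, properties, and the map step are placeholder assumptions, not taken from the original job:

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class RebalanceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "rebalance-sketch");        // placeholder

        DataStream<String> source = env.addSource(
            new FlinkKafkaConsumer<>("input-topic", new SimpleStringSchema(), props));

        // rebalance() redistributes records round-robin across all downstream
        // subtasks, so a skewed Kafka partition no longer overloads a single
        // parallel instance of the following operators.
        source.rebalance()
              .map(new MapFunction<String, String>() {
                  @Override
                  public String map(String value) {
                      return value; // stand-in for the real enrichment/transformation
                  }
              })
              .print();

        env.execute("rebalance-sketch");
    }
}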

Re: Flink Checkpoint Duration/Job Throughput

2021-12-22 Thread Terry Heathcote
Hi Caizhi, thank you for the response. Below is the relevant code for the pipeline as requested, along with the Kafka properties we set for both the FlinkKafkaProducer and Consumer. The operators that suffer from data skew are the sinks. import Models.BlueprintCacheDataType; import PipelineBlocks.*; im
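
The snippet above is cut off in the archive, so purely as a generic sketch, this is how a FlinkKafkaConsumer and FlinkKafkaProducer are typically wired up with Properties; topic names, broker addresses, and property values are assumptions, not the poster's actual configuration:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class KafkaWiringSketch {
    public static FlinkKafkaConsumer<String> buildConsumer() {
        Properties consumerProps = new Properties();
        consumerProps.setProperty("bootstrap.servers", "broker:9092"); // placeholder
        consumerProps.setProperty("group.id", "pipeline-consumer");    // placeholder
        return new FlinkKafkaConsumer<>("input-topic", new SimpleStringSchema(), consumerProps);
    }

    public static FlinkKafkaProducer<String> buildProducer() {
        Properties producerProps = new Properties();
        producerProps.setProperty("bootstrap.servers", "broker:9092"); // placeholder
        // If the sink runs with exactly-once semantics, Kafka transactions are
        // committed on checkpoint completion, so the transaction timeout must be
        // comfortably larger than the checkpoint interval.
        producerProps.setProperty("transaction.timeout.ms", "900000");
        return new FlinkKafkaProducer<>("output-topic", new SimpleStringSchema(), producerProps);
    }
}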

Re: Flink Checkpoint Duration/Job Throughput

2021-12-21 Thread Caizhi Weng
Hi! From your description, this is due to data skew. The common solution to data skew is to add a random value to your partition keys so that data can be distributed evenly across downstream operators. Could you provide more information about your job (preferably user code or SQL code), especially
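
A minimal sketch of that key-salting idea in the DataStream API; the Event type, its fields, and the bucket count are hypothetical stand-ins, not taken from the job:

import java.util.concurrent.ThreadLocalRandom;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;

public class SaltedKeySketch {

    /** Hypothetical record type standing in for the pipeline's real events. */
    public static class Event {
        public String key;
        public long value;
    }

    public static KeyedStream<Tuple2<Event, String>, String> saltedKeyBy(
            DataStream<Event> events, final int buckets) {
        // Attach a bounded random salt to each record once, in a map, so the key
        // seen by keyBy stays fixed for that record.
        DataStream<Tuple2<Event, String>> salted = events.map(
            new MapFunction<Event, Tuple2<Event, String>>() {
                @Override
                public Tuple2<Event, String> map(Event e) {
                    int salt = ThreadLocalRandom.current().nextInt(buckets);
                    return Tuple2.of(e, e.key + "#" + salt);
                }
            });

        // One hot key now spreads over `buckets` sub-keys; the partial results
        // per sub-key usually need a second, un-salted aggregation to merge them.
        return salted.keyBy(new KeySelector<Tuple2<Event, String>, String>() {
            @Override
            public String getKey(Tuple2<Event, String> t) {
                return t.f1;
            }
        });
    }
}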

Flink Checkpoint Duration/Job Throughput

2021-12-21 Thread Terry Heathcote
Hi, we are having trouble with record throughput that we believe to be a result of slow checkpoint durations. The job uses Kafka as both a source and sink, as well as a Redis-backed service within the same cluster, used to enrich the data in a transformation before writing records back to Kafka. Be
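
For reference, a minimal sketch of the checkpoint settings that usually matter when checkpoints are slow under backpressure; all interval and timeout values here are assumptions, not the job's actual configuration:

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointTuningSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 60s; the interval is an assumed value, not from the thread.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

        CheckpointConfig cfg = env.getCheckpointConfig();
        // Give slow checkpoints room to finish instead of timing them out.
        cfg.setCheckpointTimeout(600_000);
        // Leave the job time to process data between consecutive checkpoints.
        cfg.setMinPauseBetweenCheckpoints(30_000);
        // Unaligned checkpoints let barriers overtake buffered records, which can
        // shorten checkpoint duration when operators are backpressured.
        cfg.enableUnalignedCheckpoints();
    }
}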