Hi team! We have observed strange behavior when using KafkaSource with checkpointing enabled.
Even though we do not get any checkpoint failures or errors of any kind, we see the committed offset jumping up and down erratically instead of moving steadily up toward the end offset of the topic. We know that Flink does not rely on the offsets committed to Kafka; it stores that information in its own checkpoints and commits offsets back to Kafka for reporting/monitoring purposes only.

In our example, we have a KafkaSource consuming from topic A, whose endOffset is 60_000. After some time the committedOffset reaches 30_000, then 40_000, and so on. At some point it reaches 47_000 and then drops back down with no recognizable pattern, e.g. 47_000 -> 23_000 -> 27_000 -> 36_000 -> 22_000 and so on. Unfortunately, there is no error or checkpoint failure that would explain this behavior.

We would also like to try unaligned checkpoints to rule out backpressure as the cause, but that is not an option at the moment. What do you suggest? Have you experienced this before? Thanks in advance.
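To make the pattern concrete, here is a minimal sketch that flags every point where a sampled committed offset is lower than the previous sample, using the numbers quoted above. The class and method names (`OffsetRegressionCheck`, `regressions`, `lag`) are hypothetical and only for illustration; this is not part of our job and does not call any Flink or Kafka API. It also assumes the samples all belong to the same partition and consumer group.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: it works on offsets that were
// already sampled by some monitoring tool, it does not query Kafka itself.
class OffsetRegressionCheck {

    /** Returns every index i where samples[i] is lower than samples[i - 1]. */
    static List<Integer> regressions(long[] samples) {
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i < samples.length; i++) {
            if (samples[i] < samples[i - 1]) {
                result.add(i);
            }
        }
        return result;
    }

    /** Distance between the end offset and a committed offset. */
    static long lag(long endOffset, long committedOffset) {
        return endOffset - committedOffset;
    }

    public static void main(String[] args) {
        long endOffset = 60_000L;
        // Committed offsets in the order we observed them.
        long[] samples = {30_000L, 40_000L, 47_000L, 23_000L, 27_000L, 36_000L, 22_000L};

        for (int i : regressions(samples)) {
            System.out.printf("sample %d: %d -> %d (lag now %d)%n",
                    i, samples[i - 1], samples[i], lag(endOffset, samples[i]));
        }
    }
}
```

With the sequence above this reports two backward jumps, 47_000 -> 23_000 and 36_000 -> 22_000; the steps in between move forward again, which is what makes the behavior hard to explain.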