Hi,

We are running into a problem where restarting a Samza job seems to take a very long time. We are currently using Samza 0.9.1.
From the logs for a particular container, it looks like the delay has something to do with reading checkpoints from Kafka:

2017-09-20 03:21:02.060 INFO o.a.s.c.kafka.KafkaCheckpointManager [main] - Got offset 0 for topic __samza_checkpoint_ver_1_for_test-job_1 and partition 0. Attempting to fetch messages for checkpoint log.
2017-09-20 03:21:02.072 INFO o.a.s.c.kafka.KafkaCheckpointManager [main] - Get latest offset 42890599 for topic __samza_checkpoint_ver_1_for_test-job_1 and partition 0.

This line in KafkaCheckpointManager
<https://github.com/apache/samza/blob/0.9.1/samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManager.scala#L275>
seems to indicate that the loop iterates from offset 0 to 42890599 and makes a request for each offset.

Questions:
1. What does that loop do exactly?
2. Is this expected behaviour? Is "Got offset 0 for topic ..." normal?
3. Any ideas on how to fix this?

Thanks,
Xiaochuan Yu