Samza Job Slow to Restart

XiaoChuan Yu Wed, 20 Sep 2017 11:22:02 -0700

Hi,

We are running into a problem where it seems to take a very long time to
restart a Samza job.
We are using Samza 0.9.1 at the moment.


From the logs for a particular container, it looks like it has something to
do with reading checkpoints from Kafka:

2017-09-20 03:21:02.060 INFO  o.a.s.c.kafka.KafkaCheckpointManager [main] -
Got offset 0 for topic __samza_checkpoint_ver_1_for_test-job_1 and
partition 0. Attempting to fetch messages for checkpoint log.
2017-09-20 03:21:02.072 INFO  o.a.s.c.kafka.KafkaCheckpointManager [main] -
Get latest offset 42890599 for topic
__samza_checkpoint_ver_1_for_test-job_1 and partition 0.

Looking at this line in KafkaCheckpointManager
<https://github.com/apache/samza/blob/0.9.1/samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManager.scala#L275>,
it seems to indicate that the loop iterates from offset 0 to 42890599 and
makes a request for each offset.
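
As a rough illustration of that reading, here is a minimal Python sketch of
replaying a checkpoint log from the earliest offset to the latest one and
keeping only the last checkpoint per task. It is not Samza's actual code: the
function name read_last_checkpoints, the (task, checkpoint) log layout, and
the treatment of the latest offset as an exclusive upper bound are all
assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for one partition of the checkpoint topic:
# a list of (task_name, checkpointed_offset) entries, indexed by log offset.
CheckpointLog = List[Tuple[str, int]]


def read_last_checkpoints(
    log: CheckpointLog, earliest: int, latest: int
) -> Tuple[Dict[str, int], int]:
    """Replay the log from `earliest` up to (not including) `latest`.

    Returns the last checkpoint seen per task and the number of entries read.
    """
    last_seen: Dict[str, int] = {}
    entries_read = 0
    for offset in range(earliest, latest):
        task, checkpoint = log[offset]
        last_seen[task] = checkpoint  # later entries overwrite earlier ones
        entries_read += 1
    return last_seen, entries_read


if __name__ == "__main__":
    # Two hypothetical tasks, each having written 1000 checkpoints.
    log = [(f"task-{i % 2}", i) for i in range(2000)]
    checkpoints, entries_read = read_last_checkpoints(log, 0, len(log))
    print(checkpoints)
    print(entries_read)
```

In this sketch the work done at startup grows with the number of entries
between the two offsets, even though only one entry per task is kept at the
end. If the real loop behaves similarly, that would be consistent with a slow
restart when the range is 0 to 42890599.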

Questions:
1. What does that loop do exactly?
2. Is this expected behaviour? Is "Got offset 0 for topic ..." normal?
3. Any ideas on how to fix this?

Thanks,
Xiaochuan Yu
