I ran into the same problem before and resolved it with Yi's help, xD
On 20 October 2017 at 06:10, Yi Pan wrote:
Awesome that you have figured it out! Just a general notice: any log-compacted
topic used in Samza may see this slowdown if the Kafka log cleaner thread
dies. That includes the checkpoint, coordinator stream, and changelog topics.
Best!
-Yi
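
For anyone who wants to check whether a compacted topic has grown because the log cleaner stopped, here is a minimal sketch that sums a topic's on-disk size on a broker. It relies on Kafka keeping each partition in a directory named "<topic>-<partition>" under the broker's log directory. The function name, the log directory path, and the topic name are hypothetical placeholders, and the script has to run on the broker host itself.

```python
import os
import re


def topic_size_bytes(log_dir, topic):
    """Sum the sizes of all files in the partition directories of `topic`.

    `log_dir` is assumed to be one of the broker's log directories; the
    partition directories inside it are named "<topic>-<partition>".
    """
    # Match exactly "<topic>-<digits>" so that topics sharing a prefix are skipped.
    pattern = re.compile(re.escape(topic) + r"-\d+")
    total = 0
    for entry in os.listdir(log_dir):
        path = os.path.join(log_dir, entry)
        if os.path.isdir(path) and pattern.fullmatch(entry):
            for root, _dirs, files in os.walk(path):
                for name in files:
                    total += os.path.getsize(os.path.join(root, name))
    return total


if __name__ == "__main__":
    # Both values below are placeholders, not real paths or topic names.
    size = topic_size_bytes("/path/to/kafka-logs", "my-checkpoint-topic")
    print(f"{size / 1024 ** 3:.2f} GiB")
```

A compacted topic that keeps growing far beyond the size of its live key set is a hint that compaction is not running on that broker.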
On Thu, Oct 19, 2017 at 12:14 PM, XiaoChuan Yu wrote:
Hi,
We were finally able to find out why the job takes so long to start.
There was higher-than-normal network IO during job startup, so we
checked the size of the checkpoint topic on disk and found it was ~21GB.
We then restarted the Kafka node that was the leader for the checkpoint
topic, the topic disk
>> How long does it take?
It took around 10 minutes from "Got offset 0 for topic ..." to init()
being called on the Task.
>> Have you measured which parts of the start up sequence take the most time?
>> - is it checkpoint restoration, or restore of local state?
Should be checkpoint restoration. T
Hi Xiaochuan,
>> What does that loop do exactly?
Most of what the run-loop does is documented in
https://samza.apache.org/learn/documentation/0.9/container/event-loop.html
>> We are running into a problem where it seems to take a very long time to restart a Samza job.
Some follow-up questions,
Hi,
We are running into a problem where it seems to take a very long time to
restart a Samza job.
We are using Samza 0.9.1 at the moment.
From the logs for a particular container it looks like it has something to
do with reading checkpoints from Kafka:
2017-09-20 03:21:02.060 INFO o.a.s.c.kafk