Strangely, I was not able to get a checkpoint value for one particular
partition. Could this cause the job to be stuck?

On Thu, Mar 17, 2016 at 5:23 PM, David Yu <david...@optimizely.com> wrote:

> Hi, I want to resurface this thread because I'm still facing issues with
> our Samza job not receiving events.
>
> Our Samza job metric "SamzaContainerMetrics.process-calls" dropped to zero
> today again. So did "SamzaContainerMetrics.process-envelopes" (of course).
> The current topic offset and task checkpoint suggest that everything looks
> good:
>
> Topic partition 18 offset (as of now) = *488986*
> Current checkpoint for taskname "Partition 18":
> tasknames.Partition 18.systems.kafka.streams.nogoalids.partitions.18 = *474222*
>
> Even after redeployment of the job, everything still seemed stuck :(
>
> Any ideas that could help me debug this would be appreciated.
>
>
> On Wed, Mar 16, 2016 at 4:19 PM, David Yu <david...@optimizely.com> wrote:
>
>> No, instead, I updated the checkpoint topic with the "upcoming" offsets.
>> (I should have done a check before that though).
>>
>> So a related question: if I delete the checkpoint topic from Kafka, that
>> would essentially clear out all the offset info, and Samza would recreate
>> the topic and start from the default offsets (e.g., the smallest). Is that
>> correct? I'm just looking for an easy way to do a "reprocess all" kind of
>> operation.
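>>
>> (If deleting the topic is the wrong approach, I believe the config-based
>> way to force a full reprocess would be something like the following,
>> sketched from the docs, for our nogoalids stream:
>>
>> systems.kafka.streams.nogoalids.samza.reset.offset=true
>> systems.kafka.streams.nogoalids.samza.offset.default=oldest
>>
>> i.e., ignore the stored checkpoint on the next deployment and fall back
>> to the oldest retained offset.)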
>>
>> Thanks.
>>
>> On Wed, Mar 16, 2016 at 3:25 PM, Navina Ramesh <
>> nram...@linkedin.com.invalid> wrote:
>>
>>> Strange. I am unable to comment on the behavior because I don't know what
>>> your checkpoints looked like in the checkpoint topic.
>>>
>>> Did you try reading the checkpoint topic log?
>>>
>>> If you set systems.kafka.streams.nogoalids.samza.reset.offset = true,
>>> you are essentially ignoring checkpoints for that stream. Do verify that
>>> you are reading from the correct offset in the stream :)
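>>>
>>> For reference, the pair of settings I mean looks something like this in
>>> your job config (using your nogoalids stream; the "upcoming" value here
>>> is just an example):
>>>
>>> systems.kafka.streams.nogoalids.samza.reset.offset=true
>>> systems.kafka.streams.nogoalids.samza.offset.default=upcoming
>>>
>>> With reset.offset=true, the stored checkpoint for that stream is ignored
>>> on every deployment and samza.offset.default decides where to start.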
>>>
>>> Thanks!
>>> Navina
>>>
>>> On Wed, Mar 16, 2016 at 3:16 PM, David Yu <david...@optimizely.com>
>>> wrote:
>>>
>>> > Finally seeing events flowing again.
>>> >
>>> > Yes, the "systems.kafka.consumer.auto.offset.reset" option is probably
>>> > not a factor here. And yes, I am using checkpointing (kafka). Not sure
>>> > if the offsets are messed up. But I was able to use
>>> > "systems.kafka.streams.nogoalids.samza.reset.offset=true" to reset the
>>> > offsets to the newest ones. After that, events started coming. Still,
>>> > it is unclear to me how things got stuck in the first place.
>>> >
>>> > On Wed, Mar 16, 2016 at 2:31 PM, Navina Ramesh
>>> > <nram...@linkedin.com.invalid> wrote:
>>> >
>>> > > Hi David,
>>> > > The configuration you have tweaked
>>> > > (systems.kafka.consumer.auto.offset.reset) is honored only when one
>>> > > of the following conditions holds:
>>> > > * the topic doesn't exist
>>> > > * the checkpoint is older than the maximum message history retained
>>> > > by the brokers
>>> > >
>>> > > So, my questions are:
>>> > > Are you using checkpointing? If so, you can read the checkpoint
>>> > > topic to see the offset that is being used to fetch data.
>>> > >
>>> > > If you are not using checkpoints, then Samza uses
>>> > > systems.kafka.samza.offset.default to decide whether to start
>>> > > reading from the earliest (oldest data) or upcoming (newest data)
>>> > > offset in the stream.
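>>> > >
>>> > > For example (just a sketch of the two values mentioned above):
>>> > >
>>> > > # start from the oldest offset the brokers still retain
>>> > > systems.kafka.samza.offset.default=oldest
>>> > > # or: only consume messages that arrive after the job starts
>>> > > systems.kafka.samza.offset.default=upcoming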
>>> > >
>>> > > This could explain where your job is trying to consume from, and
>>> > > you can cross-check with the broker.
>>> > > For the purpose of debugging, you can print a debug line in the
>>> > > process() method with the offset of the message you are processing
>>> > > (envelope.getOffset()). Please remember to remove the debug line
>>> > > after troubleshooting. Otherwise you risk filling up your logs.
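>>> > >
>>> > > A minimal sketch of what I mean (the class name is made up; the
>>> > > process() signature is the standard Samza StreamTask API):
>>> > >
>>> > > import org.apache.samza.system.IncomingMessageEnvelope;
>>> > > import org.apache.samza.task.MessageCollector;
>>> > > import org.apache.samza.task.StreamTask;
>>> > > import org.apache.samza.task.TaskCoordinator;
>>> > >
>>> > > public class DebugOffsetTask implements StreamTask {
>>> > >   @Override
>>> > >   public void process(IncomingMessageEnvelope envelope,
>>> > >       MessageCollector collector, TaskCoordinator coordinator) {
>>> > >     // debug only -- remove after troubleshooting, or this will
>>> > >     // flood your logs
>>> > >     System.out.println("offset=" + envelope.getOffset()
>>> > >         + " ssp=" + envelope.getSystemStreamPartition());
>>> > >     // ... normal message processing goes here ...
>>> > >   }
>>> > > }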
>>> > >
>>> > > Let me know if you have more questions.
>>> > >
>>> > > Thanks!
>>> > > Navina
>>> > >
>>> > > On Wed, Mar 16, 2016 at 2:12 PM, David Yu <david...@optimizely.com>
>>> > wrote:
>>> > >
>>> > > > I'm trying to debug our Samza job, which seems to be stuck
>>> > > > consuming from our Kafka stream.
>>> > > >
>>> > > > Every time I redeploy the job, only the same handful of events
>>> > > > get consumed, and then no more events get processed. I manually
>>> > > > checked to make sure the input stream is live and flowing. I also
>>> > > > tried both of the following:
>>> > > >
>>> > > > systems.kafka.consumer.auto.offset.reset=largest
>>> > > > systems.kafka.consumer.auto.offset.reset=smallest
>>> > > >
>>> > > > I'm also seeing the following from the log:
>>> > > >
>>> > > > ... partitionMetadata={Partition
>>> > > > [partition=0]=SystemStreamPartitionMetadata [oldestOffset=144907,
>>> > > > newestOffset=202708, upcomingOffset=202709], Partition
>>> > > > [partition=5]=SystemStreamPartitionMetadata [oldestOffset=140618,
>>> > > > newestOffset=200521, upcomingOffset=200522], ...
>>> > > >
>>> > > >
>>> > > > Not sure what other ways I could diagnose this problem. Any
>>> > > > suggestions are appreciated.
>>> > > >
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Navina R.
>>> > >
>>> >
>>>
>>>
>>>
>>> --
>>> Navina R.
>>>
>>
>>
>
