One more question: how long was the job stopped before you restarted it after
the upgrade? Is your checkpoint topic configured as a log-compacted topic or a
time-based retention topic?

-Yi

On Wed, Mar 23, 2016 at 3:54 PM, Yi Pan <nickpa...@gmail.com> wrote:

> Hi, Yuanchi,
>
> Your configuration looks good to me. Can you share the container logs from
> both the 0.9 and the 0.10 containers?
>
> Also, have you tried running checkpoint-tool.sh against the checkpoint
> topic to see what it contains?
>
> Thanks!
>
> -Yi
>
> On Tue, Mar 22, 2016 at 1:48 PM, Yuanchi Ning <ningyuanchi...@gmail.com>
> wrote:
>
>> Hi Yi,
>>
>> Thanks for the help. Below are the checkpoint related configs:
>>
>> ##################### Job config #####################
>>
>> job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
>>
>> job.name=trip-counter
>>
>> job.datacenter=sjc1
>>
>> job.environment=sandbox
>>
>> #job.coordinator.system=kafka #comment out in 0.9, uncomment in 0.10
>>
>> #job.coordinator.replication.factor=3 #comment out in 0.9, uncomment in 0.10
>>
>>
>> ##################### Task config #####################
>>
>> task.class=com.uber.athena.TripCounterTask
>>
>> task.inputs=kafka.trip_details,kafka.hp-api-client_signups
>>
>> task.outputTripTopic=trip_count_details
>>
>> task.outputClientSignUpsTopic=client_sign_ups_count_details
>>
>> task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
>>
>> task.checkpoint.system=kafka
>>
>> task.checkpoint.replication.factor=3
>>
>>
>>
>> On Tue, Mar 22, 2016 at 1:33 PM, Yi Pan <nickpa...@gmail.com> wrote:
>>
>> > Hi, Yuanchi,
>> >
>> > Did you check your configuration of task.checkpoint.system? What config
>> > values did you use in 0.9, and what is the current configuration in 0.10?
>> > If you can share your config before and after the upgrade, plus the
>> > container log from 0.10, we can be more helpful.
>> >
>> > Thanks!
>> >
>> > -Yi
>> >
>> > On Tue, Mar 22, 2016 at 1:19 PM, Yuanchi Ning <ningyuanchi...@gmail.com>
>> > wrote:
>> >
>> > > Hi All,
>> > >
>> > > When we tested upgrading our existing Samza job from 0.9 to 0.10, we saw
>> > > that our Kafka lag metric (KafkaSystemConsumerMetrics
>> > > "messages-behind-high-watermark") stayed at zero.
>> > > Since we stopped the old job for a while and then restarted it with 0.10
>> > > using the same name, the lag should at least spike at the beginning. In
>> > > the application master we did see that it is picking up the same
>> > > checkpoint topic, though.
>> > > Any ideas? Thanks!
>> > >
>> > > Yuanchi
>> > >
>> > >
>> > > --
>> > > Yuanchi Ning
>> > >
>> >
>>
>>
>>
>> --
>> Yuanchi Ning
>>
>> Master of Information Technology
>> Very Large Information System
>> School of Computer Science
>> Carnegie Mellon University
>>
>> Mobile: (412)680-9774
>> Email: ningyuanchi...@gmail.com
>>            yuanc...@cs.cmu.edu
>>            yuanc...@andrew.cmu.edu
>>
>
>
