We are seeing that our MirrorMaker consumer has started looping through
"offset out of range" and "reset offset" errors for some partitions (2 out
of 8 partitions). The ConsumerOffsetChecker reported very high lag for these
2 partitions. It looks like this problem started after a consumer rebalance.
Here are the log lines:

2013-10-06 06:09:59,993
[ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4]
WARN  (kafka.consumer.ConsumerFetcherThread)  -
[ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4],
current offset 2526006629 for partition [FunnelProto,1] out of range; reset
offset to 2526006629
2013-10-06 06:09:59,993
[ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4]
WARN  (kafka.consumer.ConsumerFetcherThread)  -
[ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4],
current offset 2363213504 for partition [FunnelProto,3] out of range; reset
offset to 2363213504
2013-10-06 06:09:59,993
[ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4]
WARN  (kafka.consumer.ConsumerFetcherThread)  -
[ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4],
current offset 2146256007 for partition [jmx,0] out of range; reset offset
to 2146256007
2013-10-06 06:09:59,992
[ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4]
WARN  (kafka.consumer.ConsumerFetcherThread)  -
[ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4],
current offset 2239688 for partition [tower_timing_metrics,3] out of range;
reset offset to 2239688
2013-10-06 06:09:59,889
[ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4]
WARN  (kafka.consumer.ConsumerFetcherThread)  -
[ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4],
current offset 1234239 for partition [agent,0] out of range; reset offset
to 1234239
2013-10-06 06:09:59,889
[ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4]
WARN  (kafka.consumer.ConsumerFetcherThread)  -
[ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4],
current offset 2526006629 for partition [FunnelProto,1] out of range; reset
offset to 2526006629
2013-10-06 06:09:59,889
[ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4]
WARN  (kafka.consumer.ConsumerFetcherThread)  -
[ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4],
current offset 2363213504 for partition [FunnelProto,3] out of range; reset
offset to 2363213504


Also, as you can see, it is resetting the offset to the same value, so it
keeps looping through these offset resets again and again. After we
restarted our MirrorMaker process, it started consuming from the beginning
of the topic for all partitions (we started receiving messages from 7 days
ago), and it caught up in a couple of hours.
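
To check other logs for the same pattern, here is a minimal sketch in Python.
It is not part of Kafka. The function name find_noop_resets and the regular
expression are assumptions based only on the WARN message format shown above.
It counts, per topic and partition, the resets where the "reset offset to"
value equals the "current offset" value, which is the looping symptom
described here.

```python
import re
from collections import Counter

# Pattern for the ConsumerFetcherThread WARN message, after line wrapping
# has been collapsed into single spaces. Based on the log excerpt above.
RESET_RE = re.compile(
    r"current offset (\d+) for partition \[([^,\]]+),(\d+)\] out of range; "
    r"reset offset to (\d+)"
)

def find_noop_resets(log_text):
    """Count resets per (topic, partition) where the offset did not change."""
    flat = " ".join(log_text.split())  # undo wrapping across lines
    counts = Counter()
    for current, topic, partition, new in RESET_RE.findall(flat):
        if current == new:
            counts[(topic, int(partition))] += 1
    return counts

sample = """
current offset 2526006629 for partition [FunnelProto,1] out of range; reset
offset to 2526006629
current offset 2526006629 for partition [FunnelProto,1] out of range; reset
offset to 2526006629
current offset 1234239 for partition [agent,0] out of range; reset offset
to 1234239
"""

for (topic, partition), n in sorted(find_noop_resets(sample).items()):
    print(f"{topic}-{partition}: {n} reset(s) to an unchanged offset")
```

A partition that shows up repeatedly in this output is being reset to the
offset it already had, so the fetcher makes no progress on it.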

We have a couple of questions:

1) What might have caused the consumer to end up in this bad state?
2) We had the offset out of range problem for only 2 out of 8 partitions,
but after we restarted MirrorMaker it started consuming from the beginning
for all partitions in the topic. How did the problem with 2 partitions
affect all the other partitions?


-- 
Thanks,
Raja.
