We are seeing that the MirrorMaker consumer started looping through "offset out of range" and "reset offset" errors for some partitions (2 out of 8 partitions). The ConsumerOffsetChecker reported very high lag for these 2 partitions. It looks like this problem started after a consumer rebalance. Here are the log lines:

2013-10-06 06:09:59,993 [ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4] WARN (kafka.consumer.ConsumerFetcherThread) - [ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4], current offset 2526006629 for partition [FunnelProto,1] out of range; reset offset to 2526006629
2013-10-06 06:09:59,993 [ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4] WARN (kafka.consumer.ConsumerFetcherThread) - [ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4], current offset 2363213504 for partition [FunnelProto,3] out of range; reset offset to 2363213504
2013-10-06 06:09:59,993 [ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4] WARN (kafka.consumer.ConsumerFetcherThread) - [ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4], current offset 2146256007 for partition [jmx,0] out of range; reset offset to 2146256007
2013-10-06 06:09:59,992 [ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4] WARN (kafka.consumer.ConsumerFetcherThread) - [ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4], current offset 2239688 for partition [tower_timing_metrics,3] out of range; reset offset to 2239688
2013-10-06 06:09:59,889 [ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4] WARN (kafka.consumer.ConsumerFetcherThread) - [ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4], current offset 1234239 for partition [agent,0] out of range; reset offset to 1234239
2013-10-06 06:09:59,889 [ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4] WARN (kafka.consumer.ConsumerFetcherThread) - [ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4], current offset 2526006629 for partition [FunnelProto,1] out of range; reset offset to 2526006629
2013-10-06 06:09:59,889 [ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4] WARN (kafka.consumer.ConsumerFetcherThread) - [ConsumerFetcherThread-mirrormakerProd_ops-mmrs1-1-sjl.ops.sfdc.net-1380036300408-baa80a5a-0-4], current offset 2363213504 for partition [FunnelProto,3] out of range; reset offset to 2363213504

Also, as you can see, it is resetting the offset to the same value, so it loops through these offset resets again and again. After we restarted our MirrorMaker process, it started consuming from the beginning of the topic for all partitions (we started receiving messages from 7 days ago), and it caught up in a couple of hours.

We have a couple of questions:

1) What might have caused it to end up in this bad state?
2) We had the offset out of range problem for only 2 out of 8 partitions, but after we restarted MirrorMaker it started to consume from the beginning for all partitions in the topic. How did the problem with 2 partitions affect all the other partitions?

--
Thanks,
Raja.
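
P.S. In case it helps anyone check their own logs for the same pattern (an offset being "reset" to the value it already had), here is a minimal Python sketch. It is not part of Kafka; the function name `find_noop_resets` and the regular expression are my own assumptions, based only on the wording of the WARN lines quoted above, and other Kafka versions may log this differently.

```python
import re
from collections import Counter

# Assumption: pattern derived from the WARN lines quoted in this mail only.
RESET_RE = re.compile(
    r"current offset (\d+) for partition \[([^,\]]+),(\d+)\] out of range; "
    r"reset offset to (\d+)"
)


def find_noop_resets(log_text):
    """Count, per (topic, partition), resets where the new offset equals the old one."""
    counts = Counter()
    for match in RESET_RE.finditer(log_text):
        current, topic, partition, reset = match.groups()
        # A reset to the same offset makes no progress, so the fetcher
        # will likely hit the same out-of-range error on the next fetch.
        if int(current) == int(reset):
            counts[(topic, int(partition))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python find_noop_resets.py mirrormaker.log
    with open(sys.argv[1]) as handle:
        for (topic, partition), n in sorted(find_noop_resets(handle.read()).items()):
            print(f"{topic}-{partition}: {n} no-op resets")
```

Partitions that show up with a large count are the ones stuck in the loop; a partition whose reset moves to a different offset is not counted.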