Not really - there are dozens (over 50) of producer instances using librdkafka, so it is not feasible to update them all.

I set log.message.timestamp.type to LogAppendTime in server.properties and it seems to be working (at least --reset-offsets now returns the same offset for the same datetime on partitions whose leader is a broker with the new setting applied), but I won't be able to truly validate it before I upgrade the entire cluster to 0.11.0.1 and apply this setting to all brokers.
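For reference, the change itself is one line of broker configuration, and the same behaviour can also be forced per topic; the ZooKeeper address and topic name below are placeholders:

  # server.properties on each broker (takes effect after a broker restart)
  log.message.timestamp.type=LogAppendTime

  # or as a per-topic override (0.11.x topic-level config; adjust host/topic)
  bin/kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name data1_log --add-config message.timestamp.type=LogAppendTime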
Is there a reason LogAppendTime is not the default setting for this?

On 26 October 2017 at 11:42, Ted Yu <yuzhih...@gmail.com> wrote:

> Can you update librdkafka and try again?
> Thanks
>
> -------- Original message --------
> From: Dan Markhasin <minimi...@gmail.com>
> Date: 10/25/17 11:26 PM (GMT-08:00)
> To: users@kafka.apache.org
> Subject: Re: Failure to reset consumer offsets for specific topics
>
> For beats (the topic where timestamps are OK) the producers are Metricbeat / Logstash, which I assume use a relatively recent producer version.
> For the topics with missing timestamps the producers are based on librdkafka, though I'm not sure which version exactly - I wouldn't be surprised if it's pretty old and doesn't set a timestamp on the messages it produces.
>
> On 26 October 2017 at 08:17, Elyahou Ittah <elyaho...@fiverr.com> wrote:
>
> > Which driver is used to produce these messages?
> >
> > On Thu, Oct 26, 2017 at 8:14 AM, Dan Markhasin <minimi...@gmail.com> wrote:
> >
> > > Furthermore, when looking at messages produced to the data1_log topic with print.timestamp=true they all have CreateTime:-1, whereas messages produced to the beats topic have valid timestamps. The producers that are sending to data1_log are older than the producers that are sending to beats - if this is where the broker takes the timestamps from, it explains why they are all empty for data1_log.
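A quick way to repeat that check, assuming a broker reachable on localhost:9092 (the topic and message count below are just examples), is the console consumer with print.timestamp enabled; it prefixes each record with the timestamp type and value, e.g. the CreateTime:-1 mentioned above:

  bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic data1_log --max-messages 5 --property print.timestamp=true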
> > >
> > > On 26 October 2017 at 08:07, Dan Markhasin <minimi...@gmail.com> wrote:
> > >
> > > > After a bit more checking it seems that Kafka isn't writing timestamps at all in the .timeindex file for the topics where offset rewind is not working.
> > > >
> > > > The following output is from a *different* 0.11.0.0 cluster which also has a topic called data1_log (this cluster has not experienced any issues lately):
> > > >
> > > > bash-4.2$ ls -ltr | tail
> > > > -rw-rw-r-- 1 ibiuser users 1073739867 Oct 24 20:42 00000000000723010092.log
> > > > -rw-rw-r-- 1 ibiuser users 926216 Oct 24 20:42 00000000000723010092.index
> > > > -rw-rw-r-- 1 ibiuser users 16 Oct 25 05:09 leader-epoch-checkpoint
> > > > -rw-rw-r-- 1 ibiuser users 1073741239 Oct 25 09:52 00000000000725186047.log
> > > > -rw-rw-r-- 1 ibiuser users 10485756 Oct 25 09:52 00000000000727395414.timeindex
> > > > -rw-rw-r-- 1 ibiuser users 10 Oct 25 09:52 00000000000727395414.snapshot
> > > > -rw-rw-r-- 1 ibiuser users 0 Oct 25 09:52 00000000000725186047.timeindex
> > > > -rw-rw-r-- 1 ibiuser users 961144 Oct 25 09:52 00000000000725186047.index
> > > > -rw-rw-r-- 1 ibiuser users 10485760 Oct 25 21:59 00000000000727395414.index
> > > > -rw-rw-r-- 1 ibiuser users 1028809456 Oct 25 21:59 00000000000727395414.log
> > > >
> > > > bash-4.2$ /kafka/latest/bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files 00000000000727395414.timeindex > /tmp/timestamps 2>&1
> > > >
> > > > The output looks like this:
> > > >
> > > > timestamp: 0 offset: 727395414
> > > > timestamp: 0 offset: 727395414
> > > > timestamp: 0 offset: 727395414
> > > > timestamp: 0 offset: 727395414
> > > > timestamp: 0 offset: 727395414
> > > > timestamp: 0 offset: 727395414
> > > > timestamp: 0 offset: 727395414
> > > >
> > > > The offset is the same on all of the records and not a single line has a timestamp greater than zero.
> > > >
> > > > bash-4.2$ cat /tmp/timestamps | grep timestamp | awk '{print $2}' | grep -v 0
> > > > bash-4.2$
> > > >
> > > > For comparison, the .timeindex for the beats topic on the same cluster looks like this:
> > > >
> > > > Dumping 00000000000254410745.timeindex
> > > > timestamp: 1508978010544 offset: 254410759
> > > > timestamp: 1508978011084 offset: 254410763
> > > > timestamp: 1508978012080 offset: 254410770
> > > > timestamp: 1508978012789 offset: 254410796
> > > > timestamp: 1508978013981 offset: 254410808
> > > > timestamp: 1508978014526 offset: 254410812
> > > > timestamp: 1508978014698 offset: 254410828
> > > > timestamp: 1508978014959 offset: 254410834
> > > > timestamp: 1508978015677 offset: 254410854
> > > > timestamp: 1508978016980 offset: 254410870
> > > > timestamp: 1508978017212 offset: 254410883
> > > >
> > > > On 26 October 2017 at 07:26, Elyahou Ittah <elyaho...@fiverr.com> wrote:
> > > >
> > > > > Which driver is used to produce these messages?
> > > > >
> > > > > On Oct 26, 2017 07:11, "Dan Markhasin" <minimi...@gmail.com> wrote:
> > > > >
> > > > > > No, that flag doesn't affect which offsets are returned, only executes the action (and resets the consumer to latest offset when used, regardless of the datetime value I provide).
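For completeness, since the --execute flag comes up just below: without it kafka-consumer-groups.sh only prints the offsets it would assign, and actually applying a reset would look roughly like this (group, topic and datetime reused from the examples further down) - though, as noted above, in this case it still jumped to the latest offsets:

  bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group test_consumer --topic data1_log --reset-offsets --to-datetime '2017-10-25T13:40:00.000' --execute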
> > > > > >
> > > > > > On 25 October 2017 at 23:44, Hans Jespersen <h...@confluent.io> wrote:
> > > > > >
> > > > > > > I think you are just missing the --execute flag.
> > > > > > >
> > > > > > > -hans
> > > > > > >
> > > > > > > On Oct 25, 2017, at 1:24 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > > > >
> > > > > > > > I wonder if you have hit KAFKA-5600.
> > > > > > > >
> > > > > > > > Is it possible that you try out 0.11.0.1?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > On Wed, Oct 25, 2017 at 1:15 PM, Dan Markhasin <minimi...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > I am using 0.11.0.0.
> > > > > > > > >
> > > > > > > > > There is no difference configuration-wise - both have 10 partitions and 2 replicas. There are no errors in the logs, but looking in the data folder it seems like Kafka is not updating the timeindex file for data1_log - notice how the timeindex file for the current log segment is not being updated.
> > > > > > > > >
> > > > > > > > > bash-4.2$ pwd
> > > > > > > > > /kafka/data/data1_log-1
> > > > > > > > > bash-4.2$ ls -ltr | tail
> > > > > > > > > -rw-rw-r-- 1 ibiuser it 1073731573 Oct 25 01:21 00000000000337554984.log
> > > > > > > > > -rw-rw-r-- 1 ibiuser it 943616 Oct 25 01:21 00000000000337554984.index
> > > > > > > > > -rw-rw-r-- 1 ibiuser it 1073734199 Oct 25 13:38 00000000000339816017.log
> > > > > > > > > -rw-rw-r-- 1 ibiuser it 10485756 Oct 25 13:38 00000000000341934289.timeindex
> > > > > > > > > -rw-rw-r-- 1 ibiuser it 10 Oct 25 13:38 00000000000341934289.snapshot
> > > > > > > > > -rw-rw-r-- 1 ibiuser it 0 Oct 25 13:38 00000000000339816017.timeindex
> > > > > > > > > -rw-rw-r-- 1 ibiuser it 566712 Oct 25 13:38 00000000000339816017.index
> > > > > > > > > -rw-rw-r-- 1 ibiuser it 17 Oct 25 20:23 leader-epoch-checkpoint
> > > > > > > > > -rw-rw-r-- 1 ibiuser it 10485760 Oct 25 23:03 00000000000341934289.index
> > > > > > > > > -rw-rw-r-- 1 ibiuser it 461590419 Oct 25 23:04 00000000000341934289.log
> > > > > > > > >
> > > > > > > > > For comparison, the beats topic:
> > > > > > > > >
> > > > > > > > > bash-4.2$ cd ../beats-1
> > > > > > > > > bash-4.2$ ls -ltr
> > > > > > > > > total 3212088
> > > > > > > > > -rw-rw-r-- 1 ibiuser it 17 Oct 25 00:23 leader-epoch-checkpoint
> > > > > > > > > -rw-rw-r-- 1 ibiuser it 10 Oct 25 20:04 00000000000188672034.snapshot
> > > > > > > > > -rw-rw-r-- 1 ibiuser it 2773008 Oct 25 20:04 00000000000185224087.timeindex
> > > > > > > > > -rw-rw-r-- 1 ibiuser it 1073741779 Oct 25 20:04 00000000000185224087.log
> > > > > > > > > -rw-rw-r-- 1 ibiuser it 1967440 Oct 25 20:04 00000000000185224087.index
> > > > > > > > > -rw-rw-r-- 1 ibiuser it 10485760 Oct 25 23:03 00000000000188672034.index
> > > > > > > > > -rw-rw-r-- 1 ibiuser it 10485756 Oct 25 23:04 00000000000188672034.timeindex
> > > > > > > > > -rw-rw-r-- 1 ibiuser it 50166645 Oct 25 23:04 00000000000188672034.log
> > > > > > > > >
> > > > > > > > > To give some context to why I'm even trying to reset the offsets, we had encountered a strange situation earlier today:
> > > > > > > > >
> > > > > > > > > 1) One of the brokers had a hardware failure, and had to be rebuilt from scratch (the data partition was gone)
> > > > > > > > > 2) When it went down, we noticed a spike in lag in one particular consumer group - it seems to have reset its offset to an earlier point in time (but not the earliest offset of the topic); I have read other messages on this mailing list from users who experienced the same behavior with 0.11.0.0
> > > > > > > > > 3) The broker was reinstalled and rejoined the cluster with the same broker.id (but with no data on it) - it rebalanced and eventually all replicas became synced and the cluster was functioning normally.
> > > > > > > > > 4) I then decided to bounce the same broker again to see if I could reproduce the issue I saw in #2 - and as soon as the broker was restarted, the exact same consumer group had its offset reset again and was lagging millions of records behind the current offset.
> > > > > > > > > 5) I then tried to manually reset the consumer group's offset to a few minutes before I restarted the broker, only to discover this strange behavior where no matter which datetime value I provided, it kept resetting to the latest offset.
> > > > > > > > >
> > > > > > > > > On 25 October 2017 at 22:48, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Do you mind providing a bit more information?
> > > > > > > > > >
> > > > > > > > > > Release of Kafka you use
> > > > > > > > > >
> > > > > > > > > > Any difference between data1_log and the other, normal topic?
> > > > > > > > > >
> > > > > > > > > > Probably check the broker log where data1_log is hosted - see if there is some clue.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > On Wed, Oct 25, 2017 at 12:11 PM, Dan Markhasin <minimi...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > I'm trying to use the kafka-consumer-groups.sh tool in order to rewind a consumer group's offset, however it seems to be returning the latest offset regardless of the requested datetime.
> > > > > > > > > > >
> > > > > > > > > > > You can see in the example below that two consecutive commands to reset the offset to a specific point in time return different (increasing) offsets, which are actually the latest offsets for the topic.
> > > > > > > > > > >
> > > > > > > > > > > - The consumer group ("test_consumer") is a console consumer that was started with --from-beginning and terminated after a few seconds, just enough for it to commit its offsets.
> > > > > > > > > > > - The topic data1_log is very busy, with thousands of incoming messages per second
> > > > > > > > > > > - The datetime value provided is approx. 5 hours earlier than the current UTC time
> > > > > > > > > > >
> > > > > > > > > > > [admin@broker01] ~> /kafka/latest/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --reset-offsets --group test_consumer --topic data1_log --to-datetime '2017-10-25T13:40:00.000'
> > > > > > > > > > > Note: This will only show information about consumers that use the Java consumer API (non-ZooKeeper-based consumers).
> > > > > > > > > > >
> > > > > > > > > > > TOPIC      PARTITION  NEW-OFFSET
> > > > > > > > > > > data1_log  8          301485420
> > > > > > > > > > > data1_log  1          342788637
> > > > > > > > > > > data1_log  7          287621428
> > > > > > > > > > > data1_log  3          268612266
> > > > > > > > > > > data1_log  0          201860717
> > > > > > > > > > > data1_log  9          202749553
> > > > > > > > > > > data1_log  4          188974032
> > > > > > > > > > > data1_log  6          234308481
> > > > > > > > > > > data1_log  2          263507741
> > > > > > > > > > > data1_log  5          232707238
> > > > > > > > > > >
> > > > > > > > > > > [admin@broker01] ~> /kafka/latest/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --reset-offsets --group test_consumer --topic data1_log --to-datetime '2017-10-25T13:40:00.000'
> > > > > > > > > > > Note: This will only show information about consumers that use the Java consumer API (non-ZooKeeper-based consumers).
> > > > > > > > > > >
> > > > > > > > > > > TOPIC      PARTITION  NEW-OFFSET
> > > > > > > > > > > data1_log  8          301485491
> > > > > > > > > > > data1_log  1          342788779
> > > > > > > > > > > data1_log  7          287621534
> > > > > > > > > > > data1_log  3          268612364
> > > > > > > > > > > data1_log  0          201860796
> > > > > > > > > > > data1_log  9          202749620
> > > > > > > > > > > data1_log  4          188974068
> > > > > > > > > > > data1_log  6          234308564
> > > > > > > > > > > data1_log  2          263507823
> > > > > > > > > > > data1_log  5          232707293
> > > > > > > > > > >
> > > > > > > > > > > This issue seems to be topic-specific - there is a different topic (also very active) where the same command consistently returns the correct offsets for the requested datetime.
> > > > > > > > > > >
> > > > > > > > > > > What could be the issue here?
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Dan