I wonder if you have hit KAFKA-5600.

Is it possible that you try out 0.11.0.1 ?

Thanks

On Wed, Oct 25, 2017 at 1:15 PM, Dan Markhasin <minimi...@gmail.com> wrote:

> I am using 0.11.0.0.
>
> There is no difference configuration-wise - both have 10 partitions and 2
> replicas. There are no errors in the logs, but looking in the data folder
> it seems like Kafka is not updating the timeindex file for data1_log -
> notice how the timeindex file for the current log segment is not being
> updated.
>
> bash-4.2$ pwd
> /kafka/data/data1_log-1
> bash-4.2$ ls -ltr | tail
> -rw-rw-r-- 1 ibiuser it 1073731573 Oct 25 01:21 00000000000337554984.log
> -rw-rw-r-- 1 ibiuser it     943616 Oct 25 01:21 00000000000337554984.index
> -rw-rw-r-- 1 ibiuser it 1073734199 Oct 25 13:38 00000000000339816017.log
> -rw-rw-r-- 1 ibiuser it   10485756 Oct 25 13:38
> 00000000000341934289.timeindex
> -rw-rw-r-- 1 ibiuser it         10 Oct 25 13:38
> 00000000000341934289.snapshot
> -rw-rw-r-- 1 ibiuser it          0 Oct 25 13:38
> 00000000000339816017.timeindex
> -rw-rw-r-- 1 ibiuser it     566712 Oct 25 13:38 00000000000339816017.index
> -rw-rw-r-- 1 ibiuser it         17 Oct 25 20:23 leader-epoch-checkpoint
> -rw-rw-r-- 1 ibiuser it   10485760 Oct 25 23:03 00000000000341934289.index
> -rw-rw-r-- 1 ibiuser it  461590419 Oct 25 23:04 00000000000341934289.log
>
> For comparison, the beats topic:
>
> bash-4.2$ cd ../beats-1
> bash-4.2$ ls -ltr
> total 3212088
> -rw-rw-r-- 1 ibiuser it         17 Oct 25 00:23 leader-epoch-checkpoint
> -rw-rw-r-- 1 ibiuser it         10 Oct 25 20:04
> 00000000000188672034.snapshot
> -rw-rw-r-- 1 ibiuser it    2773008 Oct 25 20:04
> 00000000000185224087.timeindex
> -rw-rw-r-- 1 ibiuser it 1073741779 Oct 25 20:04 00000000000185224087.log
> -rw-rw-r-- 1 ibiuser it    1967440 Oct 25 20:04 00000000000185224087.index
> -rw-rw-r-- 1 ibiuser it   10485760 Oct 25 23:03 00000000000188672034.index
> -rw-rw-r-- 1 ibiuser it   10485756 Oct 25 23:04
> 00000000000188672034.timeindex
> -rw-rw-r-- 1 ibiuser it   50166645 Oct 25 23:04 00000000000188672034.log
>
>
> To give some context to why I'm even trying to reset the offsets, we had
> encountered a strange situation earlier today:
>
> 1) One of the brokers had a hardware failure, and had to be rebuilt from
> scratch (data partition was gone)
> 2) When it went down, we noticed a spike in lag in one particular consumer
> group - it seems to have reset its offset to an earlier point in time (but
> not the earliest offset of the topic); I have read other messages on this
> mailing list of users who experienced the same behavior with 0.11.0.0
> 3) The broker was reinstalled and rejoined the cluster with the same
> broker.id (but with no data on it) - it rebalanced and eventually all
> replicas became synced and the cluster was functioning normally.
> 4) I then decided to bounce the same broker again to see if I can reproduce
> the issue I saw in #2 - and as soon as the broker was restarted, the exact
> same consumer group had its offset reset again and was lagging with
> millions of records behind the current offset.
> 5) I then tried to manually reset the consumer group's offset to a few
> minutes before I restarted the broker, only to discover this strange
> behavior where no matter which datetime value I provided, it kept resetting
> to the latest offset.
>
>
> On 25 October 2017 at 22:48, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > Do you mind providing a bit more information ?
> >
> > Release of Kafka you use
> >
> > Any difference between data1_log and the other, normal topic ?
> >
> > Probably check the broker log where data1_log is hosted - see if there is
> > some clue.
> >
> > Thanks
> >
> > On Wed, Oct 25, 2017 at 12:11 PM, Dan Markhasin <minimi...@gmail.com>
> > wrote:
> >
> > > I'm trying to use the kafka-consumer-groups.sh tool in order to rewind
> a
> > > consumer group's offset, however it seems to be returning the latest
> > offset
> > > regarding of the requested offset.
> > >
> > > You can see in the below example that two consecutive commands to reset
> > the
> > > offset to a specific point in time return different (increasing)
> offsets,
> > > which are actually the latest offsets for the topic.
> > >
> > > - The consumer group ("test_consumer") is a console consumer that was
> > > started with --from-beginning and terminated after a few seconds, just
> > > enough for it to commit its offsets.
> > > - The topic data1_log is very busy with thousands of incoming messages
> > per
> > > second
> > > - The datetime value provided is approx. 5 hours earlier than the
> current
> > > UTC time
> > >
> > > [admin@broker01] ~> /kafka/latest/bin/kafka-consumer-groups.sh
> > > --bootstrap-server localhost:9092 --reset-offsets --group test_consumer
> > > --topic data1_log --to-datetime '2017-10-25T13:40:00.000'
> > > Note: This will only show information about consumers that use the Java
> > > consumer API (non-ZooKeeper-based consumers).
> > >
> > >
> > > TOPIC                          PARTITION  NEW-OFFSET
> > > data1_log                      8          301485420
> > > data1_log                      1          342788637
> > > data1_log                      7          287621428
> > > data1_log                      3          268612266
> > > data1_log                      0          201860717
> > > data1_log                      9          202749553
> > > data1_log                      4          188974032
> > > data1_log                      6          234308481
> > > data1_log                      2          263507741
> > > data1_log                      5          232707238
> > >
> > > [admin@broker01] ~> /kafka/latest/bin/kafka-consumer-groups.sh
> > > --bootstrap-server localhost:9092 --reset-offsets --group test_consumer
> > > --topic data1_log --to-datetime '2017-10-25T13:40:00.000'
> > > Note: This will only show information about consumers that use the Java
> > > consumer API (non-ZooKeeper-based consumers).
> > >
> > >
> > > TOPIC                          PARTITION  NEW-OFFSET
> > > data1_log                      8          301485491
> > > data1_log                      1          342788779
> > > data1_log                      7          287621534
> > > data1_log                      3          268612364
> > > data1_log                      0          201860796
> > > data1_log                      9          202749620
> > > data1_log                      4          188974068
> > > data1_log                      6          234308564
> > > data1_log                      2          263507823
> > > data1_log                      5          232707293
> > >
> > > This issue seems to be topic-specific - there is a different topic
> (also
> > > very active) where the same command consistently returns the correct
> > > offsets fixed in the time for the requested datetime.
> > >
> > > What could be the issue here?
> > >
> > > Thanks,
> > > Dan
> > >
> >
>

Reply via email to