[ 
https://issues.apache.org/jira/browse/KAFKA-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15657795#comment-15657795
 ] 

Jiangjie Qin commented on KAFKA-4398:
-------------------------------------

[~huxi_2b] I am not sure I understand the issue here. The messages in the log 
are the following:
(offset=1, timestamp=T1)
(offset=2, timestamp=T3)
(offset=3, timestamp=T2.5)

In the time index, the index entries would be:
(T1 -> 1)
(T3 -> 2)

In this case, if the consumer searches for timestamp T2.5, offset 2 (i.e. T3) 
is expected to be returned, because the search returns the first offset of a 
message whose timestamp is greater than or equal to T2.5. This guarantees that 
all messages at or after the target timestamp are consumed. If we returned 
offset 3 in this case, message 2, whose timestamp is T3 (which is greater than 
T2.5), would not be consumed, right?
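
To make the expected behaviour concrete, here is a minimal sketch in Java. It is a simplified model of the lookup described above, not the actual broker code: the class and method names are hypothetical, timestamps are plain longs in tenths of an hour (T1 = 10, T3 = 30, T2.5 = 25), and it assumes an index entry is added only when a message's timestamp is larger than the last indexed timestamp.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of a timestamp lookup; not Kafka's real classes.
public class TimestampLookupSketch {

    // A log message reduced to its offset and CreateTime timestamp.
    static final class Message {
        final long offset;
        final long timestamp;
        Message(long offset, long timestamp) {
            this.offset = offset;
            this.timestamp = timestamp;
        }
    }

    // A time index entry: timestamp -> offset.
    static final class IndexEntry {
        final long timestamp;
        final long offset;
        IndexEntry(long timestamp, long offset) {
            this.timestamp = timestamp;
            this.offset = offset;
        }
    }

    // The three messages from the example, in log (offset) order.
    static List<Message> exampleLog() {
        List<Message> log = new ArrayList<>();
        log.add(new Message(1, 10)); // T1
        log.add(new Message(2, 30)); // T3
        log.add(new Message(3, 25)); // T2.5, out of order
        return log;
    }

    // Assumption: an entry is appended only if its timestamp exceeds the last one,
    // so the index stays monotonically increasing.
    static List<IndexEntry> buildTimeIndex(List<Message> log) {
        List<IndexEntry> index = new ArrayList<>();
        for (Message m : log) {
            if (index.isEmpty() || m.timestamp > index.get(index.size() - 1).timestamp) {
                index.add(new IndexEntry(m.timestamp, m.offset));
            }
        }
        return index;
    }

    // Returns the first offset whose timestamp is >= target, or -1 if none exists.
    static long firstOffsetAtOrAfter(List<Message> log, List<IndexEntry> index, long target) {
        // Step 1: use the index to find where to start scanning.
        long startOffset = log.isEmpty() ? -1L : log.get(0).offset;
        for (IndexEntry e : index) {
            if (e.timestamp <= target) {
                startOffset = e.offset;
            } else {
                break;
            }
        }
        // Step 2: scan the log forward from that offset.
        for (Message m : log) {
            if (m.offset >= startOffset && m.timestamp >= target) {
                return m.offset;
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        List<Message> log = exampleLog();
        List<IndexEntry> index = buildTimeIndex(log);
        System.out.println("index entries = " + index.size());                               // 2
        System.out.println("offset for T2.5 = " + firstOffsetAtOrAfter(log, index, 25)); // 2
    }
}
```

In this model the message with T2.5 never gets an index entry, and the search for T2.5 stops at offset 2, so a consumer starting there still reads both message 2 (T3) and message 3 (T2.5).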

> offsetsForTimes returns false starting offset when timestamp of messages are 
> not monotonically increasing
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4398
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4398
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer, core
>    Affects Versions: 0.10.1.0
>            Reporter: huxi
>            Assignee: huxi
>
> After a code walk-through of KIP-33 (Add a time based log index), I found a 
> use case where the method 'offsetsForTimes' fails to return the correct 
> offset if a series of messages is created without monotonically increasing 
> timestamps (CreateTime is used).
> Say T0 is the hour when the first message is created, and Tn means the 
> (T0+n)th hour. Then I created another two messages at T1 and T3 
> respectively. At this moment, the <baseoffset>.timeindex should contain two 
> items:
> T1 ---> 1
> T3 ---> 2  (whether it contains T0 does not matter to this problem)
> Later, for some reason, I want to insert a third message in between T1 and 
> T3, say at T2.5, but the time index file is not changed because of the 
> constraint that timestamps must be monotonically increasing within each 
> segment.
> After generating the message with T2.5, I invoke 
> KafkaConsumer.offsetsForTimes("tp" -> T2.5), hoping to get the first offset 
> with a timestamp greater than or equal to T2.5, which should be the third 
> message in this case, but the consumer returns the second message with T3.



