[ https://issues.apache.org/jira/browse/KAFKA-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310980#comment-14310980 ]

Jay Kreps commented on KAFKA-1403:
----------------------------------

For the timestamp to be accurate, it will ultimately need to be stored in the
message itself. Currently we use the write time, but this can be arbitrarily
inaccurate: if you delete the data on a server and restart it, the server will
rewrite everything with new timestamps.

> Adding timestamp to kafka index structure
> -----------------------------------------
>
>                 Key: KAFKA-1403
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1403
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.8.1
>            Reporter: Xinyao Hu
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Right now, Kafka doesn't store a timestamp per message. It assumes that all
> the messages in the same file have the same timestamp, which is the mtime of
> the file. This makes it inefficient to scan all the messages within a time
> window, which is a valid use case in a lot of realtime data analysis.
> One way to hack around this is to roll a new file after a short period of
> time. However, this results in opening lots of files (KAFKA-1404), which
> eventually crashed the servers.
> My guess is that this was not implemented for efficiency reasons. It would
> cost an additional four bytes per message, which might be pinned in memory
> for fast access. There might be some simple perf optimizations, such as
> differential encoding + variable-length encoding, which should bring the cost
> down to 1-2 bytes per message on average.
> Let me know if this makes sense.
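
To illustrate the differential + variable-length encoding idea from the issue
description, here is a minimal sketch in Python. It is not Kafka code and does
not reflect any Kafka on-disk format; the function names (encode_varint,
encode_timestamps, decode_timestamps) are hypothetical. It assumes timestamps
are non-negative integers in non-decreasing order, so every delta is
non-negative; out-of-order timestamps would need a signed scheme such as
zigzag encoding, which is not shown.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (low 7 bits first)."""
    if n < 0:
        raise ValueError("varint requires a non-negative integer")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte of this value
            return bytes(out)


def encode_timestamps(timestamps) -> bytes:
    """Store each timestamp as the varint-encoded delta from the previous one."""
    out = bytearray()
    prev = 0
    for ts in timestamps:
        out += encode_varint(ts - prev)  # raises if timestamps go backwards
        prev = ts
    return bytes(out)


def decode_timestamps(data: bytes) -> list:
    """Reverse encode_timestamps: read varint deltas and accumulate them."""
    timestamps = []
    prev = 0
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += value
            timestamps.append(prev)
            value = 0
            shift = 0
    if shift:
        raise ValueError("truncated varint")
    return timestamps
```

In this sketch the first timestamp is stored in full, and each later one costs
a single byte whenever it is fewer than 128 units after its predecessor, which
is where the 1-2 byte average estimate would come from for densely written
messages. The price is that a timestamp can no longer be read at a random
position without decoding from the start of the run.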



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
