[ https://issues.apache.org/jira/browse/KAFKA-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310980#comment-14310980 ]
Jay Kreps commented on KAFKA-1403:
----------------------------------

Ultimately, in order to be accurate, the time will actually need to be in the message itself. Currently we use the write time, but this can be arbitrarily inaccurate: if you delete the data on a server and restart it, it will rewrite everything with new timestamps.

> Adding timestamp to kafka index structure
> -----------------------------------------
>
>                 Key: KAFKA-1403
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1403
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.8.1
>            Reporter: Xinyao Hu
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Right now, Kafka doesn't have a timestamp per message. It assumes that all the
> messages in the same file have the same timestamp, which is the mtime of the
> file. This makes it inefficient to scan all the messages within a time window,
> which is a valid use case in a lot of real-time data analysis. One way to hack
> around this is to roll a new file after a short period of time. However, this
> results in opening lots of files (KAFKA-1404), which eventually crashed the
> servers.
> My guess is that this is not implemented for efficiency reasons. It would cost
> an additional four bytes per message, which might be pinned in memory for fast
> access. There might be some simple perf optimizations, such as differential
> encoding + variable-length encoding, which should bring the cost down to 1-2
> bytes per message on average (a rough sketch of this idea follows below).
> Let me know if this makes sense.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
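The "differential encoding + variable-length encoding" estimate in the description can be illustrated with a small sketch. This is not Kafka code: the class name, the per-segment base timestamp, and the zig-zag varint scheme are assumptions made for illustration only. It just shows why nearly monotonic millisecond timestamps compress to roughly one byte each on average.

```java
import java.io.ByteArrayOutputStream;

/**
 * Sketch of delta + varint timestamp encoding (hypothetical, not Kafka code).
 * Each timestamp is stored as a zig-zag-encoded varint delta from the previous
 * one; a base timestamp is stored once per file/segment.
 */
public class TimestampDeltaEncoder {

    private long previous;                       // last timestamp written
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    public TimestampDeltaEncoder(long baseTimestampMs) {
        this.previous = baseTimestampMs;         // base stored once per segment
    }

    /** Appends one message timestamp; returns the number of bytes used. */
    public int append(long timestampMs) {
        long delta = timestampMs - previous;     // usually small and non-negative
        previous = timestampMs;
        long zigzag = (delta << 1) ^ (delta >> 63); // small negatives stay small
        int written = 0;
        while ((zigzag & ~0x7FL) != 0) {         // 7 payload bits per byte, high bit = "more"
            out.write((int) ((zigzag & 0x7F) | 0x80));
            zigzag >>>= 7;
            written++;
        }
        out.write((int) zigzag);
        return written + 1;
    }

    public byte[] encoded() {
        return out.toByteArray();
    }

    public static void main(String[] args) {
        long base = System.currentTimeMillis();
        TimestampDeltaEncoder enc = new TimestampDeltaEncoder(base);
        int totalBytes = 0;
        int messages = 1000;
        for (int i = 1; i <= messages; i++) {
            // timestamps a few milliseconds apart, as from a steady producer
            totalBytes += enc.append(base + i * 3L);
        }
        System.out.printf("avg bytes per timestamp: %.2f%n",
                (double) totalBytes / messages);
        // prints ~1.00 for small deltas, consistent with the 1-2 byte estimate
    }
}
```

Zig-zag encoding is used here so that the occasional out-of-order message (a negative delta) still encodes in one or two bytes rather than ten.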