This is possible, but I think you don't need the time-based index for it :)

You would just buffer all messages in a 5-minute sliding window and
keep the messages in this window sorted by timestamp. Each time the
window "moves", you write the oldest records, the ones that "drop out"
of the window, to the topic. If you get a record with a timestamp older
than allowed, you don't insert it into the window but drop it.
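A minimal sketch of this buffering approach, assuming the window end
advances with the largest timestamp seen so far ("stream time"); the
email does not say whether to use stream time or wall-clock time, so
that choice is an assumption. `ReorderBuffer` is a hypothetical helper,
not a Kafka API. It returns the released records instead of producing
them, so writing them to the output topic is left to the caller.

```python
import heapq
import itertools
from typing import Any, List, Optional, Tuple


class ReorderBuffer:
    """Hypothetical helper: hold records for `lateness_ms` and release them in timestamp order."""

    def __init__(self, lateness_ms: int) -> None:
        self.lateness_ms = lateness_ms
        self._heap: List[Tuple[int, int, Any]] = []  # (timestamp, seq, record), min-heap on timestamp
        self._seq = itertools.count()  # tie-breaker so records themselves are never compared
        self._watermark: Optional[int] = None  # largest timestamp seen so far (assumed window end)
        self.dropped = 0  # number of records discarded as too late

    def add(self, timestamp: int, record: Any) -> List[Tuple[int, Any]]:
        """Insert one record and return every record that dropped out of the window."""
        if self._watermark is not None and timestamp < self._watermark - self.lateness_ms:
            # Older than the window allows: drop it instead of inserting it.
            self.dropped += 1
            return []
        heapq.heappush(self._heap, (timestamp, next(self._seq), record))
        if self._watermark is None or timestamp > self._watermark:
            self._watermark = timestamp  # the window "moves" forward
        return self._release(self._watermark - self.lateness_ms)

    def _release(self, cutoff: int) -> List[Tuple[int, Any]]:
        # Pop, oldest first, every buffered record that now falls before the window start.
        out = []
        while self._heap and self._heap[0][0] < cutoff:
            ts, _, rec = heapq.heappop(self._heap)
            out.append((ts, rec))
        return out

    def flush(self) -> List[Tuple[int, Any]]:
        """Release everything still buffered, in timestamp order (e.g. on shutdown)."""
        out = []
        while self._heap:
            ts, _, rec = heapq.heappop(self._heap)
            out.append((ts, rec))
        return out
```

A min-heap keeps the window sorted with O(log n) inserts, and because
a record is only accepted if it is no older than the current window
start, everything released later is never older than what was already
released. For the 5-minute case you would construct it with
`ReorderBuffer(lateness_ms=5 * 60 * 1000)`.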

The timestamp index is useful if you want to seek to a specific offset
based on timestamp. But I don't think you need this for your use case.
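For comparison, a sketch of what the time-based index is for: looking
up the first offset at or after a timestamp and seeking there. This
assumes the kafka-python client, whose `KafkaConsumer` provides
`offsets_for_times()`, `seek()` and `seek_to_end()`;
`seek_to_timestamp` itself is a hypothetical helper, and the consumer
is expected to already have the partition assigned.

```python
from typing import Any, Optional


def seek_to_timestamp(consumer: Any, partition: Any, timestamp_ms: int) -> Optional[int]:
    """Hypothetical helper: position `consumer` at the first record with timestamp >= timestamp_ms.

    Assumes a kafka-python KafkaConsumer with `partition` (a TopicPartition) already assigned.
    Returns the offset sought to, or None if no such record exists.
    """
    # The broker answers this lookup using the time-based index from KIP-33.
    result = consumer.offsets_for_times({partition: timestamp_ms})
    found = result.get(partition)
    if found is None:
        # No record at or after timestamp_ms: start from the end of the partition.
        consumer.seek_to_end(partition)
        return None
    consumer.seek(partition, found.offset)
    return found.offset


# Usage sketch (topic name and bootstrap server are placeholders):
#   from kafka import KafkaConsumer, TopicPartition
#   consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
#   tp = TopicPartition("my-topic", 0)
#   consumer.assign([tp])
#   seek_to_timestamp(consumer, tp, 1511300000000)
```

This only tells you where to start reading; it does not return records
in timestamp order, which is why the buffering approach is still needed
for sorting.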



-Matthias

On 11/21/17 1:39 PM, Ray Ruvinskiy wrote:
> I’ve been reading 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index
>  and trying to determine whether I can use the time-based index as an 
> efficient way to sort a stream of messages into timestamp (CreateTime) order.
> 
> I am dealing with a number of sources emitting messages that are then 
> processed in a distributed fashion and written to a Kafka topic. During this 
> processing, the original order of the messages is not strictly maintained. 
> Each message has an embedded timestamp. I’d like to be able to sort these 
> messages back into timestamp order, allowing for a certain lateness interval, 
> before processing them further. For example, supposing the lateness interval 
> is 5 minutes, at time T I’d like to consume from the topic all messages with 
> timestamp up to (T - 5 minutes), in timestamp order. The assumption is that a 
> message should be no more than 5 minutes late; if it is more than 5 minutes 
> late, it can be discarded. Is this something that can be done with the 
> time-based index?
> 
> Thanks,
> 
> Ray
> 
