Hi, I have recently been discovering the advanced features added to Kafka, such as timestamps and headers. The use case I am investigating is time-series oriented, so timestamps are definitely something I will look into. However, while looking at timestamps, I came across the headers feature, which sparked a new idea that I am not sure is currently feasible:
Imagine that I store messages in a topic with a key representing a SerieId. A single topic therefore contains many time series, each uniquely identified by its SerieId. The problem is that the data in a given partition will interleave messages from multiple series, whereas when I consume, I only want the data for a single SerieId or a small list of them.

Since messages are batched (and compressed) together in sets of whatever fits in a send buffer, I had hoped that I could store a header at the level of that set of messages (so not at the message level, but on the whole compressed set sent as one send buffer) containing a Bloom filter of the SerieIds present in the compressed payload. On the consumer side, I would then decompress the set only if the Bloom filter says there is probably a match in the compressed data, and otherwise filter the whole set out without decompressing it.

This idea could be generalized to other types of statistics, provided there is a generic "header aggregate" API that computes a message-set-level header from the embedded messages.

Is this already feasible with the current API?

Thanks in advance for the help,

Eric Owhadi
Esgyn Corporation
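To make the proposal concrete, here is a minimal sketch of the per-batch Bloom filter idea. It deliberately does not use the Kafka client API: `BloomFilter`, `build_batch` and `read_batch` are hypothetical names, a "batch" is modeled as a plain (filter, zlib-compressed JSON payload) pair, and the filter size (1024 bits, 3 hash functions) is an arbitrary assumption.

```python
import hashlib
import json
import zlib


class BloomFilter:
    """Minimal Bloom filter over string keys (here: serieIds). Hypothetical helper."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd stride
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def build_batch(records):
    """Producer side (hypothetical): compress (serie_id, value) pairs as one
    batch and attach a Bloom filter of the serieIds it contains."""
    bloom = BloomFilter()
    for serie_id, _ in records:
        bloom.add(serie_id)
    payload = zlib.compress(json.dumps(records).encode("utf-8"))
    return bloom, payload


def read_batch(batch, wanted):
    """Consumer side (hypothetical): skip decompression entirely when no wanted
    serieId can be in the batch; otherwise decompress and filter per record."""
    bloom, payload = batch
    if not any(bloom.might_contain(s) for s in wanted):
        return None  # filtered out without decompressing
    records = json.loads(zlib.decompress(payload))
    # The filter can return false positives, so still check each record.
    return [(s, v) for s, v in records if s in wanted]
```

For example, `read_batch(build_batch([("serie-1", 1.0), ("serie-2", 2.5)]), {"serie-1"})` decompresses the batch and keeps only the `serie-1` record, while a query for a serieId that was never added will almost always return `None` without touching the payload. The per-record check after decompression is needed because a Bloom filter only guarantees the absence of a key, never its presence.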