Hi,
I have recently been exploring the advanced features added to Kafka, such as
timestamps and headers.
In the time-series use case I am working on, timestamps are definitely
something I will look into.
However, while looking at timestamps, I discovered the header feature, which
sparked a new idea, though I don't know whether it is currently feasible:

Imagine that I store messages in a topic with a key representing a serieId.
That is, a single topic contains many time series, each uniquely identified by
a serieId.
The problem is that a given partition will contain intermingled data from
multiple series.
But when I consume data, I only want the data for a single serieId or a small
list of them.

Since messages are batched and compressed together in sets of whatever fits in
a send buffer, I would have hoped that at the level of that message set I could
store a header (not at the message level, but at the level of the compressed
set of messages sent as one buffer) containing a Bloom filter of the serieIds
present in the compressed payload.
Then, on the consumer side, I would like to decompress that set only if the
Bloom filter on serieId says there is probably a match in the compressed data,
and skip the set entirely, without decompressing it, if not.
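To make the idea concrete, here is a minimal Java sketch of the filter itself.
It does not use any Kafka API: the class name SerieIdBloomFilter, its sizing
parameters, and the hash scheme (FNV-1a combined with String.hashCode via
double hashing) are my own assumptions, and the batch-level header the
serialized bits would travel in is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.List;

// Sketch only: a Bloom filter over serieId strings. Nothing here is a Kafka API.
class SerieIdBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SerieIdBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a over the UTF-8 bytes of the serieId.
    private static int fnv1a(byte[] data) {
        int h = 0x811C9DC5;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: index_i = (h1 + i * h2) mod numBits.
    private int index(String serieId, int i) {
        int h1 = fnv1a(serieId.getBytes(StandardCharsets.UTF_8));
        int h2 = serieId.hashCode() | 1; // forced odd so successive probes differ
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // Producer side: record every serieId that goes into the batch.
    public void add(String serieId) {
        for (int i = 0; i < numHashes; i++) bits.set(index(serieId, i));
    }

    // False means "definitely absent"; true means "probably present".
    public boolean mightContain(String serieId) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(serieId, i))) return false;
        }
        return true;
    }

    // Bytes that could be stored in the (hypothetical) batch-level header.
    public byte[] toBytes() {
        return bits.toByteArray();
    }

    public static SerieIdBloomFilter fromBytes(byte[] bytes, int numBits, int numHashes) {
        SerieIdBloomFilter f = new SerieIdBloomFilter(numBits, numHashes);
        f.bits.or(BitSet.valueOf(bytes));
        return f;
    }

    // Consumer side: decompress the batch only if some wanted serieId may be inside.
    public static boolean shouldDecompress(SerieIdBloomFilter batchFilter, List<String> wanted) {
        for (String id : wanted) {
            if (batchFilter.mightContain(id)) return true;
        }
        return false;
    }
}
```

A false positive only costs one unnecessary decompression, while a negative
answer is always exact, which is what makes skipping the batch safe.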
This idea could be generalized to other types of statistics, provided there is
a generic "header aggregate" API that computes a message-set-level header from
the embedded messages.
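As a rough illustration of what such a generic API might look like, here is a
Java sketch. The HeaderAggregate interface, its method names, and the
TimestampRangeAggregate example (min/max record timestamp per batch) are all
hypothetical; none of them exist in Kafka today.

```java
import java.nio.ByteBuffer;

// Hypothetical: folds each record of a batch into one batch-level header value.
interface HeaderAggregate {
    String headerKey();
    void accept(byte[] key, byte[] value, long timestamp);
    byte[] result();
}

// Example statistic: the min/max record timestamp within the batch.
class TimestampRangeAggregate implements HeaderAggregate {
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;

    public String headerKey() {
        return "ts-range";
    }

    public void accept(byte[] key, byte[] value, long timestamp) {
        min = Math.min(min, timestamp);
        max = Math.max(max, timestamp);
    }

    // Encoded as two big-endian longs: min then max.
    public byte[] result() {
        return ByteBuffer.allocate(16).putLong(min).putLong(max).array();
    }

    // Consumer side: does the query window [from, to] overlap the batch's range?
    static boolean overlaps(byte[] header, long from, long to) {
        ByteBuffer buf = ByteBuffer.wrap(header);
        long batchMin = buf.getLong();
        long batchMax = buf.getLong();
        return batchMin <= to && from <= batchMax;
    }
}
```

The Bloom filter on serieId would then just be another implementation of the
same interface, so the producer could run several aggregates over one batch.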
Is this something that is already feasible with the current API?
Thanks in advance for your help,
Eric Owhadi
Esgyn Corporation