Hi Eric,

Thanks for your comments. My goal is to apply the filter without any
serialization.

The producer will record the distinct header values for each record batch.
The broker will build an index of those header values, similar to the time
index. When a consumer applies a filter, the broker will filter only at the
record-batch level. The filter will not guarantee exact results, but it
will reduce the cost on the consumer side. The consumer still needs to do
whatever it does today, but for fewer messages.
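
To make the idea concrete, here is a rough Python sketch of batch-level
filtering. It is a toy in-memory model, not Kafka code: RecordBatch,
header_summary, HeaderIndex and consumer_filter are hypothetical names I
made up for illustration, and a real index would be a file on disk next to
the log segment rather than a dict.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple


@dataclass
class Record:
    headers: Dict[str, bytes]
    value: bytes


@dataclass
class RecordBatch:
    base_offset: int
    records: List[Record]
    # Distinct (header key, header value) pairs, computed once on the
    # producer side (assumption: this summary travels with the batch).
    header_summary: FrozenSet[Tuple[str, bytes]] = field(init=False)

    def __post_init__(self) -> None:
        self.header_summary = frozenset(
            (k, v) for r in self.records for k, v in r.headers.items()
        )


class HeaderIndex:
    """Broker-side toy index: header pair -> base offsets of batches."""

    def __init__(self) -> None:
        self._batches: Dict[int, RecordBatch] = {}
        self._index: Dict[Tuple[str, bytes], List[int]] = {}

    def append(self, batch: RecordBatch) -> None:
        # The broker only reads the summary; it never opens the records.
        self._batches[batch.base_offset] = batch
        for pair in batch.header_summary:
            self._index.setdefault(pair, []).append(batch.base_offset)

    def fetch(self, key: str, value: bytes) -> List[RecordBatch]:
        # Returns whole batches, so non-matching records can still be
        # included when they share a batch with a matching one.
        return [self._batches[o] for o in self._index.get((key, value), [])]


def consumer_filter(batches: List[RecordBatch], key: str, value: bytes) -> List[Record]:
    # The consumer still applies the exact filter, but on fewer batches.
    return [
        r for b in batches for r in b.records if r.headers.get(key) == value
    ]
```

In this sketch a fetch for one header value skips every batch that does
not contain it, and the consumer drops the remaining non-matching records
from the batches it does receive.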

Do you see any issues? In this model I don't think we pay any penalty,
apart from creating an additional index file on the broker and a small
increase in storage size.

Thanks

On Tue, Nov 30, 2021 at 10:21 AM Eric Azama <eazama...@gmail.com> wrote:

> Something to keep in mind with your proposal is that you're moving the
> Decompression and Filtering costs into the Brokers. It probably also adds a
> new Compression cost if you want the Broker to send compressed data over
> the network. Centralizing that cost on the cluster may not be desirable and
> would likely increase latency across the board.
>
> Additionally, because header values are byte arrays, the Brokers probably
> would not be able to do very sophisticated filtering. Support for basic
> comparisons of the built-in Serdes might be simple enough, but anything
> more complex or involving custom Serdes would probably require a new
> plug-in type on the broker.
>
> On Mon, Nov 29, 2021 at 10:49 AM Talat Uyarer <
> tuya...@paloaltonetworks.com>
> wrote:
>
> > Hi All,
> >
> > I want to get your advice on one subject. I want to create a KIP for
> > message header based filtering on the Fetch API.
> >
> > Our current use case: we have 1k+ topics, and each topic has 10+
> > consumers for different use cases. However, each consumer is interested
> > in a different set of messages on the same topic. Currently we read all
> > messages from a given topic and drop logs on the consumer side. To
> > reduce our stream processing cost I want to drop logs on the broker
> > side. My understanding so far is:
> >
> > *Broker sends messages as is (no serialization cost) -> Network transfer
> > -> Consumer deserializes messages (user-side deserialization cost) ->
> > User space drops or uses messages (user-side filtering cost)*
> >
> >
> > If I can drop messages based on their headers without serializing and
> > deserializing them, it will help us save network bandwidth as well as
> > consumer-side CPU cost.
> >
> > My approach is to build a header index. Consumer clients will define
> > their filter in the fetch call. If the filter matches, the broker will
> > send the messages. I would like to hear your suggestions about my
> > solution.
> >
> > Thanks
> >
>
