Any ideas on this use case, guys?

Thanks,

-----Original Message-----
From: Prunier, Dominique [mailto:dominique.prun...@emc.com] 
Sent: Friday, August 15, 2014 11:02 AM
To: users@kafka.apache.org
Subject: RE: Consumer sensitive expiration of topic

Hi,

Thanks for the answer. 

The topics themselves won't be short-lived (their consumers are supposed to
stay around); the messages in them will be. What I'm trying to achieve is
something similar to this:

Producers --<topic>--> Processor A0 --<topic_a_1>--> Processor A1 --<topic_a_2>--> ... --<topic_a_N>--> Consumer
                  |--> Processor B0 --<topic_b_1>--> Processor B1 --<topic_b_2>--> ... --<topic_b_N>--> Consumer
                  |--> Processor C0 --<topic_c_1>--> Processor C1 --<topic_c_2>--> ... --<topic_c_N>--> Consumer

Essentially, the "main" topic is the first one, and it is the only one consumed
by multiple processors/consumers. Each processor knows which processor comes
next in its chain by knowing that processor's "private" topic name. So in this
example, once Processor A1 picks up a message from topic_a_1 and commits the
offset, that message will never be read by anyone else.
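
For concreteness, the per-processor consume/commit step is roughly the sketch
below (written against the Java consumer client for brevity; the group and
topic names are simply the ones from the example above):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ProcessorA1 {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "processor-a1");     // A1 is the only group on its private topic
        props.put("enable.auto.commit", "false");  // commit explicitly once a batch is handled
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("topic_a_1"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // process the message and forward it to topic_a_2 with a producer (omitted)
                }
                consumer.commitSync();  // after this, nothing will ever read these messages again
            }
        }
    }
}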

There is no particular problem with leaving this as is, but topic_a_1 will
buffer quite a lot of data on disk when, essentially, the only failure mode we
have to cover is Processor A1 going down or lagging. When Processor A1 is
healthy, the retention of topic_a_1 could be kept very low, avoiding a fair
amount of resource usage.
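
(In the meantime, the only knob I can see is a static per-topic retention
override kept very low on topic_a_1 and bumped up when A1 is known to be down.
Something like the line below, assuming a broker and tooling version that
supports per-topic config overrides; the exact flags may differ:)

bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic topic_a_1 --config retention.ms=60000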

One idea off the top of my head would be an API that lets you manually set the
expiration point of a topic by specifying per-partition offsets. This way, once
Processor A1 has consumed its messages, it could not only commit the offsets
(which, as far as I understand, does not involve the log on the broker itself)
but also advance the expiration point of the topic using those same offsets
(which could be done less frequently).
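
To make the shape of it concrete, here is a minimal client-side sketch. It
assumes an admin client with a per-partition "delete records before this
offset" call; the class and method names below follow the shape of a
deleteRecords request and are purely illustrative, not something available in
the release discussed in this thread:

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.RecordsToDelete;
import org.apache.kafka.common.TopicPartition;

public class ExpireByOffset {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Offsets Processor A1 just committed; in practice these would come
        // straight from the consumer rather than being hard-coded.
        Map<TopicPartition, RecordsToDelete> expireBefore = new HashMap<>();
        expireBefore.put(new TopicPartition("topic_a_1", 0), RecordsToDelete.beforeOffset(123_456L));
        expireBefore.put(new TopicPartition("topic_a_1", 1), RecordsToDelete.beforeOffset(123_789L));

        try (AdminClient admin = AdminClient.create(props)) {
            // Ask the broker to drop everything below the committed offsets.
            admin.deleteRecords(expireBefore).all().get();
        }
    }
}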

Does that make sense?

Thanks,

-----Original Message-----
From: Neha Narkhede [mailto:neha.narkh...@gmail.com] 
Sent: Thursday, August 14, 2014 8:10 PM
To: users@kafka.apache.org
Subject: Re: Consumer sensitive expiration of topic

By design, Kafka stores data independently of the number of publishers or
subscribers connected to it. This provides high performance, since the broker
does not have to track consumers and evict data based on each consumer's
position. This is one of the main reasons why Kafka is much more performant
than JMS queues.

It sounds like your use case calls for the concept of ephemeral topics, where
you would like to auto-delete a topic once a particular consumer group has
finished consuming data from it. Once 0.8.2 is released with delete-topic
support, we intend to add auto-expiration of topics, which will delete topics
that have not been accessed in some configurable time.

Is there a reason why your application needs to create such short-lived
topics?

Thanks,
Neha


On Thu, Aug 14, 2014 at 2:56 PM, Prunier, Dominique <
dominique.prun...@emc.com> wrote:

> Hi,
>
> I'm playing around with Kafka with the idea of implementing a general-purpose
> message exchanger for a distributed application with high-throughput
> requirements (several hundred thousand messages per second).
>
> In this context, I would like to be able to use a topic as a form of
> private mailbox for a single consumer group. In that situation, once the
> single consumer group has committed its offset on its private topic, the
> messages there will never be read again and can be safely discarded.
> Therefore, I was wondering whether you see a way (in the current release or
> in the future) to have a topic whose expiration policy is based on consumer
> offsets.
>
> Thanks,
>
> --
> Dominique Prunier
>
>
