Yeah, I'm really not worried about performance. Disk space, or more specifically, disk space consumed by duplicating the same data across different topics, was my concern. The primary use case would be a special consumer whose job would be to partition the messages from a topic into various "private consumer topics" (without altering them) to provide a filtered subscription service (e.g. for a remote service on a slower network which cannot afford to receive the whole stream of data and only wants a subset of it).
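
To make the disk-space concern concrete, here is a minimal in-memory sketch of such a filtering consumer. It does not use any Kafka client: plain Python lists stand in for topics, and every name in it (`route`, `copies_written`, the topic names, the message fields) is hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Hypothetical message shape: a plain dict. Real messages would be bytes.
Message = dict


def route(messages: Iterable[Message],
          subscriptions: Dict[str, Callable[[Message], bool]]) -> Dict[str, List[Message]]:
    """Copy each message, unaltered, into every private topic whose filter accepts it."""
    private_topics: Dict[str, List[Message]] = defaultdict(list)
    for msg in messages:
        for topic, wants in subscriptions.items():
            if wants(msg):
                # The message itself is not modified, only duplicated.
                private_topics[topic].append(msg)
    return dict(private_topics)


def copies_written(private_topics: Dict[str, List[Message]]) -> int:
    """Total number of duplicated messages across all private topics."""
    return sum(len(msgs) for msgs in private_topics.values())


# Assumed example data: a main topic with four messages and two subscribers.
main_topic = [
    {"id": 1, "region": "eu"},
    {"id": 2, "region": "us"},
    {"id": 3, "region": "eu"},
    {"id": 4, "region": "ap"},
]
subscriptions = {
    "private_eu": lambda m: m["region"] == "eu",
    "private_non_us": lambda m: m["region"] != "us",
}
routed = route(main_topic, subscriptions)
```

In this sketch, every message accepted by a filter is stored once more on disk for as long as the private topic retains it, which is why retention on those private topics matters more than raw throughput.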
Do you think it would make sense to have a remote API call that manually expires some partition segments by offset (as opposed to by time and/or size)? For example, exposing cleanupLogs with additional parameters to clean up segments on demand? I think that would be more than enough for me, and it could be used for various other things, like manually truncating a topic whose data is no longer relevant without recreating it.

Thanks,

-----Original Message-----
From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Wednesday, August 27, 2014 11:36 PM
To: users@kafka.apache.org
Subject: Re: Consumer sensitive expiration of topic

Kafka is designed to maintain a persistent backlog of data on disk efficiently and at scale. Unlike other messaging systems, doing so does not affect the performance of the system. If you are worried about the messages occupying disk space, you can always set a lower retention on the topic, as long as it is higher than any lag your consumer can accrue. The best plan here would be to allocate disk space for the retention.

On Mon, Aug 25, 2014 at 2:25 PM, Prunier, Dominique <dominique.prun...@emc.com> wrote:

> Any idea on this use case, guys?
>
> Thanks,
>
> -----Original Message-----
> From: Prunier, Dominique [mailto:dominique.prun...@emc.com]
> Sent: Friday, August 15, 2014 11:02 AM
> To: users@kafka.apache.org
> Subject: RE: Consumer sensitive expiration of topic
>
> Hi,
>
> Thanks for the answer.
>
> The topics themselves won't be short-lived (as their consumers are supposed to stay there), but the messages in them will. What I'm trying to achieve is something similar to this:
>
> Producers --<topic>--> Processor A0 --<topic_a_1>--> Processor A1 --<topic_a_2>--> ... --<topic_a_N>--> Consumer
>                   |--> Processor B0 --<topic_b_1>--> Processor B1 --<topic_b_2>--> ... --<topic_b_N>--> Consumer
>                   |--> Processor C0 --<topic_c_1>--> Processor C1 --<topic_c_2>--> ... --<topic_c_N>--> Consumer
>
> Essentially, the "main" topic is the first one and the only one consumed by multiple processors/consumers. These processors know which processor to send their data to next by knowing its "private" topic name. So in this example, once Processor A1 picks up a message from topic_a_1 and commits the offset, the message won't be used by anyone else.
>
> There is no particular issue with just leaving this as is, but topic_a_1 is going to buffer quite a lot of data on disk while, essentially, the only thing that we have to deal with here is Processor A1 going down or lagging. When Processor A1 is healthy, the expiration of topic_a_1 could be kept very low and avoid a fair amount of resource use.
>
> An idea off the top of my head would be an API where you can manually set the expiration of a topic by specifying offsets for partitions. This way, once Processor A1 has consumed its messages, it could not only commit the offsets (which, as far as I understand, has nothing to do with the broker itself) but also set the expiration of the topic using the same offsets (which could be done less frequently).
>
> Does it make sense?
>
> Thanks,
>
> -----Original Message-----
> From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
> Sent: Thursday, August 14, 2014 8:10 PM
> To: users@kafka.apache.org
> Subject: Re: Consumer sensitive expiration of topic
>
> By design, Kafka stores data independent of the number of publishers or subscribers connecting to it. This provides high performance, as the broker does not have to manage consumers and evict data based on the consumers' positions. This is one of the main reasons why Kafka is much more performant than JMS queues.
>
> It seems like your use case requires the concept of ephemeral topics, where you would like to auto-delete a topic once a particular consumer group has finished consuming data from it. Once 0.8.2 is released with delete-topic support, we intend to add auto-expiration of topics, which will delete topics that have not been accessed in some configurable time.
>
> Is there a reason why your application needs to create such short-lived topics?
>
> Thanks,
> Neha
>
> On Thu, Aug 14, 2014 at 2:56 PM, Prunier, Dominique <dominique.prun...@emc.com> wrote:
>
> > Hi,
> >
> > I'm playing around with Kafka with the idea of implementing a general-purpose message exchanger for a distributed application with high throughput requirements (several hundred thousand messages per second).
> >
> > In this context, I would like to be able to use a topic as some form of private mailbox for a single consumer group. In this situation, once the single consumer group has committed its offset on its private topic, the messages there won't be used anymore and can be safely discarded. Therefore, I was wondering if you'd see a way (in the current release or in the future) to have a topic whose expiration policy is based on consumer offsets.
> >
> > Thanks,
> >
> > --
> > Dominique Prunier
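
The consumer-offset-based expiration asked about in the original message could be modeled roughly as follows. This is only a sketch of the idea, not an existing Kafka API: `Segment`, `expire_before`, and the offsets used are hypothetical. It assumes that a partition log is a list of segments ordered by offset, that only whole segments can be deleted, that the last (active) segment is never deleted, and that a committed offset means "the next offset to be consumed".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    """Hypothetical log segment covering offsets [base_offset, next_offset)."""
    base_offset: int
    next_offset: int


def expire_before(segments: List[Segment],
                  committed_offset: int) -> Tuple[List[Segment], List[Segment]]:
    """Split segments into (kept, deleted) for a given committed offset.

    A closed segment is deleted only if every message in it has an offset
    below committed_offset. The last segment is treated as active and kept.
    """
    if not segments:
        return [], []
    *closed, active = segments
    deleted = [s for s in closed if s.next_offset <= committed_offset]
    kept = [s for s in closed if s.next_offset > committed_offset] + [active]
    return kept, deleted


# Assumed example: three segments, the last one still being written to.
log = [Segment(0, 100), Segment(100, 200), Segment(200, 250)]

# The consumer has committed offset 150: only the first segment is fully consumed.
kept, deleted = expire_before(log, committed_offset=150)
```

With several consumer groups on the same topic, the same function could be called with the minimum of their committed offsets, which is the case the broker would otherwise have to track itself.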