Forever is a long time. The definition of "replay", and how you navigate different versions of Kafka over that time, would be key.
Example: if you are storing market data in Kafka with a CEP engine running on top, and you want to replay "transactions" back through it to verify replayability, then you would probably want to manage that through the same mechanism as existed at that point in the past. This might mean a different Kafka broker (perhaps 0.7), a different set of consumers, and potentially a different JVM. This, of course, gets into a rat hole.

Regards,
Milind

On Thu, Feb 21, 2013 at 4:29 PM, Eric Tschetter <ched...@metamarkets.com> wrote:

> Anthony,
>
> Is there a reason you wouldn't want to just push the data into something
> built for cheap, long-term storage (like Glacier, S3, or HDFS) and perhaps
> "replay" from that instead of from the Kafka brokers? I can't speak for
> Jay, Jun, or Neha, but I believe the expected usage of Kafka is essentially
> as a buffering mechanism to take the edge off the natural ebb and flow of
> unpredictable internet traffic. Highly available, long-term storage of data
> is probably not at the top of their list of use cases when making design
> decisions.
>
> --Eric
>
> On Thu, Feb 21, 2013 at 6:00 PM, Anthony Grimes <i...@raynes.me> wrote:
>
> > Our use case is that we'd like to log away data we don't need right now
> > and potentially replay it at some point. We don't want to delete old
> > logs. I googled around a bit and only found this post:
> > http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201210.mbox/%3CCAFbh0Q2=eJcDT6NvTAPtxhXSk64x0Yms-G-AOqOoy=FtVVM6SQ@mail.gmail.com%3E
> >
> > In summary, it appears the primary issue is that Kafka keeps a file
> > handle open for each log segment. Is there a way to configure this, or
> > is one planned? It also appears that an option to deduplicate instead of
> > delete was added recently; doesn't the file handle issue exist with that
> > as well, since files aren't being deleted?
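For reference, below is a minimal sketch of how per-topic retention can be switched off, assuming a much newer broker than the 0.7/0.8 versions discussed in this thread and using the Java AdminClient, which did not exist at the time. The topic name "market-data", the bootstrap address, and the class name are placeholders. Setting retention.ms and retention.bytes to -1 disables time- and size-based deletion; cleanup.policy=compact is the "deduplicate instead of delete" option mentioned above.

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    public class RetainForever {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Placeholder broker address.
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // "market-data" is a hypothetical topic name.
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "market-data");

                // retention.ms=-1 disables time-based deletion and retention.bytes=-1
                // disables size-based deletion, so segments are kept indefinitely.
                // Alternatively, cleanup.policy=compact enables log compaction
                // (the "deduplicate instead of delete" option from the thread).
                List<AlterConfigOp> ops = List.of(
                        new AlterConfigOp(new ConfigEntry("retention.ms", "-1"), AlterConfigOp.OpType.SET),
                        new AlterConfigOp(new ConfigEntry("retention.bytes", "-1"), AlterConfigOp.OpType.SET));

                admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
            }
        }
    }

Note that infinite retention does not by itself resolve the file-handle concern raised above: each retained log segment still holds an open descriptor on the broker, so the OS file-descriptor limit and the segment size need to be chosen with the total number of segments in mind.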